conll2002

References:

es

Use the following command to load this dataset in TFDS:

import tensorflow_datasets as tfds
ds = tfds.load('huggingface:conll2002/es')
  • Description
Named entities are phrases that contain the names of persons, organizations, locations, times and quantities.

Example:
[PER Wolff] , currently a journalist in [LOC Argentina] , played with [PER Del Bosque] in the final years of the seventies in [ORG Real Madrid] .
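
The bracket notation in the example can be related to per-token labels such as `B-PER` and `I-PER`, which appear in the `ner_tags` feature of this dataset. The following is a minimal sketch, assuming the first token of each bracketed entity receives a `B-` label and later tokens an `I-` label; the function name `bracketed_to_bio` is hypothetical and is not part of TFDS or of the shared-task tooling.

```python
import re
from typing import List, Tuple


def bracketed_to_bio(text: str) -> List[Tuple[str, str]]:
    """Convert '[TYPE tok tok] tok' notation into (token, label) pairs."""
    pairs = []
    # Match either a bracketed entity or a single whitespace-delimited token.
    for match in re.finditer(r"\[(\w+) ([^\]]+)\]|(\S+)", text):
        ent_type, ent_text, plain = match.groups()
        if ent_type is None:
            # Token outside any entity.
            pairs.append((plain, "O"))
            continue
        for i, token in enumerate(ent_text.split()):
            # Assumption: first token gets B-, the remaining tokens get I-.
            pairs.append((token, ("B-" if i == 0 else "I-") + ent_type))
    return pairs


sentence = "[PER Wolff] , currently a journalist in [LOC Argentina] ."
for token, label in bracketed_to_bio(sentence):
    print(token, label)
```

Each printed line pairs one token with its label, so a multi-token name such as "Del Bosque" would yield one `B-PER` token followed by one `I-PER` token.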

The shared task of CoNLL-2002 concerns language-independent named entity recognition.
We will concentrate on four types of named entities: persons, locations, organizations and names of miscellaneous entities that do not belong to the previous three groups.
The participants of the shared task will be offered training and test data for at least two languages.
They will use the data for developing a named-entity recognition system that includes a machine learning component.
Information sources other than the training data may be used in this shared task.
We are especially interested in methods that can use additional unannotated data for improving their performance (for example co-training).

The train/validation/test sets are available in Spanish and Dutch.

For more details see https://www.clips.uantwerpen.be/conll2002/ner/ and https://www.aclweb.org/anthology/W02-2024/
  • License: No known license
  • Version: 1.0.0
  • Splits
Split         Examples
'test'        1518
'train'       8324
'validation'  1916
  • Features
{
    "id": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "tokens": {
        "feature": {
            "dtype": "string",
            "id": null,
            "_type": "Value"
        },
        "length": -1,
        "id": null,
        "_type": "Sequence"
    },
    "pos_tags": {
        "feature": {
            "num_classes": 60,
            "names": [
                "AO",
                "AQ",
                "CC",
                "CS",
                "DA",
                "DE",
                "DD",
                "DI",
                "DN",
                "DP",
                "DT",
                "Faa",
                "Fat",
                "Fc",
                "Fd",
                "Fe",
                "Fg",
                "Fh",
                "Fia",
                "Fit",
                "Fp",
                "Fpa",
                "Fpt",
                "Fs",
                "Ft",
                "Fx",
                "Fz",
                "I",
                "NC",
                "NP",
                "P0",
                "PD",
                "PI",
                "PN",
                "PP",
                "PR",
                "PT",
                "PX",
                "RG",
                "RN",
                "SP",
                "VAI",
                "VAM",
                "VAN",
                "VAP",
                "VAS",
                "VMG",
                "VMI",
                "VMM",
                "VMN",
                "VMP",
                "VMS",
                "VSG",
                "VSI",
                "VSM",
                "VSN",
                "VSP",
                "VSS",
                "Y",
                "Z"
            ],
            "names_file": null,
            "id": null,
            "_type": "ClassLabel"
        },
        "length": -1,
        "id": null,
        "_type": "Sequence"
    },
    "ner_tags": {
        "feature": {
            "num_classes": 9,
            "names": [
                "O",
                "B-PER",
                "I-PER",
                "B-ORG",
                "I-ORG",
                "B-LOC",
                "I-LOC",
                "B-MISC",
                "I-MISC"
            ],
            "names_file": null,
            "id": null,
            "_type": "ClassLabel"
        },
        "length": -1,
        "id": null,
        "_type": "Sequence"
    }
}
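
The `ner_tags` feature stores integer class ids rather than label strings. As a minimal sketch, assuming each id indexes into the `names` list above and that a `B-` label opens an entity while a matching `I-` label continues it, the ids can be decoded and grouped into entity spans. The helpers `decode_tags` and `extract_entities` and the hand-labelled sample are illustrative only; they are not part of the dataset or of TFDS.

```python
from typing import List, Tuple

# Label names copied from the "ner_tags" feature; the list index is the class id.
NER_NAMES = ["O", "B-PER", "I-PER", "B-ORG", "I-ORG",
             "B-LOC", "I-LOC", "B-MISC", "I-MISC"]


def decode_tags(tag_ids: List[int]) -> List[str]:
    """Map integer class ids to their label names."""
    return [NER_NAMES[i] for i in tag_ids]


def extract_entities(tokens: List[str], tag_ids: List[int]) -> List[Tuple[str, str]]:
    """Group B-/I- labelled tokens into (entity_type, text) pairs."""
    entities = []
    current_type, current_tokens = None, []
    for token, label in zip(tokens, decode_tags(tag_ids)):
        prefix, _, ent_type = label.partition("-")
        if prefix == "I" and ent_type == current_type:
            # Continuation of the entity that is currently open.
            current_tokens.append(token)
            continue
        # Any other label closes the open entity, if there is one.
        if current_type is not None:
            entities.append((current_type, " ".join(current_tokens)))
        if prefix in ("B", "I"):
            # Assumption: a stray I- label is treated as opening a new entity.
            current_type, current_tokens = ent_type, [token]
        else:
            current_type, current_tokens = None, []
    if current_type is not None:
        entities.append((current_type, " ".join(current_tokens)))
    return entities


# Hand-labelled illustration based on the example sentence in the description.
tokens = ["Wolff", ",", "played", "with", "Del", "Bosque", "in", "Real", "Madrid", "."]
tag_ids = [1, 0, 0, 0, 1, 2, 0, 3, 4, 0]
print(extract_entities(tokens, tag_ids))
```

Keeping the label list as a plain Python constant makes the sketch runnable without downloading the dataset; with the loaded dataset, the same names would normally come from the feature metadata instead.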

nl

Use the following command to load this dataset in TFDS:

import tensorflow_datasets as tfds
ds = tfds.load('huggingface:conll2002/nl')
  • Description
Named entities are phrases that contain the names of persons, organizations, locations, times and quantities.

Example:
[PER Wolff] , currently a journalist in [LOC Argentina] , played with [PER Del Bosque] in the final years of the seventies in [ORG Real Madrid] .

The shared task of CoNLL-2002 concerns language-independent named entity recognition.
We will concentrate on four types of named entities: persons, locations, organizations and names of miscellaneous entities that do not belong to the previous three groups.
The participants of the shared task will be offered training and test data for at least two languages.
They will use the data for developing a named-entity recognition system that includes a machine learning component.
Information sources other than the training data may be used in this shared task.
We are especially interested in methods that can use additional unannotated data for improving their performance (for example co-training).

The train/validation/test sets are available in Spanish and Dutch.

For more details see https://www.clips.uantwerpen.be/conll2002/ner/ and https://www.aclweb.org/anthology/W02-2024/
  • License: No known license
  • Version: 1.0.0
  • Splits
Split         Examples
'test'        5196
'train'       15807
'validation'  2896
  • Features
{
    "id": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "tokens": {
        "feature": {
            "dtype": "string",
            "id": null,
            "_type": "Value"
        },
        "length": -1,
        "id": null,
        "_type": "Sequence"
    },
    "pos_tags": {
        "feature": {
            "num_classes": 12,
            "names": [
                "Adj",
                "Adv",
                "Art",
                "Conj",
                "Int",
                "Misc",
                "N",
                "Num",
                "Prep",
                "Pron",
                "Punc",
                "V"
            ],
            "names_file": null,
            "id": null,
            "_type": "ClassLabel"
        },
        "length": -1,
        "id": null,
        "_type": "Sequence"
    },
    "ner_tags": {
        "feature": {
            "num_classes": 9,
            "names": [
                "O",
                "B-PER",
                "I-PER",
                "B-ORG",
                "I-ORG",
                "B-LOC",
                "I-LOC",
                "B-MISC",
                "I-MISC"
            ],
            "names_file": null,
            "id": null,
            "_type": "ClassLabel"
        },
        "length": -1,
        "id": null,
        "_type": "Sequence"
    }
}
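
Each example holds three parallel sequences: `tokens`, `pos_tags`, and `ner_tags`. A minimal sketch, assuming the integer ids index into the `names` lists of the Dutch (`nl`) configuration above, renders one example as one token per line with its POS and NER labels. The function `to_columns` and the sample sentence with its tag ids are hypothetical, written for illustration rather than taken from the dataset.

```python
from typing import Dict, List

# Label names copied from the "pos_tags" and "ner_tags" features of the nl config.
POS_NAMES = ["Adj", "Adv", "Art", "Conj", "Int", "Misc",
             "N", "Num", "Prep", "Pron", "Punc", "V"]
NER_NAMES = ["O", "B-PER", "I-PER", "B-ORG", "I-ORG",
             "B-LOC", "I-LOC", "B-MISC", "I-MISC"]


def to_columns(example: Dict[str, List]) -> str:
    """Render one example as 'token POS NER' lines, one token per line."""
    tokens = example["tokens"]
    pos_ids = example["pos_tags"]
    ner_ids = example["ner_tags"]
    if not (len(tokens) == len(pos_ids) == len(ner_ids)):
        raise ValueError("tokens, pos_tags and ner_tags must have equal length")
    lines = [
        f"{tok} {POS_NAMES[p]} {NER_NAMES[n]}"
        for tok, p, n in zip(tokens, pos_ids, ner_ids)
    ]
    return "\n".join(lines)


# Invented sample: the sentence and its tag ids are assumptions, not dataset content.
example = {
    "id": "0",
    "tokens": ["Jan", "woont", "in", "Antwerpen", "."],
    "pos_tags": [6, 11, 8, 6, 10],
    "ner_tags": [1, 0, 0, 5, 0],
}
print(to_columns(example))
```

The length check guards against misaligned sequences, which would otherwise be silently truncated by `zip`.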