conll2002
The shared task of CoNLL-2002 concerns language-independent named entity
recognition. The types of named entities include: persons, locations,
organizations and names of miscellaneous entities that do not belong to the
previous three groups. The participants of the shared task were offered training
and test data for at least two languages. Information sources other than the
training data might have been used in this shared task.
@inproceedings{tjong-kim-sang-2002-introduction,
title = "Introduction to the {C}o{NLL}-2002 Shared Task: Language-Independent Named Entity Recognition",
author = "Tjong Kim Sang, Erik F.",
booktitle = "{COLING}-02: The 6th Conference on Natural Language Learning 2002 ({C}o{NLL}-2002)",
year = "2002",
url = "https://aclanthology.org/W02-2024",
}
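A minimal sketch of loading one config through TensorFlow Datasets, assuming the `tensorflow_datasets` package is installed and the data can be downloaded on first use. The helper names `tfds_name` and `load_conll2002` are hypothetical and exist only for this illustration; `tfds.load` with `split` and `with_info` is the standard TFDS entry point.

```python
def tfds_name(config: str = "es") -> str:
    """Build the TFDS dataset name for a CoNLL-2002 config ('es' or 'nl')."""
    if config not in ("es", "nl"):
        raise ValueError(f"unknown config: {config!r}")
    return f"conll2002/{config}"


def load_conll2002(config: str = "es", split: str = "train"):
    """Return (dataset, info) for one split; downloads the data on first use."""
    # Imported lazily so the helpers above work without TFDS installed.
    import tensorflow_datasets as tfds

    return tfds.load(tfds_name(config), split=split, with_info=True)
```

For example, `ds, info = load_conll2002("nl", "dev")` would request the `'dev'` split of the Dutch config. Each element of `ds` is a dict with the keys `tokens`, `ner` and `pos`.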
conll2002/es (default config)
Download size: 3.95 MiB
Dataset size: 3.52 MiB
Splits:
| Split   | Examples |
|---------|----------|
| 'dev'   | 1,916    |
| 'test'  | 1,518    |
| 'train' | 8,324    |
FeaturesDict({
'ner': Sequence(ClassLabel(shape=(), dtype=int64, num_classes=9)),
'pos': Sequence(ClassLabel(shape=(), dtype=int64, num_classes=60)),
'tokens': Sequence(Text(shape=(), dtype=string)),
})
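The nine `ner` classes are consistent with an IOB2 scheme over the four entity types (a `B-` and an `I-` tag per type, plus `O`). The sketch below groups tokens into entity spans under that assumption. The label order in `NER_NAMES` is an assumption, not something stated on this page; the authoritative list should be read from `info.features["ner"].feature.names` after loading. The function name `extract_entities` and the sample sentence are hypothetical.

```python
# ASSUMED label order; verify against info.features["ner"].feature.names.
NER_NAMES = ["O", "B-PER", "I-PER", "B-ORG", "I-ORG",
             "B-LOC", "I-LOC", "B-MISC", "I-MISC"]


def extract_entities(tokens, ner_ids, names=NER_NAMES):
    """Group tokens into (entity_type, text) spans from IOB2 tag ids."""
    entities = []
    current_type, current_tokens = None, []
    for token, tag_id in zip(tokens, ner_ids):
        tag = names[tag_id]
        # I-X continues the open span only if the entity type matches.
        if tag.startswith("I-") and current_type == tag[2:]:
            current_tokens.append(token)
            continue
        # Any other tag closes the open span, if there is one.
        if current_type is not None:
            entities.append((current_type, " ".join(current_tokens)))
            current_type, current_tokens = None, []
        if tag != "O":
            # B-X starts a span; a stray I-X is leniently treated as a start.
            current_type, current_tokens = tag[2:], [token]
    if current_type is not None:
        entities.append((current_type, " ".join(current_tokens)))
    return entities


# Hypothetical sentence, not taken from the dataset.
tokens = ["Ana", "García", "vive", "en", "Madrid", "."]
ner_ids = [1, 2, 0, 0, 5, 0]
print(extract_entities(tokens, ner_ids))
```

With real data, `tokens` would first be decoded from bytes (for instance `t.decode("utf-8")` on each element of `example["tokens"].numpy()`).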
| Feature | Class                | Shape   | Dtype  | Description |
|---------|----------------------|---------|--------|-------------|
|         | FeaturesDict         |         |        |             |
| ner     | Sequence(ClassLabel) | (None,) | int64  |             |
| pos     | Sequence(ClassLabel) | (None,) | int64  |             |
| tokens  | Sequence(Text)       | (None,) | string |             |
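Every feature has shape `(None,)`, so sentences vary in length and cannot be stacked into a batch without padding. The pure-Python sketch below illustrates the idea of padding plus a mask; `pad_batch` is a hypothetical helper, not part of TFDS. In a `tf.data` pipeline the same effect is usually obtained with `tf.data.Dataset.padded_batch`.

```python
def pad_batch(sequences, pad_value=0):
    """Right-pad id sequences to equal length and return a matching mask."""
    max_len = max((len(s) for s in sequences), default=0)
    padded = [list(s) + [pad_value] * (max_len - len(s)) for s in sequences]
    # 1 marks a real token, 0 marks padding.
    mask = [[1] * len(s) + [0] * (max_len - len(s)) for s in sequences]
    return padded, mask


# Two hypothetical 'ner' id sequences of different lengths.
padded, mask = pad_batch([[1, 2, 0], [5]])
print(padded)  # [[1, 2, 0], [5, 0, 0]]
print(mask)    # [[1, 1, 1], [1, 0, 0]]
```

The mask matters because the padding value 0 is also a valid class id, so padded positions cannot be told apart from real labels by value alone.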
conll2002/nl
Download size: 3.47 MiB
Dataset size: 3.55 MiB
Splits:
| Split   | Examples |
|---------|----------|
| 'dev'   | 2,896    |
| 'test'  | 5,196    |
| 'train' | 15,807   |
FeaturesDict({
'ner': Sequence(ClassLabel(shape=(), dtype=int64, num_classes=9)),
'pos': Sequence(ClassLabel(shape=(), dtype=int64, num_classes=12)),
'tokens': Sequence(Text(shape=(), dtype=string)),
})
| Feature | Class                | Shape   | Dtype  | Description |
|---------|----------------------|---------|--------|-------------|
|         | FeaturesDict         |         |        |             |
| ner     | Sequence(ClassLabel) | (None,) | int64  |             |
| pos     | Sequence(ClassLabel) | (None,) | int64  |             |
| tokens  | Sequence(Text)       | (None,) | string |             |
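The two configs share the `ner` label space (9 classes) but differ in `pos` (60 classes for `es`, 12 for `nl`), so code that handles both needs to keep the part-of-speech vocabularies separate. The sketch below checks a decoded example against the class counts listed on this page. `validate_example` is a hypothetical helper, and the requirement that `tokens`, `ner` and `pos` have equal length is an assumption based on the annotations being per token.

```python
# Class counts as listed in the feature structures on this page.
NER_CLASSES = 9
POS_CLASSES = {"es": 60, "nl": 12}


def validate_example(example, config):
    """Return a list of problems found in one decoded example (empty if none)."""
    tokens, ner, pos = example["tokens"], example["ner"], example["pos"]
    errors = []
    # ASSUMPTION: one ner id and one pos id per token.
    if not (len(tokens) == len(ner) == len(pos)):
        errors.append("tokens, ner and pos must have the same length")
    if any(not 0 <= i < NER_CLASSES for i in ner):
        errors.append("ner id out of range")
    if any(not 0 <= i < POS_CLASSES[config] for i in pos):
        errors.append(f"pos id out of range for config {config!r}")
    return errors


# Hypothetical example: pos id 30 fits the 'es' tag set but not 'nl'.
example = {"tokens": ["a", "b"], "ner": [0, 1], "pos": [3, 30]}
print(validate_example(example, "es"))
print(validate_example(example, "nl"))
```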
Last updated 2022-12-22 UTC.
General dataset information:
Homepage: https://aclanthology.org/W02-2024/
Source code: tfds.text.conll2002.Conll2002 (https://github.com/tensorflow/datasets/tree/master/tensorflow_datasets/text/conll2002/conll2002.py)
Versions: 1.0.0 (default): Initial release.
Auto-cached (https://www.tensorflow.org/datasets/performances#auto-caching): Yes
Supervised keys (see the as_supervised doc, https://www.tensorflow.org/datasets/api_docs/python/tfds/load#args): None
Figure (tfds.show_examples): Not supported.