• Description:

The shared task of CoNLL-2003 concerns language-independent named entity recognition and concentrates on four types of named entities: persons, locations, organizations and names of miscellaneous entities that do not belong to the previous three groups.

Split    Examples
'dev'    3,251
'test'   3,454
'train'  14,042
  • Feature structure:
    FeaturesDict({
        'chunks': Sequence(ClassLabel(shape=(), dtype=int64, num_classes=23)),
        'ner': Sequence(ClassLabel(shape=(), dtype=int64, num_classes=9)),
        'pos': Sequence(ClassLabel(shape=(), dtype=int64, num_classes=47)),
        'tokens': Sequence(Text(shape=(), dtype=string)),
    })
  • Feature documentation:
Feature  Class                 Shape    Dtype   Description
chunks   Sequence(ClassLabel)  (None,)  int64
ner      Sequence(ClassLabel)  (None,)  int64
pos      Sequence(ClassLabel)  (None,)  int64
tokens   Sequence(Text)        (None,)  string
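
Because 'ner' is stored as a sequence of integer class ids aligned with 'tokens', a reader usually has to map the ids back to tag names and group them into entity mentions. The following is a minimal sketch of that step in plain Python. The label list `NER_NAMES` and its ordering are an assumption (the page above only states that there are 9 classes), the helper `decode_entities` is hypothetical, and the example sentence is hand-written rather than read from the dataset. In practice the label names should be taken from the dataset's own feature metadata.

```python
from typing import List, Optional, Sequence, Tuple

# ASSUMPTION: the page only says num_classes=9. This BIO ordering over the
# four entity types (persons, organizations, locations, miscellaneous) is
# illustrative; read the real names from the dataset metadata when available.
NER_NAMES: List[str] = [
    "O",
    "B-PER", "I-PER",
    "B-ORG", "I-ORG",
    "B-LOC", "I-LOC",
    "B-MISC", "I-MISC",
]


def decode_entities(
    tokens: Sequence[str],
    ner_ids: Sequence[int],
    names: Sequence[str] = NER_NAMES,
) -> List[Tuple[str, str]]:
    """Group token-level integer tags into (entity text, entity type) pairs."""
    entities: List[Tuple[str, str]] = []
    current_tokens: List[str] = []
    current_type: Optional[str] = None

    for token, label_id in zip(tokens, ner_ids):
        tag = names[label_id]
        prefix, _, ent_type = tag.partition("-")
        # A "B-" tag always opens a new mention; an "I-" tag opens one only
        # when it does not continue a mention of the same type.
        starts_new = prefix == "B" or (prefix == "I" and ent_type != current_type)
        if tag == "O" or starts_new:
            # Close the mention that was being built, if any.
            if current_tokens:
                entities.append((" ".join(current_tokens), current_type))
            current_tokens, current_type = [], None
        if tag != "O":
            current_tokens.append(token)
            current_type = ent_type

    # Flush a mention that runs to the end of the sentence.
    if current_tokens:
        entities.append((" ".join(current_tokens), current_type))
    return entities


# Hand-written illustrative input (not loaded from the dataset).
example_tokens = ["EU", "rejects", "German", "call", "to", "boycott", "British", "lamb", "."]
example_ner = [3, 0, 7, 0, 0, 0, 7, 0, 0]
print(decode_entities(example_tokens, example_ner))
```

The same pattern applies to 'pos' and 'chunks', which are also integer sequences of the same length as 'tokens'; only the label list changes (47 and 23 classes respectively).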
  • Citation:
    title = "Introduction to the {C}o{NLL}-2003 Shared Task: Language-Independent Named Entity Recognition",
    author = "Tjong Kim Sang, Erik F. and
      De Meulder, Fien",
    booktitle = "Proceedings of the Seventh Conference on Natural Language Learning at {HLT}-{NAACL} 2003",
    year = "2003",
    url = "",
    pages = "142--147",

conll2003/conll2003 (default config)