参考:
germeval_14
使用以下命令在 TFDS 中加载此数据集:
ds = tfds.load('huggingface:germeval_14/germeval_14')
- 说明:
The GermEval 2014 NER Shared Task builds on a new dataset with German Named Entity annotation with the following properties: - The data was sampled from German Wikipedia and News Corpora as a collection of citations. - The dataset covers over 31,000 sentences corresponding to over 590,000 tokens. - The NER annotation uses the NoSta-D guidelines, which extend the Tübingen Treebank guidelines, using four main NER categories with sub-structure, and annotating embeddings among NEs such as [ORG FC Kickers [LOC Darmstadt]].
- 许可:无已知许可
- 版本:2.0.0
- 拆分:
拆分 | 样本 |
---|---|
'test' |
5100 |
'train' |
24000 |
'validation' |
2200 |
- 特征:
{
"id": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"source": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"tokens": {
"feature": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"length": -1,
"id": null,
"_type": "Sequence"
},
"ner_tags": {
"feature": {
"num_classes": 25,
"names": [
"O",
"B-LOC",
"I-LOC",
"B-LOCderiv",
"I-LOCderiv",
"B-LOCpart",
"I-LOCpart",
"B-ORG",
"I-ORG",
"B-ORGderiv",
"I-ORGderiv",
"B-ORGpart",
"I-ORGpart",
"B-OTH",
"I-OTH",
"B-OTHderiv",
"I-OTHderiv",
"B-OTHpart",
"I-OTHpart",
"B-PER",
"I-PER",
"B-PERderiv",
"I-PERderiv",
"B-PERpart",
"I-PERpart"
],
"names_file": null,
"id": null,
"_type": "ClassLabel"
},
"length": -1,
"id": null,
"_type": "Sequence"
},
"nested_ner_tags": {
"feature": {
"num_classes": 25,
"names": [
"O",
"B-LOC",
"I-LOC",
"B-LOCderiv",
"I-LOCderiv",
"B-LOCpart",
"I-LOCpart",
"B-ORG",
"I-ORG",
"B-ORGderiv",
"I-ORGderiv",
"B-ORGpart",
"I-ORGpart",
"B-OTH",
"I-OTH",
"B-OTHderiv",
"I-OTHderiv",
"B-OTHpart",
"I-OTHpart",
"B-PER",
"I-PER",
"B-PERderiv",
"I-PERderiv",
"B-PERpart",
"I-PERpart"
],
"names_file": null,
"id": null,
"_type": "ClassLabel"
},
"length": -1,
"id": null,
"_type": "Sequence"
}
}