
id_nergrit_corpus

References:

ner

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:id_nergrit_corpus/ner')

Description:

Nergrit Corpus is a dataset collection for Indonesian Named Entity Recognition, Statement Extraction, and Sentiment
Analysis. id_nergrit_corpus is the Named Entity Recognition part of this collection, which contains the following 19
entity types:
    'CRD': Cardinal
    'DAT': Date
    'EVT': Event
    'FAC': Facility
    'GPE': Geopolitical Entity
    'LAW': Law Entity (such as Undang-Undang)
    'LOC': Location
    'MON': Money
    'NOR': Political Organization
    'ORD': Ordinal
    'ORG': Organization
    'PER': Person
    'PRC': Percent
    'PRD': Product
    'QTY': Quantity
    'REG': Religion
    'TIM': Time
    'WOA': Work of Art
    'LAN': Language
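
The 39 classes of the `ner_tags` feature are consistent with a BIO tagging scheme over these 19 entity codes (one `B-` and one `I-` tag per code, plus `O`). A minimal sketch of that relationship, assuming the label order is all `B-` tags sorted by code, then all `I-` tags, then `O`, as in the `names` list on this page; `build_bio_labels` is a hypothetical helper, not part of TFDS:

```python
# Entity codes copied from the description above.
ENTITY_CODES = [
    "CRD", "DAT", "EVT", "FAC", "GPE", "LAW", "LOC", "MON", "NOR", "ORD",
    "ORG", "PER", "PRC", "PRD", "QTY", "REG", "TIM", "WOA", "LAN",
]


def build_bio_labels(codes):
    """Return B-* tags, then I-* tags (each sorted by code), then "O"."""
    ordered = sorted(codes)
    return [f"B-{c}" for c in ordered] + [f"I-{c}" for c in ordered] + ["O"]


labels = build_bio_labels(ENTITY_CODES)
print(len(ENTITY_CODES), len(labels))  # prints: 19 39
```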

License: No known license
Version: 1.0.0
Splits:

Split	Examples
`'test'`	2399
`'train'`	12532
`'validation'`	2521

Features:

{
    "id": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "tokens": {
        "feature": {
            "dtype": "string",
            "id": null,
            "_type": "Value"
        },
        "length": -1,
        "id": null,
        "_type": "Sequence"
    },
    "ner_tags": {
        "feature": {
            "num_classes": 39,
            "names": [
                "B-CRD",
                "B-DAT",
                "B-EVT",
                "B-FAC",
                "B-GPE",
                "B-LAN",
                "B-LAW",
                "B-LOC",
                "B-MON",
                "B-NOR",
                "B-ORD",
                "B-ORG",
                "B-PER",
                "B-PRC",
                "B-PRD",
                "B-QTY",
                "B-REG",
                "B-TIM",
                "B-WOA",
                "I-CRD",
                "I-DAT",
                "I-EVT",
                "I-FAC",
                "I-GPE",
                "I-LAN",
                "I-LAW",
                "I-LOC",
                "I-MON",
                "I-NOR",
                "I-ORD",
                "I-ORG",
                "I-PER",
                "I-PRC",
                "I-PRD",
                "I-QTY",
                "I-REG",
                "I-TIM",
                "I-WOA",
                "O"
            ],
            "names_file": null,
            "id": null,
            "_type": "ClassLabel"
        },
        "length": -1,
        "id": null,
        "_type": "Sequence"
    }
}
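
Each example stores `ner_tags` as a sequence of integer class IDs aligned with `tokens`. The sketch below shows one way to turn such a pair into entity spans. It assumes the IDs index into the `names` list above in the order shown; `decode_spans` is a hypothetical helper, and the sample sentence is made up for illustration rather than taken from the dataset.

```python
# Rebuild the 39-entry names list in the order shown in the features above.
CODES = ("CRD DAT EVT FAC GPE LAN LAW LOC MON NOR "
         "ORD ORG PER PRC PRD QTY REG TIM WOA").split()
NER_NAMES = [f"B-{c}" for c in CODES] + [f"I-{c}" for c in CODES] + ["O"]


def decode_spans(tokens, tag_ids, names=NER_NAMES):
    """Group BIO-tagged tokens into (entity_type, text) spans."""
    spans = []
    current_type, current_tokens = None, []
    for token, tag_id in zip(tokens, tag_ids):
        prefix, _, entity = names[tag_id].partition("-")
        if prefix == "B" or (prefix == "I" and entity != current_type):
            # Start a new span; a stray I- tag is treated as a span start.
            if current_tokens:
                spans.append((current_type, " ".join(current_tokens)))
            current_type, current_tokens = entity, [token]
        elif prefix == "I":
            # Continue the current span of the same type.
            current_tokens.append(token)
        else:
            # "O": close any open span.
            if current_tokens:
                spans.append((current_type, " ".join(current_tokens)))
            current_type, current_tokens = None, []
    if current_tokens:
        spans.append((current_type, " ".join(current_tokens)))
    return spans


# Illustrative input only: B-PER, I-PER, O, O, B-GPE.
tokens = ["Joko", "Widodo", "tiba", "di", "Jakarta"]
tag_ids = [12, 31, 38, 38, 4]
print(decode_spans(tokens, tag_ids))
```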

sentiment

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:id_nergrit_corpus/sentiment')

Description:

Nergrit Corpus is a dataset collection for Indonesian Named Entity Recognition, Statement Extraction, and Sentiment
Analysis. id_nergrit_corpus is the Named Entity Recognition part of this collection, which contains the following 19
entity types:
    'CRD': Cardinal
    'DAT': Date
    'EVT': Event
    'FAC': Facility
    'GPE': Geopolitical Entity
    'LAW': Law Entity (such as Undang-Undang)
    'LOC': Location
    'MON': Money
    'NOR': Political Organization
    'ORD': Ordinal
    'ORG': Organization
    'PER': Person
    'PRC': Percent
    'PRD': Product
    'QTY': Quantity
    'REG': Religion
    'TIM': Time
    'WOA': Work of Art
    'LAN': Language

License: No known license
Version: 1.0.0
Splits:

Split	Examples
`'test'`	2317
`'train'`	7485
`'validation'`	782

Features:

{
    "id": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "tokens": {
        "feature": {
            "dtype": "string",
            "id": null,
            "_type": "Value"
        },
        "length": -1,
        "id": null,
        "_type": "Sequence"
    },
    "ner_tags": {
        "feature": {
            "num_classes": 7,
            "names": [
                "B-NEG",
                "B-NET",
                "B-POS",
                "I-NEG",
                "I-NET",
                "I-POS",
                "O"
            ],
            "names_file": null,
            "id": null,
            "_type": "ClassLabel"
        },
        "length": -1,
        "id": null,
        "_type": "Sequence"
    }
}
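
In the sentiment configuration the `ner_tags` field carries token-level polarity tags rather than entity types. This page does not define `NEG`, `NET`, and `POS`; they presumably stand for negative, neutral, and positive. As a rough sketch under that assumption, the tag IDs of one example can be collapsed into per-polarity token counts (`polarity_counts` is a hypothetical helper, and the input IDs are illustrative):

```python
from collections import Counter

# Names copied from the features above; list position is the class ID.
SENTIMENT_NAMES = ["B-NEG", "B-NET", "B-POS", "I-NEG", "I-NET", "I-POS", "O"]


def polarity_counts(tag_ids, names=SENTIMENT_NAMES):
    """Count tagged tokens per polarity code, ignoring B-/I- prefixes and "O"."""
    counts = Counter()
    for tag_id in tag_ids:
        tag = names[tag_id]
        if tag != "O":
            counts[tag.split("-", 1)[1]] += 1
    return counts


# Illustrative IDs: O, B-POS, I-POS, O, B-NEG.
example_ids = [6, 2, 5, 6, 0]
print(polarity_counts(example_ids))
```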

statement

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:id_nergrit_corpus/statement')

Description:

Nergrit Corpus is a dataset collection for Indonesian Named Entity Recognition, Statement Extraction, and Sentiment
Analysis. id_nergrit_corpus is the Named Entity Recognition part of this collection, which contains the following 19
entity types:
    'CRD': Cardinal
    'DAT': Date
    'EVT': Event
    'FAC': Facility
    'GPE': Geopolitical Entity
    'LAW': Law Entity (such as Undang-Undang)
    'LOC': Location
    'MON': Money
    'NOR': Political Organization
    'ORD': Ordinal
    'ORG': Organization
    'PER': Person
    'PRC': Percent
    'PRD': Product
    'QTY': Quantity
    'REG': Religion
    'TIM': Time
    'WOA': Work of Art
    'LAN': Language

License: No known license
Version: 1.0.0
Splits:

Split	Examples
`'test'`	335
`'train'`	2405
`'validation'`	176

Features:

{
    "id": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "tokens": {
        "feature": {
            "dtype": "string",
            "id": null,
            "_type": "Value"
        },
        "length": -1,
        "id": null,
        "_type": "Sequence"
    },
    "ner_tags": {
        "feature": {
            "num_classes": 9,
            "names": [
                "B-BREL",
                "B-FREL",
                "B-STAT",
                "B-WHO",
                "I-BREL",
                "I-FREL",
                "I-STAT",
                "I-WHO",
                "O"
            ],
            "names_file": null,
            "id": null,
            "_type": "ClassLabel"
        },
        "length": -1,
        "id": null,
        "_type": "Sequence"
    }
}
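
Because the statement tags also follow the `B-`/`I-`/`O` pattern, a quick sanity check before training is to look for `I-` tags that do not continue a span of the same type. The sketch below is one possible check, not something TFDS provides; `find_bio_violations` is a hypothetical helper and the input IDs are illustrative.

```python
# Names copied from the features above; list position is the class ID.
STATEMENT_NAMES = [
    "B-BREL", "B-FREL", "B-STAT", "B-WHO",
    "I-BREL", "I-FREL", "I-STAT", "I-WHO",
    "O",
]


def find_bio_violations(tag_ids, names=STATEMENT_NAMES):
    """Return positions where an I- tag does not continue a same-type span."""
    violations = []
    previous_type = None  # span type of the previous token, or None for "O"
    for position, tag_id in enumerate(tag_ids):
        prefix, _, span_type = names[tag_id].partition("-")
        if prefix == "I" and span_type != previous_type:
            violations.append(position)
        previous_type = span_type if prefix in ("B", "I") else None
    return violations


# B-WHO, I-WHO, O, B-STAT, I-STAT: well formed.
print(find_bio_violations([3, 7, 8, 2, 6]))
# O, I-STAT, B-WHO, I-BREL: positions 1 and 3 break the scheme.
print(find_bio_violations([8, 6, 3, 4]))
```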