ner
To load this dataset in TFDS, use the following command:
ds = tfds.load('huggingface:id_nergrit_corpus/ner')
- Description:
Nergrit Corpus is a dataset collection for Indonesian Named Entity Recognition, Statement Extraction, and Sentiment
Analysis. id_nergrit_corpus is the Named Entity Recognition part of this collection, which contains 19 entity types as
follows:
'CRD': Cardinal
'DAT': Date
'EVT': Event
'FAC': Facility
'GPE': Geopolitical Entity
'LAW': Law Entity (such as Undang-Undang)
'LOC': Location
'MON': Money
'NOR': Political Organization
'ORD': Ordinal
'ORG': Organization
'PER': Person
'PRC': Percent
'PRD': Product
'QTY': Quantity
'REG': Religion
'TIM': Time
'WOA': Work of Art
'LAN': Language
- License: No known license
- Version: 1.1.0
- Splits:
Split | Examples |
---|---|
'test' | 2399 |
'train' | 12532 |
'validation' | 2521 |
- Features:
{
"id": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"tokens": {
"feature": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"length": -1,
"id": null,
"_type": "Sequence"
},
"ner_tags": {
"feature": {
"num_classes": 39,
"names": [
"B-CRD",
"B-DAT",
"B-EVT",
"B-FAC",
"B-GPE",
"B-LAN",
"B-LAW",
"B-LOC",
"B-MON",
"B-NOR",
"B-ORD",
"B-ORG",
"B-PER",
"B-PRC",
"B-PRD",
"B-QTY",
"B-REG",
"B-TIM",
"B-WOA",
"I-CRD",
"I-DAT",
"I-EVT",
"I-FAC",
"I-GPE",
"I-LAN",
"I-LAW",
"I-LOC",
"I-MON",
"I-NOR",
"I-ORD",
"I-ORG",
"I-PER",
"I-PRC",
"I-PRD",
"I-QTY",
"I-REG",
"I-TIM",
"I-WOA",
"O"
],
"names_file": null,
"id": null,
"_type": "ClassLabel"
},
"length": -1,
"id": null,
"_type": "Sequence"
}
}
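The `ner_tags` feature stores integer class ids rather than label strings, so a common first step is to map the ids back to the BIO labels listed above and group them into entity spans. The following is a minimal sketch, not part of the dataset's tooling: the helper names `decode_tags` and `bio_spans` are hypothetical, the example sentence is invented for illustration, and the label list is rebuilt by hand from the feature spec rather than read from the loaded dataset.

```python
# Entity types in the order used by the feature spec (alphabetical).
ENTITY_TYPES = [
    "CRD", "DAT", "EVT", "FAC", "GPE", "LAN", "LAW", "LOC", "MON", "NOR",
    "ORD", "ORG", "PER", "PRC", "PRD", "QTY", "REG", "TIM", "WOA",
]

# The spec lists all B- tags, then all I- tags, then "O" (39 classes).
NER_NAMES = (
    [f"B-{t}" for t in ENTITY_TYPES]
    + [f"I-{t}" for t in ENTITY_TYPES]
    + ["O"]
)


def decode_tags(tag_ids):
    """Map integer class ids to their BIO label strings."""
    return [NER_NAMES[i] for i in tag_ids]


def bio_spans(tokens, labels):
    """Group BIO-labelled tokens into (entity_type, text) spans.

    An I- tag whose type differs from the open span starts a new span,
    which is a lenient choice for malformed sequences.
    """
    spans = []
    current_type, current_tokens = None, []
    for token, label in zip(tokens, labels):
        prefix, _, entity_type = label.partition("-")
        starts_new = prefix == "B" or (prefix == "I" and entity_type != current_type)
        if starts_new or prefix == "O":
            # Close the span that was open, if any.
            if current_tokens:
                spans.append((current_type, " ".join(current_tokens)))
            current_type, current_tokens = None, []
        if starts_new:
            current_type, current_tokens = entity_type, [token]
        elif prefix == "I":
            current_tokens.append(token)
    if current_tokens:
        spans.append((current_type, " ".join(current_tokens)))
    return spans


# Invented example, not taken from the corpus.
tokens = ["Joko", "Widodo", "tiba", "di", "Jakarta"]
tag_ids = [12, 31, 38, 38, 4]
labels = decode_tags(tag_ids)
print(labels)                     # ['B-PER', 'I-PER', 'O', 'O', 'B-GPE']
print(bio_spans(tokens, labels))  # [('PER', 'Joko Widodo'), ('GPE', 'Jakarta')]
```

In practice the label list would more likely be read from the loaded dataset's feature metadata than hard-coded; it is spelled out here only to keep the sketch self-contained.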
sentiment
To load this dataset in TFDS, use the following command:
ds = tfds.load('huggingface:id_nergrit_corpus/sentiment')
- Description:
Nergrit Corpus is a dataset collection for Indonesian Named Entity Recognition, Statement Extraction, and Sentiment
Analysis. id_nergrit_corpus is the Named Entity Recognition part of this collection, which contains 19 entity types as
follows:
'CRD': Cardinal
'DAT': Date
'EVT': Event
'FAC': Facility
'GPE': Geopolitical Entity
'LAW': Law Entity (such as Undang-Undang)
'LOC': Location
'MON': Money
'NOR': Political Organization
'ORD': Ordinal
'ORG': Organization
'PER': Person
'PRC': Percent
'PRD': Product
'QTY': Quantity
'REG': Religion
'TIM': Time
'WOA': Work of Art
'LAN': Language
- License: No known license
- Version: 1.1.0
- Splits:
Split | Examples |
---|---|
'test' | 2317 |
'train' | 7485 |
'validation' | 782 |
- Features:
{
"id": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"tokens": {
"feature": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"length": -1,
"id": null,
"_type": "Sequence"
},
"ner_tags": {
"feature": {
"num_classes": 7,
"names": [
"B-NEG",
"B-NET",
"B-POS",
"I-NEG",
"I-NET",
"I-POS",
"O"
],
"names_file": null,
"id": null,
"_type": "ClassLabel"
},
"length": -1,
"id": null,
"_type": "Sequence"
}
}
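Note that this config keeps the feature name `ner_tags` even though its seven classes are sentiment labels. When preparing the data for a token classifier, the class names are usually turned into lookup tables in both directions. A minimal sketch, assuming each name's position in the `names` list is its class id (which is how a `ClassLabel` feature numbers its classes); `SENTIMENT_NAMES`, `label2id`, `id2label`, `encode`, `decode`, and the sample tag sequence are illustrative and not taken from the dataset.

```python
# Class names copied from the feature spec; list index == class id.
SENTIMENT_NAMES = ["B-NEG", "B-NET", "B-POS", "I-NEG", "I-NET", "I-POS", "O"]

label2id = {name: i for i, name in enumerate(SENTIMENT_NAMES)}
id2label = {i: name for name, i in label2id.items()}


def encode(labels):
    """Convert label strings to integer class ids."""
    return [label2id[label] for label in labels]


def decode(ids):
    """Convert integer class ids back to label strings."""
    return [id2label[i] for i in ids]


# Invented tag sequence for illustration.
sample = ["O", "B-POS", "I-POS", "O"]
ids = encode(sample)
print(ids)          # [6, 2, 5, 6]
print(decode(ids))  # ['O', 'B-POS', 'I-POS', 'O']
```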
statement
To load this dataset in TFDS, use the following command:
ds = tfds.load('huggingface:id_nergrit_corpus/statement')
- Description:
Nergrit Corpus is a dataset collection for Indonesian Named Entity Recognition, Statement Extraction, and Sentiment
Analysis. id_nergrit_corpus is the Named Entity Recognition part of this collection, which contains 19 entity types as
follows:
'CRD': Cardinal
'DAT': Date
'EVT': Event
'FAC': Facility
'GPE': Geopolitical Entity
'LAW': Law Entity (such as Undang-Undang)
'LOC': Location
'MON': Money
'NOR': Political Organization
'ORD': Ordinal
'ORG': Organization
'PER': Person
'PRC': Percent
'PRD': Product
'QTY': Quantity
'REG': Religion
'TIM': Time
'WOA': Work of Art
'LAN': Language
- License: No known license
- Version: 1.1.0
- Splits:
Split | Examples |
---|---|
'test' | 335 |
'train' | 2405 |
'validation' | 176 |
- Features:
{
"id": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"tokens": {
"feature": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"length": -1,
"id": null,
"_type": "Sequence"
},
"ner_tags": {
"feature": {
"num_classes": 9,
"names": [
"B-BREL",
"B-FREL",
"B-STAT",
"B-WHO",
"I-BREL",
"I-FREL",
"I-STAT",
"I-WHO",
"O"
],
"names_file": null,
"id": null,
"_type": "ClassLabel"
},
"length": -1,
"id": null,
"_type": "Sequence"
}
}
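The three configs differ considerably in size and in how their examples are divided across splits. As a quick sanity check, the sketch below recomputes the totals and per-split shares from the split tables on this page; the counts are copied by hand, and `SPLITS` and `split_summary` are illustrative names rather than anything provided by TFDS.

```python
# Example counts copied from the split tables of each config.
SPLITS = {
    "ner": {"train": 12532, "validation": 2521, "test": 2399},
    "sentiment": {"train": 7485, "validation": 782, "test": 2317},
    "statement": {"train": 2405, "validation": 176, "test": 335},
}


def split_summary(counts):
    """Return the total example count and each split's share of it."""
    total = sum(counts.values())
    fractions = {name: n / total for name, n in counts.items()}
    return total, fractions


for config, counts in SPLITS.items():
    total, fractions = split_summary(counts)
    shares = ", ".join(f"{name} {frac:.1%}" for name, frac in fractions.items())
    print(f"{config}: {total} examples ({shares})")
```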