References:
Coarse_Grained
Use the following command to load this dataset in TFDS:
import tensorflow_datasets as tfds
ds = tfds.load('huggingface:roman_urdu_hate_speech/Coarse_Grained')
- Description:
The Roman Urdu Hate-Speech and Offensive Language Detection (RUHSOLD) dataset is a Roman Urdu dataset of tweets annotated by experts in the relevant language. The authors develop gold standards for two sub-tasks. The first sub-task is based on binary labels of Hate-Offensive content and Normal content (i.e., inoffensive language). These labels are self-explanatory. The authors refer to this sub-task as coarse-grained classification. The second sub-task defines Hate-Offensive content with four labels at a granular level. These labels are the most relevant for the demographic of users who converse in Roman Urdu (RU) and are defined in related literature. The authors refer to this sub-task as fine-grained classification. The objective behind creating two gold standards is to enable researchers to evaluate hate speech detection approaches on both an easier (coarse-grained) and a more challenging (fine-grained) scenario.
- License: MIT License
- Version: 1.1.0
- Splits:
Split | Examples |
---|---|
'test' | 2545 |
'train' | 7208 |
'validation' | 800 |
- Features:
{
"tweet": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"label": {
"num_classes": 2,
"names": [
"Abusive/Offensive",
"Normal"
],
"id": null,
"_type": "ClassLabel"
}
}
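The `label` feature above is a `ClassLabel`, so examples carry integer class ids rather than strings. A minimal sketch of turning those ids back into names, assuming the ids follow the order of the `names` list (0 is "Abusive/Offensive", 1 is "Normal"). The helper `decode_coarse_label` is hypothetical and not part of TFDS or the dataset:

```python
# Label names copied from the Coarse_Grained feature spec above.
# Assumption: integer label ids index into this list in order.
COARSE_LABEL_NAMES = ["Abusive/Offensive", "Normal"]


def decode_coarse_label(label_id: int) -> str:
    """Return the class name for an integer label id (hypothetical helper)."""
    if not 0 <= label_id < len(COARSE_LABEL_NAMES):
        raise ValueError(f"label id {label_id} is outside 0..{len(COARSE_LABEL_NAMES) - 1}")
    return COARSE_LABEL_NAMES[label_id]


if __name__ == "__main__":
    for label_id in range(len(COARSE_LABEL_NAMES)):
        print(label_id, decode_coarse_label(label_id))
```

The explicit range check is there because a negative id would otherwise silently index from the end of the list.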
Fine_Grained
Use the following command to load this dataset in TFDS:
import tensorflow_datasets as tfds
ds = tfds.load('huggingface:roman_urdu_hate_speech/Fine_Grained')
- Description:
The Roman Urdu Hate-Speech and Offensive Language Detection (RUHSOLD) dataset is a Roman Urdu dataset of tweets annotated by experts in the relevant language. The authors develop gold standards for two sub-tasks. The first sub-task is based on binary labels of Hate-Offensive content and Normal content (i.e., inoffensive language). These labels are self-explanatory. The authors refer to this sub-task as coarse-grained classification. The second sub-task defines Hate-Offensive content with four labels at a granular level. These labels are the most relevant for the demographic of users who converse in Roman Urdu (RU) and are defined in related literature. The authors refer to this sub-task as fine-grained classification. The objective behind creating two gold standards is to enable researchers to evaluate hate speech detection approaches on both an easier (coarse-grained) and a more challenging (fine-grained) scenario.
- License: MIT License
- Version: 1.1.0
- Splits:
Split | Examples |
---|---|
'test' | 2545 |
'train' | 7208 |
'validation' | 800 |
- Features:
{
"tweet": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"label": {
"num_classes": 5,
"names": [
"Abusive/Offensive",
"Normal",
"Religious Hate",
"Sexism",
"Profane/Untargeted"
],
"id": null,
"_type": "ClassLabel"
}
}
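Since the description frames the fine-grained labels as a refinement of the binary Hate-Offensive versus Normal distinction, fine-grained ids can be collapsed into the coarse-grained label space. The sketch below assumes that every fine-grained class other than "Normal" maps to the coarse class "Abusive/Offensive"; whether the published coarse labels were derived exactly this way is not stated in the source. The helper `fine_to_coarse` is hypothetical:

```python
# Label names copied from the Fine_Grained and Coarse_Grained feature specs.
# Assumption: integer label ids index into these lists in order.
FINE_LABEL_NAMES = [
    "Abusive/Offensive",
    "Normal",
    "Religious Hate",
    "Sexism",
    "Profane/Untargeted",
]
COARSE_LABEL_NAMES = ["Abusive/Offensive", "Normal"]


def fine_to_coarse(fine_id: int) -> int:
    """Map a fine-grained label id to a coarse-grained id (hypothetical helper).

    Assumption: only "Normal" stays "Normal"; every other fine-grained
    class is treated as hate-offensive content.
    """
    if not 0 <= fine_id < len(FINE_LABEL_NAMES):
        raise ValueError(f"label id {fine_id} is outside 0..{len(FINE_LABEL_NAMES) - 1}")
    fine_name = FINE_LABEL_NAMES[fine_id]
    coarse_name = "Normal" if fine_name == "Normal" else "Abusive/Offensive"
    return COARSE_LABEL_NAMES.index(coarse_name)


if __name__ == "__main__":
    for fine_id, fine_name in enumerate(FINE_LABEL_NAMES):
        coarse_id = fine_to_coarse(fine_id)
        print(f"{fine_name} -> {COARSE_LABEL_NAMES[coarse_id]}")
```

Mapping by name rather than by hard-coded id keeps the sketch correct even if the order of either `names` list changes.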