Coarse_Grained
Use the following command to load this dataset in TFDS:
import tensorflow_datasets as tfds
ds = tfds.load('huggingface:roman_urdu_hate_speech/Coarse_Grained')
- Description:
The Roman Urdu Hate-Speech and Offensive Language Detection (RUHSOLD) dataset is a Roman Urdu dataset of tweets annotated by experts in the language. The authors develop a gold standard for two sub-tasks. The first sub-task uses binary labels: Hate-Offensive content and Normal content (i.e., inoffensive language). These labels are self-explanatory. The authors refer to this sub-task as coarse-grained classification. The second sub-task divides Hate-Offensive content into four finer-grained labels. These labels are the most relevant for the demographic of users who converse in Roman Urdu (RU) and are defined in related literature. The authors refer to this sub-task as fine-grained classification. The goal of creating two gold standards is to let researchers evaluate hate speech detection approaches in both an easier (coarse-grained) and a more challenging (fine-grained) scenario.
- License: MIT License
- Version: 1.1.0
- Splits:
Split | Examples |
---|---|
'test' | 2002 |
'train' | 7208 |
'validation' | 800 |
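The split sizes can be turned into proportions with a few lines of plain Python. This is a minimal sketch: the counts are copied from the Coarse_Grained table, and the helper name `split_fractions` is hypothetical.

```python
# Split sizes copied from the Coarse_Grained splits table.
SPLITS = {"train": 7208, "validation": 800, "test": 2002}

def split_fractions(splits: dict[str, int]) -> dict[str, float]:
    """Return each split's share of the total number of examples."""
    total = sum(splits.values())
    return {name: count / total for name, count in splits.items()}

fractions = split_fractions(SPLITS)
for name, frac in fractions.items():
    # Format each share as a percentage with one decimal place.
    print(f"{name}: {frac:.1%}")
```

With 10010 examples in total, this prints train: 72.0%, validation: 8.0%, test: 20.0%.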
- Features:
{
"tweet": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"label": {
"num_classes": 2,
"names": [
"Abusive/Offensive",
"Normal"
],
"id": null,
"_type": "ClassLabel"
}
}
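A `ClassLabel` stores each label as an integer index into `names`, so for this config `0` is `Abusive/Offensive` and `1` is `Normal`. The following is a minimal sketch of decoding labels and counting the class balance. It assumes records shaped like the feature spec (a `tweet` string, which arrives as bytes when read through `tfds.as_numpy`, and an integer `label`). The helper names and sample records are hypothetical, not real dataset rows.

```python
from collections import Counter

# Label names in ClassLabel order: index i corresponds to COARSE_NAMES[i].
COARSE_NAMES = ["Abusive/Offensive", "Normal"]

def decode_label(label_id: int, names: list[str] = COARSE_NAMES) -> str:
    """Map an integer ClassLabel id back to its human-readable name."""
    return names[label_id]

def class_distribution(records, names: list[str] = COARSE_NAMES) -> Counter:
    """Count examples per label name in an iterable of feature dicts."""
    return Counter(decode_label(int(r["label"]), names) for r in records)

# Hypothetical records shaped like the feature spec (not real dataset rows).
sample = [
    {"tweet": b"example tweet one", "label": 1},
    {"tweet": b"example tweet two", "label": 0},
    {"tweet": b"example tweet three", "label": 1},
]
print(class_distribution(sample))
```

Passing a different `names` list lets the same helpers work with the Fine_Grained labels.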
Fine_Grained
Use the following command to load this dataset in TFDS:
import tensorflow_datasets as tfds
ds = tfds.load('huggingface:roman_urdu_hate_speech/Fine_Grained')
- Description:
The Roman Urdu Hate-Speech and Offensive Language Detection (RUHSOLD) dataset is a Roman Urdu dataset of tweets annotated by experts in the language. The authors develop a gold standard for two sub-tasks. The first sub-task uses binary labels: Hate-Offensive content and Normal content (i.e., inoffensive language). These labels are self-explanatory. The authors refer to this sub-task as coarse-grained classification. The second sub-task divides Hate-Offensive content into four finer-grained labels. These labels are the most relevant for the demographic of users who converse in Roman Urdu (RU) and are defined in related literature. The authors refer to this sub-task as fine-grained classification. The goal of creating two gold standards is to let researchers evaluate hate speech detection approaches in both an easier (coarse-grained) and a more challenging (fine-grained) scenario.
- License: MIT License
- Version: 1.1.0
- Splits:
Split | Examples |
---|---|
'test' | 2002 |
'train' | 7208 |
'validation' | 7208 |
- Features:
{
"tweet": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"label": {
"num_classes": 5,
"names": [
"Abusive/Offensive",
"Normal",
"Religious Hate",
"Sexism",
"Profane/Untargeted"
],
"id": null,
"_type": "ClassLabel"
}
}
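According to the description, the fine-grained labels subdivide Hate-Offensive content while keeping Normal as its own class. This suggests a direct mapping back to the binary Coarse_Grained labels. The sketch below assumes that every fine-grained label other than `Normal` corresponds to coarse `Abusive/Offensive`. That mapping is inferred from the description, not taken from the dataset's loading code, and `to_coarse` is a hypothetical helper.

```python
# Label names in ClassLabel order for each config (copied from the feature specs).
FINE_NAMES = ["Abusive/Offensive", "Normal", "Religious Hate", "Sexism", "Profane/Untargeted"]
COARSE_NAMES = ["Abusive/Offensive", "Normal"]

# Assumption: every fine-grained label except "Normal" is a kind of
# Hate-Offensive content, i.e. coarse "Abusive/Offensive".
FINE_TO_COARSE = {
    fine_id: COARSE_NAMES.index("Normal" if name == "Normal" else "Abusive/Offensive")
    for fine_id, name in enumerate(FINE_NAMES)
}

def to_coarse(fine_label_id: int) -> int:
    """Convert a Fine_Grained label id into a Coarse_Grained label id."""
    return FINE_TO_COARSE[fine_label_id]

print([to_coarse(i) for i in range(len(FINE_NAMES))])  # [0, 1, 0, 0, 0]
```

Building the table from the name lists, rather than hard-coding integers, keeps the mapping correct if either `names` order is checked against the feature spec.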