ttc4900

References:

ttc4900

Use the following command to load this dataset in TFDS:

import tensorflow_datasets as tfds
ds = tfds.load('huggingface:ttc4900/ttc4900')
  • Description
The dataset is taken from the Kemik group:
http://www.kemik.yildiz.edu.tr/
The data were pre-processed for text categorization: collocations were identified, the character set was corrected, and so forth.
We named the dataset TTC4900, following the naming convention of the TTC 3600 dataset shared by the study http://journals.sagepub.com/doi/abs/10.1177/0165551515620551

If you use the dataset in a paper, please reference https://www.kaggle.com/savasy/ttc4900 in a footnote and cite one of the following papers:

- A Comparison of Different Approaches to Document Representation in Turkish Language, SDU Journal of Natural and Applied Science, Vol 22, Issue 2, 2018
- A comparative analysis of text classification for Turkish language, Pamukkale University Journal of Engineering Science Volume 25 Issue 5, 2018
- A Knowledge-poor Approach to Turkish Text Categorization with a Comparative Analysis, Proceedings of CICLING 2014, Springer LNCS, Nepal, 2014.
  • License: CC0: Public Domain
  • Version: 1.0.0
  • Splits:
Split    Examples
'train'  4900
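
Only a 'train' split of 4900 examples is listed, so any held-out evaluation set has to be carved out by the user. The following is a minimal sketch of one way to do that with plain index shuffling. The 80/20 ratio, the fixed seed, and the helper name `holdout_indices` are illustrative assumptions, not something specified by the dataset authors.

```python
import random

NUM_EXAMPLES = 4900  # size of the 'train' split listed above


def holdout_indices(num_examples, test_fraction=0.2, seed=0):
    """Return (train_idx, test_idx): two disjoint, sorted lists of example indices.

    Hypothetical helper; the ratio and seed are arbitrary illustrative choices.
    """
    indices = list(range(num_examples))
    # A dedicated Random instance keeps the shuffle reproducible
    # without touching the global random state.
    random.Random(seed).shuffle(indices)
    num_test = int(round(num_examples * test_fraction))
    test_idx = sorted(indices[:num_test])
    train_idx = sorted(indices[num_test:])
    return train_idx, test_idx


train_idx, test_idx = holdout_indices(NUM_EXAMPLES)
print(len(train_idx), len(test_idx))
```

The resulting index lists can be used to select examples after the dataset has been loaded. This sketch does not stratify by category; if class balance matters, a stratified split would be a better fit.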
  • Features
{
    "category": {
        "num_classes": 7,
        "names": [
            "siyaset",
            "dunya",
            "ekonomi",
            "kultur",
            "saglik",
            "spor",
            "teknoloji"
        ],
        "names_file": null,
        "id": null,
        "_type": "ClassLabel"
    },
    "text": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}
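
The "category" feature is stored as an integer class label, so it usually needs to be mapped back to a readable name. The sketch below assumes that label ids follow the order of the "names" list above (0 for "siyaset" through 6 for "teknoloji"), which is the usual ClassLabel convention. `CATEGORY_NAMES`, `label_to_name`, and `name_to_label` are hypothetical names introduced here for illustration and are not part of any library.

```python
# Category names copied from the "names" list in the feature spec above.
# Assumption: integer label ids follow this list order.
CATEGORY_NAMES = [
    "siyaset",
    "dunya",
    "ekonomi",
    "kultur",
    "saglik",
    "spor",
    "teknoloji",
]


def label_to_name(label_id):
    """Map an integer class id to its category name."""
    if not 0 <= label_id < len(CATEGORY_NAMES):
        raise ValueError(f"label id out of range: {label_id}")
    return CATEGORY_NAMES[label_id]


def name_to_label(name):
    """Map a category name to its integer class id."""
    try:
        return CATEGORY_NAMES.index(name)
    except ValueError:
        raise ValueError(f"unknown category name: {name!r}") from None


print(label_to_name(5))
print(name_to_label("ekonomi"))
```

Keeping the mapping in one list makes it easy to check that the number of names matches "num_classes" (7).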