تي تي سي 4900

مراجع:

ttc4900

استخدم الأمر التالي لتحميل مجموعة البيانات هذه في TFDS:

ds = tfds.load('huggingface:ttc4900/ttc4900')

وصف :

The data set is taken from kemik group
http://www.kemik.yildiz.edu.tr/
The data are pre-processed for the text categorization, collocations are found, character set is corrected, and so forth.
We named TTC4900 by mimicking the name convention of TTC 3600 dataset shared by the study http://journals.sagepub.com/doi/abs/10.1177/0165551515620551

If you use the dataset in a paper, please refer https://www.kaggle.com/savasy/ttc4900 as footnote and cite one of the papers as follows:

- A Comparison of Different Approaches to Document Representation in Turkish Language, SDU Journal of Natural and Applied Science, Vol 22, Issue 2, 2018
- A comparative analysis of text classification for Turkish language, Pamukkale University Journal of Engineering Science Volume 25 Issue 5, 2018
- A Knowledge-poor Approach to Turkish Text Categorization with a Comparative Analysis, Proceedings of CICLING 2014, Springer LNCS, Nepal, 2014.

الترخيص : CC0: المجال العام
الإصدار : 1.0.0
الإنشقاقات :

ينقسم	أمثلة
`'train'`	4900

سمات :

{
    "category": {
        "num_classes": 7,
        "names": [
            "siyaset",
            "dunya",
            "ekonomi",
            "kultur",
            "saglik",
            "spor",
            "teknoloji"
        ],
        "names_file": null,
        "id": null,
        "_type": "ClassLabel"
    },
    "text": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}