multi_nli

Description:

The Multi-Genre Natural Language Inference (MultiNLI) corpus is a crowd-sourced collection of 433,000 sentence pairs annotated with textual entailment information. The corpus is modeled on the SNLI corpus, but differs in that it covers a range of genres of spoken and written text and supports a distinctive cross-genre generalization evaluation. The corpus served as the basis for the shared task of the RepEval 2017 Workshop at EMNLP in Copenhagen.

Additional documentation: Explore on Papers With Code
Homepage: https://www.nyu.edu/projects/bowman/multinli/
Source code: tfds.text.MultiNLI
Versions:
- 1.1.0 (default): No release notes.
Download size: 216.34 MiB
Dataset size: 89.50 MiB
Auto-cached (documentation): Yes
Splits:

Split	Examples
`'train'`	392,702
`'validation_matched'`	9,815
`'validation_mismatched'`	9,832
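
As a rough sketch of how the split table can be used in practice, the snippet below records the split sizes listed above and wraps `tfds.load` in a small helper. The names `SPLIT_SIZES`, `split_fractions`, and `load_split` are hypothetical helpers introduced for illustration, and the loader assumes `tensorflow_datasets` is installed and that the data can be downloaded. The three splits sum to 412,349 examples, fewer than the roughly 433,000 pairs quoted in the description, so this entry evidently does not expose every pair in the corpus.

```python
# Split sizes copied from the table above.
SPLIT_SIZES = {
    "train": 392_702,
    "validation_matched": 9_815,
    "validation_mismatched": 9_832,
}


def split_fractions(sizes):
    """Return each split's share of the total number of examples."""
    total = sum(sizes.values())
    return {name: count / total for name, count in sizes.items()}


def load_split(name):
    """Load one split by name (hypothetical helper around tfds.load)."""
    if name not in SPLIT_SIZES:
        raise ValueError(f"unknown split: {name!r}")
    # Imported lazily so the helpers above work without TFDS installed.
    import tensorflow_datasets as tfds

    return tfds.load("multi_nli", split=name)


total_examples = sum(SPLIT_SIZES.values())
fractions = split_fractions(SPLIT_SIZES)
```

Calling `load_split("validation_mismatched")` would then return the split intended for cross-genre evaluation, while an unknown name fails early with a `ValueError` instead of triggering a download.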

Feature structure:

FeaturesDict({
    'hypothesis': Text(shape=(), dtype=string),
    'label': ClassLabel(shape=(), dtype=int64, num_classes=3),
    'premise': Text(shape=(), dtype=string),
})
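
The feature structure above can be read as a plain dictionary with two text fields and an integer label. The following minimal sketch decodes one such dictionary into readable strings. The helper `decode_example` is hypothetical, the sample sentence pair is hand-written for illustration and not taken from the corpus, and the label order in `ASSUMED_LABEL_NAMES` is an assumption that this page does not state; it should be checked against `builder.info.features["label"].names` before use.

```python
def decode_example(example, label_names):
    """Turn one raw example into readable Python strings.

    `example` is a dict with the three keys from the feature structure.
    Text fields may arrive as UTF-8 bytes, as NumPy iteration yields them.
    `label_names` maps the integer label to a class name.
    """

    def as_text(value):
        return value.decode("utf-8") if isinstance(value, bytes) else str(value)

    return {
        "premise": as_text(example["premise"]),
        "hypothesis": as_text(example["hypothesis"]),
        "label": label_names[int(example["label"])],
    }


# ASSUMPTION: this label order is not stated on this page; confirm it with
# builder.info.features["label"].names before relying on it.
ASSUMED_LABEL_NAMES = ["entailment", "neutral", "contradiction"]

# Hand-written illustrative example, not taken from the corpus.
sample = {
    "premise": b"A man is playing a guitar on stage.",
    "hypothesis": b"A person is making music.",
    "label": 0,
}
decoded = decode_example(sample, ASSUMED_LABEL_NAMES)
```

Passing the label names in as a parameter keeps the helper independent of the assumed order, so the list can be swapped for the names reported by the dataset itself.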

Feature documentation:

Feature	Class	Dtype
	FeaturesDict
hypothesis	Text	string
label	ClassLabel	int64
premise	Text	string

Supervised keys (see as_supervised doc): None
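
Because no supervised keys are defined, examples arrive as feature dictionaries and not as `(input, label)` tuples. One possible workaround, sketched here under the assumption that the dataset is consumed as a `tf.data.Dataset` of dictionaries, is a small mapping function. The name `to_supervised` is hypothetical, and treating the premise and hypothesis together as the input is a design choice for illustration, not something the dataset prescribes.

```python
def to_supervised(example):
    """Map a feature dict to an ((premise, hypothesis), label) pair."""
    inputs = (example["premise"], example["hypothesis"])
    return inputs, example["label"]


# Works on a plain dict; the values below are hand-written placeholders.
inputs, label = to_supervised(
    {"premise": "p", "hypothesis": "h", "label": 1}
)
```

Since the function only indexes into the dictionary, it could also be passed to `Dataset.map` on a dataset returned by `tfds.load("multi_nli", split="train")`.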
Figure (tfds.show_examples): Not supported.
Examples (tfds.as_dataframe):

Citation:

@InProceedings{N18-1101,
  author = "Williams, Adina
            and Nangia, Nikita
            and Bowman, Samuel",
  title = "A Broad-Coverage Challenge Corpus for
           Sentence Understanding through Inference",
  booktitle = "Proceedings of the 2018 Conference of
               the North American Chapter of the
               Association for Computational Linguistics:
               Human Language Technologies, Volume 1 (Long
               Papers)",
  year = "2018",
  publisher = "Association for Computational Linguistics",
  pages = "1112--1122",
  location = "New Orleans, Louisiana",
  url = "http://aclweb.org/anthology/N18-1101"
}