unified_qa

Description:

The UnifiedQA benchmark consists of 20 main question answering (QA) datasets (each may have multiple versions) that target different formats as well as various complex linguistic phenomena. These datasets are grouped into several formats/categories, including extractive QA, abstractive QA, multiple-choice QA, and yes/no QA. Additionally, contrast sets are used for several datasets (denoted with "contrast_sets_"). These evaluation sets are expert-generated perturbations that deviate from the patterns common in the original dataset. For several datasets that do not come with evidence paragraphs, two variants are included: one where the datasets are used as-is, and another that uses paragraphs fetched via an information retrieval system as additional evidence, indicated with "_ir" tags.

More information can be found at: https://github.com/allenai/unifiedqa
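
The naming conventions described above can be turned into a small helper. The sketch below is an illustration only: it assumes that the `contrast_sets_` prefix and the `_with_ir` marker seen in the config names on this page are sufficient signals, and `ConfigTraits` and `describe_config` are hypothetical names that are not part of TFDS.

```python
from dataclasses import dataclass

# Assumed markers, taken from the config names listed on this page.
CONTRAST_PREFIX = "contrast_sets_"
IR_MARKER = "_with_ir"


@dataclass(frozen=True)
class ConfigTraits:
    """Hypothetical summary of what a unified_qa config name implies."""
    name: str
    uses_contrast_sets: bool
    uses_retrieved_paragraphs: bool


def describe_config(name: str) -> ConfigTraits:
    """Infer variant traits from a config name such as 'arc_easy_with_ir'."""
    return ConfigTraits(
        name=name,
        uses_contrast_sets=name.startswith(CONTRAST_PREFIX),
        uses_retrieved_paragraphs=IR_MARKER in name,
    )
```

For example, `describe_config("arc_hard_with_ir_dev")` flags retrieved paragraphs, while `describe_config("contrast_sets_drop")` flags contrast sets. Other suffixes such as `_dev` or `_test` are deliberately left uninterpreted, since this page does not define them.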

Homepage: https://github.com/allenai/unifiedqa
Source code: tfds.text.unifiedqa.UnifiedQA
Versions:
- 1.0.0 (default): Initial release.
Feature structure:

FeaturesDict({
    'input': string,
    'output': string,
})

Feature documentation:

Feature	Class	Dtype
	FeaturesDict
input	Tensor	string
output	Tensor	string

Supervised keys (See as_supervised doc): None
Figure (tfds.show_examples): Not supported.
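
Because no supervised keys are defined, (input, output) pairs have to be assembled from the two string features by hand. The following is a minimal sketch under stated assumptions: `to_text`, `to_pairs`, and `load_examples` are hypothetical helper names, and the sketch assumes that string features arrive as `bytes` when iterating through `tfds.as_numpy`. Calling `load_examples` requires `tensorflow_datasets` to be installed and triggers a download.

```python
from typing import Dict, Iterable, Iterator, Tuple, Union

Text = Union[str, bytes]


def to_text(value: Text) -> str:
    """Decode bytes as UTF-8; pass str values through unchanged."""
    return value.decode("utf-8") if isinstance(value, bytes) else value


def to_pairs(examples: Iterable[Dict[str, Text]]) -> Iterator[Tuple[str, str]]:
    """Yield (input, output) string pairs from feature dicts."""
    for example in examples:
        yield to_text(example["input"]), to_text(example["output"])


def load_examples(config: str = "ai2_science_elementary", split: str = "train"):
    """Load one unified_qa config as an iterable of NumPy-style dicts."""
    # Imported lazily so the helpers above work without TFDS installed.
    import tensorflow_datasets as tfds

    dataset = tfds.load(f"unified_qa/{config}", split=split)
    return tfds.as_numpy(dataset)
```

A typical use would be `pairs = list(to_pairs(load_examples("boolq", "validation")))`, which gives plain Python strings that can be passed to a tokenizer.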

unified_qa/ai2_science_elementary (default config)

Config description: The AI2 Science Questions dataset consists of questions used in student assessments in the United States across elementary and middle school grade levels. Each question is in 4-way multiple-choice format and may or may not include a diagram element. This set consists of questions used for elementary school grade levels.
Download size: 345.59 KiB
Dataset size: 390.02 KiB
Auto-cached (documentation): Yes
Splits:

Split	Examples
`'test'`	542
`'train'`	623
`'validation'`	123
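
As a quick sanity check on the table above, the relative size of each split can be computed from the listed counts. This is only an illustrative sketch; `SPLIT_SIZES` and `split_shares` are hypothetical names, and the counts are copied from the table.

```python
# Split sizes copied from the table for unified_qa/ai2_science_elementary.
SPLIT_SIZES = {"test": 542, "train": 623, "validation": 123}


def split_shares(sizes):
    """Return each split's fraction of the total number of examples."""
    total = sum(sizes.values())
    return {name: count / total for name, count in sizes.items()}


if __name__ == "__main__":
    for name, share in split_shares(SPLIT_SIZES).items():
        print(f"{name}: {share:.1%}")
```

For this config the test split is nearly as large as the train split, which is worth keeping in mind when comparing results across configs.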

Examples (tfds.as_dataframe):

Citation:

http://data.allenai.org/ai2-science-questions

@inproceedings{khashabi-etal-2020-unifiedqa,
    title = "{UNIFIEDQA}: Crossing Format Boundaries with a Single {QA} System",
    author = "Khashabi, Daniel  and
      Min, Sewon  and
      Khot, Tushar  and
      Sabharwal, Ashish  and
      Tafjord, Oyvind  and
      Clark, Peter  and
      Hajishirzi, Hannaneh",
    booktitle = "Findings of the Association for Computational Linguistics: EMNLP 2020",
    month = nov,
    year = "2020",
    address = "Online",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2020.findings-emnlp.171",
    doi = "10.18653/v1/2020.findings-emnlp.171",
    pages = "1896--1907",
}

Note that each UnifiedQA dataset has its own citation. Please see the source to
see the correct citation for each contained dataset.

unified_qa/ai2_science_middle

Config description: The AI2 Science Questions dataset consists of questions used in student assessments in the United States across elementary and middle school grade levels. Each question is in 4-way multiple-choice format and may or may not include a diagram element. This set consists of questions used for middle school grade levels.
Download size: 428.41 KiB
Dataset size: 477.40 KiB
Auto-cached (documentation): Yes
Splits:

Split	Examples
`'test'`	679
`'train'`	605
`'validation'`	125

Examples (tfds.as_dataframe):

Citation:

http://data.allenai.org/ai2-science-questions

@inproceedings{khashabi-etal-2020-unifiedqa,
    title = "{UNIFIEDQA}: Crossing Format Boundaries with a Single {QA} System",
    author = "Khashabi, Daniel  and
      Min, Sewon  and
      Khot, Tushar  and
      Sabharwal, Ashish  and
      Tafjord, Oyvind  and
      Clark, Peter  and
      Hajishirzi, Hannaneh",
    booktitle = "Findings of the Association for Computational Linguistics: EMNLP 2020",
    month = nov,
    year = "2020",
    address = "Online",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2020.findings-emnlp.171",
    doi = "10.18653/v1/2020.findings-emnlp.171",
    pages = "1896--1907",
}

Note that each UnifiedQA dataset has its own citation. Please see the source to
see the correct citation for each contained dataset.

unified_qa/ambigqa

Config description: AmbigQA is an open-domain question answering task that involves finding every plausible answer and then rewriting the question for each one to resolve the ambiguity.
Download size: 2.27 MiB
Dataset size: 3.04 MiB
Auto-cached (documentation): Yes
Splits:

Split	Examples
`'train'`	19,806
`'validation'`	5,674

Examples (tfds.as_dataframe):

Citation:

@inproceedings{min-etal-2020-ambigqa,
    title = "{A}mbig{QA}: Answering Ambiguous Open-domain Questions",
    author = "Min, Sewon  and
      Michael, Julian  and
      Hajishirzi, Hannaneh  and
      Zettlemoyer, Luke",
    booktitle = "Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)",
    month = nov,
    year = "2020",
    address = "Online",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2020.emnlp-main.466",
    doi = "10.18653/v1/2020.emnlp-main.466",
    pages = "5783--5797",
}

@inproceedings{khashabi-etal-2020-unifiedqa,
    title = "{UNIFIEDQA}: Crossing Format Boundaries with a Single {QA} System",
    author = "Khashabi, Daniel  and
      Min, Sewon  and
      Khot, Tushar  and
      Sabharwal, Ashish  and
      Tafjord, Oyvind  and
      Clark, Peter  and
      Hajishirzi, Hannaneh",
    booktitle = "Findings of the Association for Computational Linguistics: EMNLP 2020",
    month = nov,
    year = "2020",
    address = "Online",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2020.findings-emnlp.171",
    doi = "10.18653/v1/2020.findings-emnlp.171",
    pages = "1896--1907",
}

Note that each UnifiedQA dataset has its own citation. Please see the source to
see the correct citation for each contained dataset.

unified_qa/arc_easy

Config description: This dataset consists of genuine grade-school level, multiple-choice science questions, assembled to encourage research in advanced question answering. The dataset is partitioned into a Challenge Set and an Easy Set, where the former contains only questions answered incorrectly by both a retrieval-based algorithm and a word co-occurrence algorithm. This set consists of "easy" questions.
Download size: 1.24 MiB
Dataset size: 1.42 MiB
Auto-cached (documentation): Yes
Splits:

Split	Examples
`'test'`	2,376
`'train'`	2,251
`'validation'`	570

Examples (tfds.as_dataframe):

Citation:

@article{clark2018think,
    title={Think you have solved question answering? try arc, the ai2 reasoning challenge},
    author={Clark, Peter and Cowhey, Isaac and Etzioni, Oren and Khot, Tushar and Sabharwal, Ashish and Schoenick, Carissa and Tafjord, Oyvind},
    journal={arXiv preprint arXiv:1803.05457},
    year={2018}
}

@inproceedings{khashabi-etal-2020-unifiedqa,
    title = "{UNIFIEDQA}: Crossing Format Boundaries with a Single {QA} System",
    author = "Khashabi, Daniel  and
      Min, Sewon  and
      Khot, Tushar  and
      Sabharwal, Ashish  and
      Tafjord, Oyvind  and
      Clark, Peter  and
      Hajishirzi, Hannaneh",
    booktitle = "Findings of the Association for Computational Linguistics: EMNLP 2020",
    month = nov,
    year = "2020",
    address = "Online",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2020.findings-emnlp.171",
    doi = "10.18653/v1/2020.findings-emnlp.171",
    pages = "1896--1907",
}

Note that each UnifiedQA dataset has its own citation. Please see the source to
see the correct citation for each contained dataset.

unified_qa/arc_easy_dev

Config description: This dataset consists of genuine grade-school level, multiple-choice science questions, assembled to encourage research in advanced question answering. The dataset is partitioned into a Challenge Set and an Easy Set, where the former contains only questions answered incorrectly by both a retrieval-based algorithm and a word co-occurrence algorithm. This set consists of "easy" questions.
Download size: 1.24 MiB
Dataset size: 1.42 MiB
Auto-cached (documentation): Yes
Splits:

Split	Examples
`'test'`	2,376
`'train'`	2,251
`'validation'`	570

Examples (tfds.as_dataframe):

Citation:

@article{clark2018think,
    title={Think you have solved question answering? try arc, the ai2 reasoning challenge},
    author={Clark, Peter and Cowhey, Isaac and Etzioni, Oren and Khot, Tushar and Sabharwal, Ashish and Schoenick, Carissa and Tafjord, Oyvind},
    journal={arXiv preprint arXiv:1803.05457},
    year={2018}
}

@inproceedings{khashabi-etal-2020-unifiedqa,
    title = "{UNIFIEDQA}: Crossing Format Boundaries with a Single {QA} System",
    author = "Khashabi, Daniel  and
      Min, Sewon  and
      Khot, Tushar  and
      Sabharwal, Ashish  and
      Tafjord, Oyvind  and
      Clark, Peter  and
      Hajishirzi, Hannaneh",
    booktitle = "Findings of the Association for Computational Linguistics: EMNLP 2020",
    month = nov,
    year = "2020",
    address = "Online",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2020.findings-emnlp.171",
    doi = "10.18653/v1/2020.findings-emnlp.171",
    pages = "1896--1907",
}

Note that each UnifiedQA dataset has its own citation. Please see the source to
see the correct citation for each contained dataset.

unified_qa/arc_easy_with_ir

Config description: This dataset consists of genuine grade-school level, multiple-choice science questions, assembled to encourage research in advanced question answering. The dataset is partitioned into a Challenge Set and an Easy Set, where the former contains only questions answered incorrectly by both a retrieval-based algorithm and a word co-occurrence algorithm. This set consists of "easy" questions. This version includes paragraphs fetched via an information retrieval system as additional evidence.
Download size: 7.00 MiB
Dataset size: 7.17 MiB
Auto-cached (documentation): Yes
Splits:

Split	Examples
`'test'`	2,376
`'train'`	2,251
`'validation'`	570

Examples (tfds.as_dataframe):

Citation:

@article{clark2018think,
    title={Think you have solved question answering? try arc, the ai2 reasoning challenge},
    author={Clark, Peter and Cowhey, Isaac and Etzioni, Oren and Khot, Tushar and Sabharwal, Ashish and Schoenick, Carissa and Tafjord, Oyvind},
    journal={arXiv preprint arXiv:1803.05457},
    year={2018}
}

@inproceedings{khashabi-etal-2020-unifiedqa,
    title = "{UNIFIEDQA}: Crossing Format Boundaries with a Single {QA} System",
    author = "Khashabi, Daniel  and
      Min, Sewon  and
      Khot, Tushar  and
      Sabharwal, Ashish  and
      Tafjord, Oyvind  and
      Clark, Peter  and
      Hajishirzi, Hannaneh",
    booktitle = "Findings of the Association for Computational Linguistics: EMNLP 2020",
    month = nov,
    year = "2020",
    address = "Online",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2020.findings-emnlp.171",
    doi = "10.18653/v1/2020.findings-emnlp.171",
    pages = "1896--1907",
}

Note that each UnifiedQA dataset has its own citation. Please see the source to
see the correct citation for each contained dataset.

unified_qa/arc_easy_with_ir_dev

Config description: This dataset consists of genuine grade-school level, multiple-choice science questions, assembled to encourage research in advanced question answering. The dataset is partitioned into a Challenge Set and an Easy Set, where the former contains only questions answered incorrectly by both a retrieval-based algorithm and a word co-occurrence algorithm. This set consists of "easy" questions. This version includes paragraphs fetched via an information retrieval system as additional evidence.
Download size: 7.00 MiB
Dataset size: 7.17 MiB
Auto-cached (documentation): Yes
Splits:

Split	Examples
`'test'`	2,376
`'train'`	2,251
`'validation'`	570

Examples (tfds.as_dataframe):

Citation:

@article{clark2018think,
    title={Think you have solved question answering? try arc, the ai2 reasoning challenge},
    author={Clark, Peter and Cowhey, Isaac and Etzioni, Oren and Khot, Tushar and Sabharwal, Ashish and Schoenick, Carissa and Tafjord, Oyvind},
    journal={arXiv preprint arXiv:1803.05457},
    year={2018}
}

@inproceedings{khashabi-etal-2020-unifiedqa,
    title = "{UNIFIEDQA}: Crossing Format Boundaries with a Single {QA} System",
    author = "Khashabi, Daniel  and
      Min, Sewon  and
      Khot, Tushar  and
      Sabharwal, Ashish  and
      Tafjord, Oyvind  and
      Clark, Peter  and
      Hajishirzi, Hannaneh",
    booktitle = "Findings of the Association for Computational Linguistics: EMNLP 2020",
    month = nov,
    year = "2020",
    address = "Online",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2020.findings-emnlp.171",
    doi = "10.18653/v1/2020.findings-emnlp.171",
    pages = "1896--1907",
}

Note that each UnifiedQA dataset has its own citation. Please see the source to
see the correct citation for each contained dataset.

unified_qa/arc_hard

Config description: This dataset consists of genuine grade-school level, multiple-choice science questions, assembled to encourage research in advanced question answering. The dataset is partitioned into a Challenge Set and an Easy Set, where the former contains only questions answered incorrectly by both a retrieval-based algorithm and a word co-occurrence algorithm. This set consists of "hard" questions.
Download size: 758.03 KiB
Dataset size: 848.28 KiB
Auto-cached (documentation): Yes
Splits:

Split	Examples
`'test'`	1,172
`'train'`	1,119
`'validation'`	299

Examples (tfds.as_dataframe):

Citation:

@article{clark2018think,
    title={Think you have solved question answering? try arc, the ai2 reasoning challenge},
    author={Clark, Peter and Cowhey, Isaac and Etzioni, Oren and Khot, Tushar and Sabharwal, Ashish and Schoenick, Carissa and Tafjord, Oyvind},
    journal={arXiv preprint arXiv:1803.05457},
    year={2018}
}

@inproceedings{khashabi-etal-2020-unifiedqa,
    title = "{UNIFIEDQA}: Crossing Format Boundaries with a Single {QA} System",
    author = "Khashabi, Daniel  and
      Min, Sewon  and
      Khot, Tushar  and
      Sabharwal, Ashish  and
      Tafjord, Oyvind  and
      Clark, Peter  and
      Hajishirzi, Hannaneh",
    booktitle = "Findings of the Association for Computational Linguistics: EMNLP 2020",
    month = nov,
    year = "2020",
    address = "Online",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2020.findings-emnlp.171",
    doi = "10.18653/v1/2020.findings-emnlp.171",
    pages = "1896--1907",
}

Note that each UnifiedQA dataset has its own citation. Please see the source to
see the correct citation for each contained dataset.

unified_qa/arc_hard_dev

Config description: This dataset consists of genuine grade-school level, multiple-choice science questions, assembled to encourage research in advanced question answering. The dataset is partitioned into a Challenge Set and an Easy Set, where the former contains only questions answered incorrectly by both a retrieval-based algorithm and a word co-occurrence algorithm. This set consists of "hard" questions.
Download size: 758.03 KiB
Dataset size: 848.28 KiB
Auto-cached (documentation): Yes
Splits:

Split	Examples
`'test'`	1,172
`'train'`	1,119
`'validation'`	299

Examples (tfds.as_dataframe):

Citation:

@article{clark2018think,
    title={Think you have solved question answering? try arc, the ai2 reasoning challenge},
    author={Clark, Peter and Cowhey, Isaac and Etzioni, Oren and Khot, Tushar and Sabharwal, Ashish and Schoenick, Carissa and Tafjord, Oyvind},
    journal={arXiv preprint arXiv:1803.05457},
    year={2018}
}

@inproceedings{khashabi-etal-2020-unifiedqa,
    title = "{UNIFIEDQA}: Crossing Format Boundaries with a Single {QA} System",
    author = "Khashabi, Daniel  and
      Min, Sewon  and
      Khot, Tushar  and
      Sabharwal, Ashish  and
      Tafjord, Oyvind  and
      Clark, Peter  and
      Hajishirzi, Hannaneh",
    booktitle = "Findings of the Association for Computational Linguistics: EMNLP 2020",
    month = nov,
    year = "2020",
    address = "Online",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2020.findings-emnlp.171",
    doi = "10.18653/v1/2020.findings-emnlp.171",
    pages = "1896--1907",
}

Note that each UnifiedQA dataset has its own citation. Please see the source to
see the correct citation for each contained dataset.

unified_qa/arc_hard_with_ir

Config description: This dataset consists of genuine grade-school level, multiple-choice science questions, assembled to encourage research in advanced question answering. The dataset is partitioned into a Challenge Set and an Easy Set, where the former contains only questions answered incorrectly by both a retrieval-based algorithm and a word co-occurrence algorithm. This set consists of "hard" questions. This version includes paragraphs fetched via an information retrieval system as additional evidence.
Download size: 3.53 MiB
Dataset size: 3.62 MiB
Auto-cached (documentation): Yes
Splits:

Split	Examples
`'test'`	1,172
`'train'`	1,119
`'validation'`	299

Examples (tfds.as_dataframe):

Citation:

@article{clark2018think,
    title={Think you have solved question answering? try arc, the ai2 reasoning challenge},
    author={Clark, Peter and Cowhey, Isaac and Etzioni, Oren and Khot, Tushar and Sabharwal, Ashish and Schoenick, Carissa and Tafjord, Oyvind},
    journal={arXiv preprint arXiv:1803.05457},
    year={2018}
}

@inproceedings{khashabi-etal-2020-unifiedqa,
    title = "{UNIFIEDQA}: Crossing Format Boundaries with a Single {QA} System",
    author = "Khashabi, Daniel  and
      Min, Sewon  and
      Khot, Tushar  and
      Sabharwal, Ashish  and
      Tafjord, Oyvind  and
      Clark, Peter  and
      Hajishirzi, Hannaneh",
    booktitle = "Findings of the Association for Computational Linguistics: EMNLP 2020",
    month = nov,
    year = "2020",
    address = "Online",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2020.findings-emnlp.171",
    doi = "10.18653/v1/2020.findings-emnlp.171",
    pages = "1896--1907",
}

Note that each UnifiedQA dataset has its own citation. Please see the source to
see the correct citation for each contained dataset.

unified_qa/arc_hard_with_ir_dev

Config description: This dataset consists of genuine grade-school level, multiple-choice science questions, assembled to encourage research in advanced question answering. The dataset is partitioned into a Challenge Set and an Easy Set, where the former contains only questions answered incorrectly by both a retrieval-based algorithm and a word co-occurrence algorithm. This set consists of "hard" questions. This version includes paragraphs fetched via an information retrieval system as additional evidence.
Download size: 3.53 MiB
Dataset size: 3.62 MiB
Auto-cached (documentation): Yes
Splits:

Split	Examples
`'test'`	1,172
`'train'`	1,119
`'validation'`	299

Examples (tfds.as_dataframe):

Citation:

@article{clark2018think,
    title={Think you have solved question answering? try arc, the ai2 reasoning challenge},
    author={Clark, Peter and Cowhey, Isaac and Etzioni, Oren and Khot, Tushar and Sabharwal, Ashish and Schoenick, Carissa and Tafjord, Oyvind},
    journal={arXiv preprint arXiv:1803.05457},
    year={2018}
}

@inproceedings{khashabi-etal-2020-unifiedqa,
    title = "{UNIFIEDQA}: Crossing Format Boundaries with a Single {QA} System",
    author = "Khashabi, Daniel  and
      Min, Sewon  and
      Khot, Tushar  and
      Sabharwal, Ashish  and
      Tafjord, Oyvind  and
      Clark, Peter  and
      Hajishirzi, Hannaneh",
    booktitle = "Findings of the Association for Computational Linguistics: EMNLP 2020",
    month = nov,
    year = "2020",
    address = "Online",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2020.findings-emnlp.171",
    doi = "10.18653/v1/2020.findings-emnlp.171",
    pages = "1896--1907",
}

Note that each UnifiedQA dataset has its own citation. Please see the source to
see the correct citation for each contained dataset.

unified_qa/boolq

Config description: BoolQ is a question answering dataset for yes/no questions. These questions are naturally occurring: they are generated in unprompted and unconstrained settings. Each example is a triplet of (question, passage, answer), with the title of the page as optional additional context. The text-pair classification setup is similar to existing natural language inference tasks.
Download size: 7.77 MiB
Dataset size: 8.20 MiB
Auto-cached (documentation): Yes
Splits:

Split	Examples
`'train'`	9,427
`'validation'`	3,270

Examples (tfds.as_dataframe):

Citation:

@inproceedings{clark-etal-2019-boolq,
    title = "{B}ool{Q}: Exploring the Surprising Difficulty of Natural Yes/No Questions",
    author = "Clark, Christopher  and
      Lee, Kenton  and
      Chang, Ming-Wei  and
      Kwiatkowski, Tom  and
      Collins, Michael  and
      Toutanova, Kristina",
    booktitle = "Proceedings of the 2019 Conference of the North {A}merican Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers)",
    month = jun,
    year = "2019",
    address = "Minneapolis, Minnesota",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/N19-1300",
    doi = "10.18653/v1/N19-1300",
    pages = "2924--2936",
}

@inproceedings{khashabi-etal-2020-unifiedqa,
    title = "{UNIFIEDQA}: Crossing Format Boundaries with a Single {QA} System",
    author = "Khashabi, Daniel  and
      Min, Sewon  and
      Khot, Tushar  and
      Sabharwal, Ashish  and
      Tafjord, Oyvind  and
      Clark, Peter  and
      Hajishirzi, Hannaneh",
    booktitle = "Findings of the Association for Computational Linguistics: EMNLP 2020",
    month = nov,
    year = "2020",
    address = "Online",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2020.findings-emnlp.171",
    doi = "10.18653/v1/2020.findings-emnlp.171",
    pages = "1896--1907",
}

Note that each UnifiedQA dataset has its own citation. Please see the source to
see the correct citation for each contained dataset.

unified_qa/boolq_np

Config description: BoolQ is a question answering dataset for yes/no questions. These questions are naturally occurring: they are generated in unprompted and unconstrained settings. Each example is a triplet of (question, passage, answer), with the title of the page as optional additional context. The text-pair classification setup is similar to existing natural language inference tasks. This version adds natural perturbations to the original version.
Download size: 10.80 MiB
Dataset size: 11.40 MiB
Auto-cached (documentation): Yes
Splits:

Split	Examples
`'train'`	9,727
`'validation'`	7,596

Examples (tfds.as_dataframe):

Citation:

@inproceedings{khashabi-etal-2020-bang,
    title = "More Bang for Your Buck: Natural Perturbation for Robust Question Answering",
    author = "Khashabi, Daniel  and
      Khot, Tushar  and
      Sabharwal, Ashish",
    booktitle = "Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)",
    month = nov,
    year = "2020",
    address = "Online",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2020.emnlp-main.12",
    doi = "10.18653/v1/2020.emnlp-main.12",
    pages = "163--170",
}

@inproceedings{khashabi-etal-2020-unifiedqa,
    title = "{UNIFIEDQA}: Crossing Format Boundaries with a Single {QA} System",
    author = "Khashabi, Daniel  and
      Min, Sewon  and
      Khot, Tushar  and
      Sabharwal, Ashish  and
      Tafjord, Oyvind  and
      Clark, Peter  and
      Hajishirzi, Hannaneh",
    booktitle = "Findings of the Association for Computational Linguistics: EMNLP 2020",
    month = nov,
    year = "2020",
    address = "Online",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2020.findings-emnlp.171",
    doi = "10.18653/v1/2020.findings-emnlp.171",
    pages = "1896--1907",
}

Note that each UnifiedQA dataset has its own citation. Please see the source to
see the correct citation for each contained dataset.

unified_qa/commonsenseqa

Config description: CommonsenseQA is a new multiple-choice question answering dataset that requires different types of commonsense knowledge to predict the correct answers. It contains questions with one correct answer and four distractor answers.
Download size: 1.79 MiB
Dataset size: 2.19 MiB
Auto-cached (documentation): Yes
Splits:

Split	Examples
`'test'`	1,140
`'train'`	9,741
`'validation'`	1,221

Examples (tfds.as_dataframe):

Citation:

@inproceedings{talmor-etal-2019-commonsenseqa,
    title = "{C}ommonsense{QA}: A Question Answering Challenge Targeting Commonsense Knowledge",
    author = "Talmor, Alon  and
      Herzig, Jonathan  and
      Lourie, Nicholas  and
      Berant, Jonathan",
    booktitle = "Proceedings of the 2019 Conference of the North {A}merican Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers)",
    month = jun,
    year = "2019",
    address = "Minneapolis, Minnesota",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/N19-1421",
    doi = "10.18653/v1/N19-1421",
    pages = "4149--4158",
}

@inproceedings{khashabi-etal-2020-unifiedqa,
    title = "{UNIFIEDQA}: Crossing Format Boundaries with a Single {QA} System",
    author = "Khashabi, Daniel  and
      Min, Sewon  and
      Khot, Tushar  and
      Sabharwal, Ashish  and
      Tafjord, Oyvind  and
      Clark, Peter  and
      Hajishirzi, Hannaneh",
    booktitle = "Findings of the Association for Computational Linguistics: EMNLP 2020",
    month = nov,
    year = "2020",
    address = "Online",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2020.findings-emnlp.171",
    doi = "10.18653/v1/2020.findings-emnlp.171",
    pages = "1896--1907",
}

Note that each UnifiedQA dataset has its own citation. Please see the source to
see the correct citation for each contained dataset.

unified_qa/commonsenseqa_test

Config description: CommonsenseQA is a new multiple-choice question answering dataset that requires different types of commonsense knowledge to predict the correct answers. It contains questions with one correct answer and four distractor answers.
Download size: 1.79 MiB
Dataset size: 2.19 MiB
Auto-cached (documentation): Yes
Splits:

Split	Examples
`'test'`	1,140
`'train'`	9,741
`'validation'`	1,221

Examples (tfds.as_dataframe):

Citation:

@inproceedings{talmor-etal-2019-commonsenseqa,
    title = "{C}ommonsense{QA}: A Question Answering Challenge Targeting Commonsense Knowledge",
    author = "Talmor, Alon  and
      Herzig, Jonathan  and
      Lourie, Nicholas  and
      Berant, Jonathan",
    booktitle = "Proceedings of the 2019 Conference of the North {A}merican Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers)",
    month = jun,
    year = "2019",
    address = "Minneapolis, Minnesota",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/N19-1421",
    doi = "10.18653/v1/N19-1421",
    pages = "4149--4158",
}

@inproceedings{khashabi-etal-2020-unifiedqa,
    title = "{UNIFIEDQA}: Crossing Format Boundaries with a Single {QA} System",
    author = "Khashabi, Daniel  and
      Min, Sewon  and
      Khot, Tushar  and
      Sabharwal, Ashish  and
      Tafjord, Oyvind  and
      Clark, Peter  and
      Hajishirzi, Hannaneh",
    booktitle = "Findings of the Association for Computational Linguistics: EMNLP 2020",
    month = nov,
    year = "2020",
    address = "Online",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2020.findings-emnlp.171",
    doi = "10.18653/v1/2020.findings-emnlp.171",
    pages = "1896--1907",
}

Note that each UnifiedQA dataset has its own citation. Please see the source to
see the correct citation for each contained dataset.

unified_qa/contrast_sets_boolq

Config description: BoolQ is a question answering dataset for yes/no questions. These questions are naturally occurring: they are generated in unprompted and unconstrained settings. Each example is a triplet of (question, passage, answer), with the title of the page as optional additional context. The text-pair classification setup is similar to existing natural language inference tasks. This version uses contrast sets. These evaluation sets are expert-generated perturbations that deviate from the patterns common in the original dataset.
Download size: 438.51 KiB
Dataset size: 462.35 KiB
Auto-cached (documentation): Yes
Splits:

Split	Examples
`'train'`	340
`'validation'`	340

Examples (tfds.as_dataframe):

Citation:

@inproceedings{clark-etal-2019-boolq,
    title = "{B}ool{Q}: Exploring the Surprising Difficulty of Natural Yes/No Questions",
    author = "Clark, Christopher  and
      Lee, Kenton  and
      Chang, Ming-Wei  and
      Kwiatkowski, Tom  and
      Collins, Michael  and
      Toutanova, Kristina",
    booktitle = "Proceedings of the 2019 Conference of the North {A}merican Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers)",
    month = jun,
    year = "2019",
    address = "Minneapolis, Minnesota",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/N19-1300",
    doi = "10.18653/v1/N19-1300",
    pages = "2924--2936",
}

@inproceedings{khashabi-etal-2020-unifiedqa,
    title = "{UNIFIEDQA}: Crossing Format Boundaries with a Single {QA} System",
    author = "Khashabi, Daniel  and
      Min, Sewon  and
      Khot, Tushar  and
      Sabharwal, Ashish  and
      Tafjord, Oyvind  and
      Clark, Peter  and
      Hajishirzi, Hannaneh",
    booktitle = "Findings of the Association for Computational Linguistics: EMNLP 2020",
    month = nov,
    year = "2020",
    address = "Online",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2020.findings-emnlp.171",
    doi = "10.18653/v1/2020.findings-emnlp.171",
    pages = "1896--1907",
}

Note that each UnifiedQA dataset has its own citation. Please see the source to
see the correct citation for each contained dataset.

unified_qa/contrast_sets_drop

Config description: DROP is a crowdsourced, adversarially created QA benchmark in which a system must resolve references in a question, perhaps to multiple input positions, and perform discrete operations over them (such as addition, counting, or sorting). These operations require a much more comprehensive understanding of the content of paragraphs than was necessary for prior datasets. This version uses contrast sets. These evaluation sets are expert-generated perturbations that deviate from the patterns common in the original dataset.
Download size: 2.20 MiB
Dataset size: 2.26 MiB
Auto-cached (documentation): Yes
Splits:

Split	Examples
`'train'`	947
`'validation'`	947

Examples (tfds.as_dataframe):

Citation:

@inproceedings{dua-etal-2019-drop,
    title = "{DROP}: A Reading Comprehension Benchmark Requiring Discrete Reasoning Over Paragraphs",
    author = "Dua, Dheeru  and
      Wang, Yizhong  and
      Dasigi, Pradeep  and
      Stanovsky, Gabriel  and
      Singh, Sameer  and
      Gardner, Matt",
    booktitle = "Proceedings of the 2019 Conference of the North {A}merican Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers)",
    month = jun,
    year = "2019",
    address = "Minneapolis, Minnesota",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/N19-1246",
    doi = "10.18653/v1/N19-1246",
    pages = "2368--2378",
}

@inproceedings{khashabi-etal-2020-unifiedqa,
    title = "{UNIFIEDQA}: Crossing Format Boundaries with a Single {QA} System",
    author = "Khashabi, Daniel  and
      Min, Sewon  and
      Khot, Tushar  and
      Sabharwal, Ashish  and
      Tafjord, Oyvind  and
      Clark, Peter  and
      Hajishirzi, Hannaneh",
    booktitle = "Findings of the Association for Computational Linguistics: EMNLP 2020",
    month = nov,
    year = "2020",
    address = "Online",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2020.findings-emnlp.171",
    doi = "10.18653/v1/2020.findings-emnlp.171",
    pages = "1896--1907",
}

Note that each UnifiedQA dataset has its own citation. Please see the source to
see the correct citation for each contained dataset.

unified_qa/contrast_sets_quoref

Config description: This dataset tests the coreferential reasoning capability of reading comprehension systems. In this span-selection benchmark containing questions about paragraphs from Wikipedia, a system must resolve hard coreferences before selecting the appropriate span(s) in the paragraphs to answer the questions. This version uses contrast sets. These evaluation sets are expert-generated perturbations that deviate from the patterns common in the original dataset.
Download size: 2.60 MiB
Dataset size: 2.65 MiB
Auto-cached (documentation): Yes
Splits:

Split	Examples
`'train'`	700
`'validation'`	700

Examples (tfds.as_dataframe):
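The preview table that normally follows this line is not reproduced here. A minimal sketch of how it could be regenerated, assuming the `tensorflow_datasets` package is installed and the data can be downloaded. `tfds_name` and `preview` are hypothetical helper names introduced for illustration; `tfds.load` and `tfds.as_dataframe` are TFDS library calls.

```python
def tfds_name(config: str, dataset: str = "unified_qa") -> str:
    """Build the 'dataset/config' name string used on this page."""
    return f"{dataset}/{config}"


def preview(config: str, split: str = "validation", n: int = 4):
    """Load one split of a config and return its first n rows as a DataFrame."""
    # Imported lazily so that tfds_name works without TensorFlow Datasets installed.
    import tensorflow_datasets as tfds

    ds, info = tfds.load(tfds_name(config), split=split, with_info=True)
    return tfds.as_dataframe(ds.take(n), info)


if __name__ == "__main__":
    # Only the name is printed here; calling preview(...) triggers a download.
    print(tfds_name("contrast_sets_quoref"))
```

Calling `preview("contrast_sets_quoref")` should download roughly 2.60 MiB on first use and return a table with the `input` and `output` string columns listed in the feature structure.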

Citation:

@inproceedings{dasigi-etal-2019-quoref,
    title = "{Q}uoref: A Reading Comprehension Dataset with Questions Requiring Coreferential Reasoning",
    author = "Dasigi, Pradeep  and
      Liu, Nelson F.  and
      Marasovi{\'c}, Ana  and
      Smith, Noah A.  and
      Gardner, Matt",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)",
    month = nov,
    year = "2019",
    address = "Hong Kong, China",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/D19-1606",
    doi = "10.18653/v1/D19-1606",
    pages = "5925--5932",
}

@inproceedings{khashabi-etal-2020-unifiedqa,
    title = "{UNIFIEDQA}: Crossing Format Boundaries with a Single {QA} System",
    author = "Khashabi, Daniel  and
      Min, Sewon  and
      Khot, Tushar  and
      Sabharwal, Ashish  and
      Tafjord, Oyvind  and
      Clark, Peter  and
      Hajishirzi, Hannaneh",
    booktitle = "Findings of the Association for Computational Linguistics: EMNLP 2020",
    month = nov,
    year = "2020",
    address = "Online",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2020.findings-emnlp.171",
    doi = "10.18653/v1/2020.findings-emnlp.171",
    pages = "1896--1907",
}

Note that each UnifiedQA dataset has its own citation. Please see the source to
see the correct citation for each contained dataset.

unified_qa/contrast_sets_ropes

Config description: This dataset tests a system's ability to apply knowledge from a passage of text to a new situation. A system is presented with a background passage containing one or more causal or qualitative relations (e.g., "animal pollinators increase efficiency of fertilization in flowers"), a novel situation that uses this background, and questions that require reasoning about the effects of the relations in the background passage in the context of the situation. This version uses contrast sets. These evaluation sets are expert-generated perturbations that deviate from the patterns common in the original dataset.
Download size: 1.97 MiB
Dataset size: 2.04 MiB
Auto-cached (documentation): Yes
Splits:

Split	Examples
`'train'`	974
`'validation'`	974

Examples (tfds.as_dataframe):

Citation:

@inproceedings{lin-etal-2019-reasoning,
    title = "Reasoning Over Paragraph Effects in Situations",
    author = "Lin, Kevin  and
      Tafjord, Oyvind  and
      Clark, Peter  and
      Gardner, Matt",
    booktitle = "Proceedings of the 2nd Workshop on Machine Reading for Question Answering",
    month = nov,
    year = "2019",
    address = "Hong Kong, China",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/D19-5808",
    doi = "10.18653/v1/D19-5808",
    pages = "58--62",
}

@inproceedings{khashabi-etal-2020-unifiedqa,
    title = "{UNIFIEDQA}: Crossing Format Boundaries with a Single {QA} System",
    author = "Khashabi, Daniel  and
      Min, Sewon  and
      Khot, Tushar  and
      Sabharwal, Ashish  and
      Tafjord, Oyvind  and
      Clark, Peter  and
      Hajishirzi, Hannaneh",
    booktitle = "Findings of the Association for Computational Linguistics: EMNLP 2020",
    month = nov,
    year = "2020",
    address = "Online",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2020.findings-emnlp.171",
    doi = "10.18653/v1/2020.findings-emnlp.171",
    pages = "1896--1907",
}

Note that each UnifiedQA dataset has its own citation. Please see the source to
see the correct citation for each contained dataset.

unified_qa/drop

Config description: DROP is a crowdsourced, adversarially created QA benchmark in which a system must resolve references in a question, perhaps to multiple input positions, and perform discrete operations over them (such as addition, counting, or sorting). These operations require a much more comprehensive understanding of the content of paragraphs than was necessary for prior datasets.
Download size: 105.18 MiB
Dataset size: 108.16 MiB
Auto-cached (documentation): Yes
Splits:

Split	Examples
`'train'`	77,399
`'validation'`	9,536
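
To make the "discrete operations" in the description concrete, here is a toy sketch of addition, counting, and sorting over numbers pulled from a passage. The passage is invented for illustration and is not taken from DROP, and `extract_numbers` is a hypothetical helper, not part of any DROP or UnifiedQA tooling.

```python
import re
from typing import List

# Hypothetical passage written for this example; it is not a DROP record.
PASSAGE = "The home team kicked field goals of 23, 41 and 37 yards."


def extract_numbers(passage: str) -> List[int]:
    """Return every whole number in the passage, in order of appearance."""
    return [int(token) for token in re.findall(r"\d+", passage)]


numbers = extract_numbers(PASSAGE)

total_yards = sum(numbers)        # addition: "How many yards in total?"
num_kicks = len(numbers)          # counting: "How many field goals?"
shortest_first = sorted(numbers)  # sorting: "Which kick was the shortest?"

if __name__ == "__main__":
    print(total_yards, num_kicks, shortest_first)
```

A real system has to decide which numbers and which operation a question refers to; this sketch only shows the operations themselves.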

Examples (tfds.as_dataframe):

Citation:

@inproceedings{dua-etal-2019-drop,
    title = "{DROP}: A Reading Comprehension Benchmark Requiring Discrete Reasoning Over Paragraphs",
    author = "Dua, Dheeru  and
      Wang, Yizhong  and
      Dasigi, Pradeep  and
      Stanovsky, Gabriel  and
      Singh, Sameer  and
      Gardner, Matt",
    booktitle = "Proceedings of the 2019 Conference of the North {A}merican Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers)",
    month = jun,
    year = "2019",
    address = "Minneapolis, Minnesota",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/N19-1246",
    doi = "10.18653/v1/N19-1246",
    pages = "2368--2378",
}

@inproceedings{khashabi-etal-2020-unifiedqa,
    title = "{UNIFIEDQA}: Crossing Format Boundaries with a Single {QA} System",
    author = "Khashabi, Daniel  and
      Min, Sewon  and
      Khot, Tushar  and
      Sabharwal, Ashish  and
      Tafjord, Oyvind  and
      Clark, Peter  and
      Hajishirzi, Hannaneh",
    booktitle = "Findings of the Association for Computational Linguistics: EMNLP 2020",
    month = nov,
    year = "2020",
    address = "Online",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2020.findings-emnlp.171",
    doi = "10.18653/v1/2020.findings-emnlp.171",
    pages = "1896--1907",
}

Note that each UnifiedQA dataset has its own citation. Please see the source to
see the correct citation for each contained dataset.

unified_qa/mctest

Config description: MCTest requires machines to answer multiple-choice reading comprehension questions about fictional stories, directly tackling the high-level goal of open-domain machine comprehension. Reading comprehension can test advanced abilities such as causal reasoning and understanding the world, yet, by being multiple-choice, it still provides a clear metric. Because the stories are fictional, the answer can typically be found only in the story itself. The stories and questions are also carefully limited to those a young child would understand, reducing the world knowledge required for the task.
Download size: 2.14 MiB
Dataset size: 2.20 MiB
Auto-cached (documentation): Yes
Splits:

Split	Examples
`'train'`	1,480
`'validation'`	320

Examples (tfds.as_dataframe):

Citation:

@inproceedings{richardson-etal-2013-mctest,
    title = "{MCT}est: A Challenge Dataset for the Open-Domain Machine Comprehension of Text",
    author = "Richardson, Matthew  and
      Burges, Christopher J.C.  and
      Renshaw, Erin",
    booktitle = "Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing",
    month = oct,
    year = "2013",
    address = "Seattle, Washington, USA",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/D13-1020",
    pages = "193--203",
}

@inproceedings{khashabi-etal-2020-unifiedqa,
    title = "{UNIFIEDQA}: Crossing Format Boundaries with a Single {QA} System",
    author = "Khashabi, Daniel  and
      Min, Sewon  and
      Khot, Tushar  and
      Sabharwal, Ashish  and
      Tafjord, Oyvind  and
      Clark, Peter  and
      Hajishirzi, Hannaneh",
    booktitle = "Findings of the Association for Computational Linguistics: EMNLP 2020",
    month = nov,
    year = "2020",
    address = "Online",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2020.findings-emnlp.171",
    doi = "10.18653/v1/2020.findings-emnlp.171",
    pages = "1896--1907",
}

Note that each UnifiedQA dataset has its own citation. Please see the source to
see the correct citation for each contained dataset.

unified_qa/mctest_corrected_the_separator

Config description: MCTest requires machines to answer multiple-choice reading comprehension questions about fictional stories, directly tackling the high-level goal of open-domain machine comprehension. Reading comprehension can test advanced abilities such as causal reasoning and understanding the world, yet, by being multiple-choice, it still provides a clear metric. Because the stories are fictional, the answer can typically be found only in the story itself. The stories and questions are also carefully limited to those a young child would understand, reducing the world knowledge required for the task.
Download size: 2.15 MiB
Dataset size: 2.21 MiB
Auto-cached (documentation): Yes
Splits:

Split	Examples
`'train'`	1,480
`'validation'`	320

Examples (tfds.as_dataframe):

Citation:

@inproceedings{richardson-etal-2013-mctest,
    title = "{MCT}est: A Challenge Dataset for the Open-Domain Machine Comprehension of Text",
    author = "Richardson, Matthew  and
      Burges, Christopher J.C.  and
      Renshaw, Erin",
    booktitle = "Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing",
    month = oct,
    year = "2013",
    address = "Seattle, Washington, USA",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/D13-1020",
    pages = "193--203",
}

@inproceedings{khashabi-etal-2020-unifiedqa,
    title = "{UNIFIEDQA}: Crossing Format Boundaries with a Single {QA} System",
    author = "Khashabi, Daniel  and
      Min, Sewon  and
      Khot, Tushar  and
      Sabharwal, Ashish  and
      Tafjord, Oyvind  and
      Clark, Peter  and
      Hajishirzi, Hannaneh",
    booktitle = "Findings of the Association for Computational Linguistics: EMNLP 2020",
    month = nov,
    year = "2020",
    address = "Online",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2020.findings-emnlp.171",
    doi = "10.18653/v1/2020.findings-emnlp.171",
    pages = "1896--1907",
}

Note that each UnifiedQA dataset has its own citation. Please see the source to
see the correct citation for each contained dataset.

unified_qa/multirc

Config description: MultiRC is a reading comprehension challenge in which questions can only be answered by taking into account information from multiple sentences. Questions and answers for this challenge were solicited and verified through a 4-step crowdsourcing experiment. The dataset contains questions for paragraphs across 7 different domains (elementary school science, news, travel guides, fiction stories, etc.), bringing linguistic diversity to the texts and to the question wordings.
Download size: 897.09 KiB
Dataset size: 918.42 KiB
Auto-cached (documentation): Yes
Splits:

Split	Examples
`'train'`	312
`'validation'`	312

Examples (tfds.as_dataframe):

Citation:

@inproceedings{khashabi-etal-2018-looking,
    title = "Looking Beyond the Surface: A Challenge Set for Reading Comprehension over Multiple Sentences",
    author = "Khashabi, Daniel  and
      Chaturvedi, Snigdha  and
      Roth, Michael  and
      Upadhyay, Shyam  and
      Roth, Dan",
    booktitle = "Proceedings of the 2018 Conference of the North {A}merican Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers)",
    month = jun,
    year = "2018",
    address = "New Orleans, Louisiana",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/N18-1023",
    doi = "10.18653/v1/N18-1023",
    pages = "252--262",
}

@inproceedings{khashabi-etal-2020-unifiedqa,
    title = "{UNIFIEDQA}: Crossing Format Boundaries with a Single {QA} System",
    author = "Khashabi, Daniel  and
      Min, Sewon  and
      Khot, Tushar  and
      Sabharwal, Ashish  and
      Tafjord, Oyvind  and
      Clark, Peter  and
      Hajishirzi, Hannaneh",
    booktitle = "Findings of the Association for Computational Linguistics: EMNLP 2020",
    month = nov,
    year = "2020",
    address = "Online",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2020.findings-emnlp.171",
    doi = "10.18653/v1/2020.findings-emnlp.171",
    pages = "1896--1907",
}

Note that each UnifiedQA dataset has its own citation. Please see the source to
see the correct citation for each contained dataset.

unified_qa/narrativeqa

Config description: NarrativeQA is an English-language dataset of stories and corresponding questions designed to test reading comprehension, especially on long documents.
Download size: 308.28 MiB
Dataset size: 311.22 MiB
Auto-cached (documentation): No
Splits:

Split	Examples
`'test'`	21,114
`'train'`	65,494
`'validation'`	6,922
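
This config is listed as not auto-cached. The flags on this page are consistent with the size rule in the TFDS performance guide, under which a dataset is auto-cached only when its total size is below 250 MiB. The sketch below checks that rule against sizes quoted on this page. It assumes the 250 MiB threshold and ignores the guide's other conditions (such as `shuffle_files`); `to_mib` and `expected_auto_cached` are hypothetical helpers, not TFDS functions.

```python
# Assumption: size threshold taken from the TFDS performance guide.
AUTO_CACHE_LIMIT_MIB = 250.0

# Conversion factors from the units used on this page to MiB.
_UNIT_TO_MIB = {"KiB": 1.0 / 1024.0, "MiB": 1.0, "GiB": 1024.0}


def to_mib(size: str) -> float:
    """Convert a size string such as '311.22 MiB' to a number of MiB."""
    value, unit = size.split()
    return float(value) * _UNIT_TO_MIB[unit]


def expected_auto_cached(dataset_size: str) -> bool:
    """Apply only the size condition of the auto-caching rule."""
    return to_mib(dataset_size) < AUTO_CACHE_LIMIT_MIB


if __name__ == "__main__":
    for name, size in [("narrativeqa", "311.22 MiB"), ("drop", "108.16 MiB")]:
        print(name, expected_auto_cached(size))
```

For configs above the threshold, calling `.cache()` on the loaded `tf.data.Dataset` is one way to cache manually, at the cost of holding the data in memory.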

Examples (tfds.as_dataframe):

Citation:

@article{kocisky-etal-2018-narrativeqa,
    title = "The {N}arrative{QA} Reading Comprehension Challenge",
    author = "Ko{\v{c}}isk{\'y}, Tom{\'a}{\v{s}}  and
      Schwarz, Jonathan  and
      Blunsom, Phil  and
      Dyer, Chris  and
      Hermann, Karl Moritz  and
      Melis, G{\'a}bor  and
      Grefenstette, Edward",
    journal = "Transactions of the Association for Computational Linguistics",
    volume = "6",
    year = "2018",
    address = "Cambridge, MA",
    publisher = "MIT Press",
    url = "https://aclanthology.org/Q18-1023",
    doi = "10.1162/tacl_a_00023",
    pages = "317--328",
}

@inproceedings{khashabi-etal-2020-unifiedqa,
    title = "{UNIFIEDQA}: Crossing Format Boundaries with a Single {QA} System",
    author = "Khashabi, Daniel  and
      Min, Sewon  and
      Khot, Tushar  and
      Sabharwal, Ashish  and
      Tafjord, Oyvind  and
      Clark, Peter  and
      Hajishirzi, Hannaneh",
    booktitle = "Findings of the Association for Computational Linguistics: EMNLP 2020",
    month = nov,
    year = "2020",
    address = "Online",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2020.findings-emnlp.171",
    doi = "10.18653/v1/2020.findings-emnlp.171",
    pages = "1896--1907",
}

Note that each UnifiedQA dataset has its own citation. Please see the source to
see the correct citation for each contained dataset.

unified_qa/narrativeqa_dev

Config description: NarrativeQA is an English-language dataset of stories and corresponding questions designed to test reading comprehension, especially on long documents.
Download size: 308.28 MiB
Dataset size: 311.22 MiB
Auto-cached (documentation): No
Splits:

Split	Examples
`'test'`	21,114
`'train'`	65,494
`'validation'`	6,922

Examples (tfds.as_dataframe):

Citation:

@article{kocisky-etal-2018-narrativeqa,
    title = "The {N}arrative{QA} Reading Comprehension Challenge",
    author = "Ko{\v{c}}isk{\'y}, Tom{\'a}{\v{s}}  and
      Schwarz, Jonathan  and
      Blunsom, Phil  and
      Dyer, Chris  and
      Hermann, Karl Moritz  and
      Melis, G{\'a}bor  and
      Grefenstette, Edward",
    journal = "Transactions of the Association for Computational Linguistics",
    volume = "6",
    year = "2018",
    address = "Cambridge, MA",
    publisher = "MIT Press",
    url = "https://aclanthology.org/Q18-1023",
    doi = "10.1162/tacl_a_00023",
    pages = "317--328",
}

@inproceedings{khashabi-etal-2020-unifiedqa,
    title = "{UNIFIEDQA}: Crossing Format Boundaries with a Single {QA} System",
    author = "Khashabi, Daniel  and
      Min, Sewon  and
      Khot, Tushar  and
      Sabharwal, Ashish  and
      Tafjord, Oyvind  and
      Clark, Peter  and
      Hajishirzi, Hannaneh",
    booktitle = "Findings of the Association for Computational Linguistics: EMNLP 2020",
    month = nov,
    year = "2020",
    address = "Online",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2020.findings-emnlp.171",
    doi = "10.18653/v1/2020.findings-emnlp.171",
    pages = "1896--1907",
}

Note that each UnifiedQA dataset has its own citation. Please see the source to
see the correct citation for each contained dataset.

unified_qa/natural_questions

Config description: The NQ corpus contains questions from real users, and it requires QA systems to read and comprehend an entire Wikipedia article that may or may not contain the answer to the question. The inclusion of real user questions, and the requirement that solutions read an entire page to find the answer, make NQ a more realistic and challenging task than prior QA datasets.
Download size: 6.95 MiB
Dataset size: 9.88 MiB
Auto-cached (documentation): Yes
Splits:

Split	Examples
`'train'`	96,075
`'validation'`	2,295

Examples (tfds.as_dataframe):

Citation:

@article{kwiatkowski-etal-2019-natural,
    title = "Natural Questions: A Benchmark for Question Answering Research",
    author = "Kwiatkowski, Tom  and
      Palomaki, Jennimaria  and
      Redfield, Olivia  and
      Collins, Michael  and
      Parikh, Ankur  and
      Alberti, Chris  and
      Epstein, Danielle  and
      Polosukhin, Illia  and
      Devlin, Jacob  and
      Lee, Kenton  and
      Toutanova, Kristina  and
      Jones, Llion  and
      Kelcey, Matthew  and
      Chang, Ming-Wei  and
      Dai, Andrew M.  and
      Uszkoreit, Jakob  and
      Le, Quoc  and
      Petrov, Slav",
    journal = "Transactions of the Association for Computational Linguistics",
    volume = "7",
    year = "2019",
    address = "Cambridge, MA",
    publisher = "MIT Press",
    url = "https://aclanthology.org/Q19-1026",
    doi = "10.1162/tacl_a_00276",
    pages = "452--466",
}

@inproceedings{khashabi-etal-2020-unifiedqa,
    title = "{UNIFIEDQA}: Crossing Format Boundaries with a Single {QA} System",
    author = "Khashabi, Daniel  and
      Min, Sewon  and
      Khot, Tushar  and
      Sabharwal, Ashish  and
      Tafjord, Oyvind  and
      Clark, Peter  and
      Hajishirzi, Hannaneh",
    booktitle = "Findings of the Association for Computational Linguistics: EMNLP 2020",
    month = nov,
    year = "2020",
    address = "Online",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2020.findings-emnlp.171",
    doi = "10.18653/v1/2020.findings-emnlp.171",
    pages = "1896--1907",
}

Note that each UnifiedQA dataset has its own citation. Please see the source to
see the correct citation for each contained dataset.

unified_qa/natural_questions_direct_ans

Config description: The NQ corpus contains questions from real users, and it requires QA systems to read and comprehend an entire Wikipedia article that may or may not contain the answer to the question. The inclusion of real user questions, and the requirement that solutions read an entire page to find the answer, make NQ a more realistic and challenging task than prior QA datasets. This version consists of direct-answer questions.
Download size: 6.82 MiB
Dataset size: 10.19 MiB
Auto-cached (documentation): Yes
Splits:

Split	Examples
`'test'`	6,468
`'train'`	96,676
`'validation'`	10,693

Examples (tfds.as_dataframe):

Citation:

@article{kwiatkowski-etal-2019-natural,
    title = "Natural Questions: A Benchmark for Question Answering Research",
    author = "Kwiatkowski, Tom  and
      Palomaki, Jennimaria  and
      Redfield, Olivia  and
      Collins, Michael  and
      Parikh, Ankur  and
      Alberti, Chris  and
      Epstein, Danielle  and
      Polosukhin, Illia  and
      Devlin, Jacob  and
      Lee, Kenton  and
      Toutanova, Kristina  and
      Jones, Llion  and
      Kelcey, Matthew  and
      Chang, Ming-Wei  and
      Dai, Andrew M.  and
      Uszkoreit, Jakob  and
      Le, Quoc  and
      Petrov, Slav",
    journal = "Transactions of the Association for Computational Linguistics",
    volume = "7",
    year = "2019",
    address = "Cambridge, MA",
    publisher = "MIT Press",
    url = "https://aclanthology.org/Q19-1026",
    doi = "10.1162/tacl_a_00276",
    pages = "452--466",
}

@inproceedings{khashabi-etal-2020-unifiedqa,
    title = "{UNIFIEDQA}: Crossing Format Boundaries with a Single {QA} System",
    author = "Khashabi, Daniel  and
      Min, Sewon  and
      Khot, Tushar  and
      Sabharwal, Ashish  and
      Tafjord, Oyvind  and
      Clark, Peter  and
      Hajishirzi, Hannaneh",
    booktitle = "Findings of the Association for Computational Linguistics: EMNLP 2020",
    month = nov,
    year = "2020",
    address = "Online",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2020.findings-emnlp.171",
    doi = "10.18653/v1/2020.findings-emnlp.171",
    pages = "1896--1907",
}

Note that each UnifiedQA dataset has its own citation. Please see the source to
see the correct citation for each contained dataset.

unified_qa/natural_questions_direct_ans_test

Config description: The NQ corpus contains questions from real users, and it requires QA systems to read and comprehend an entire Wikipedia article that may or may not contain the answer to the question. The inclusion of real user questions, and the requirement that solutions read an entire page to find the answer, make NQ a more realistic and challenging task than prior QA datasets. This version consists of direct-answer questions.
Download size: 6.82 MiB
Dataset size: 10.19 MiB
Auto-cached (documentation): Yes
Splits:

Split	Examples
`'test'`	6,468
`'train'`	96,676
`'validation'`	10,693

Examples (tfds.as_dataframe):

Citation:

@article{kwiatkowski-etal-2019-natural,
    title = "Natural Questions: A Benchmark for Question Answering Research",
    author = "Kwiatkowski, Tom  and
      Palomaki, Jennimaria  and
      Redfield, Olivia  and
      Collins, Michael  and
      Parikh, Ankur  and
      Alberti, Chris  and
      Epstein, Danielle  and
      Polosukhin, Illia  and
      Devlin, Jacob  and
      Lee, Kenton  and
      Toutanova, Kristina  and
      Jones, Llion  and
      Kelcey, Matthew  and
      Chang, Ming-Wei  and
      Dai, Andrew M.  and
      Uszkoreit, Jakob  and
      Le, Quoc  and
      Petrov, Slav",
    journal = "Transactions of the Association for Computational Linguistics",
    volume = "7",
    year = "2019",
    address = "Cambridge, MA",
    publisher = "MIT Press",
    url = "https://aclanthology.org/Q19-1026",
    doi = "10.1162/tacl_a_00276",
    pages = "452--466",
}

@inproceedings{khashabi-etal-2020-unifiedqa,
    title = "{UNIFIEDQA}: Crossing Format Boundaries with a Single {QA} System",
    author = "Khashabi, Daniel  and
      Min, Sewon  and
      Khot, Tushar  and
      Sabharwal, Ashish  and
      Tafjord, Oyvind  and
      Clark, Peter  and
      Hajishirzi, Hannaneh",
    booktitle = "Findings of the Association for Computational Linguistics: EMNLP 2020",
    month = nov,
    year = "2020",
    address = "Online",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2020.findings-emnlp.171",
    doi = "10.18653/v1/2020.findings-emnlp.171",
    pages = "1896--1907",
}

Note that each UnifiedQA dataset has its own citation. Please see the source to
see the correct citation for each contained dataset.

unified_qa/natural_questions_with_dpr_para

Config description: The NQ corpus contains questions from real users, and it requires QA systems to read and comprehend an entire Wikipedia article that may or may not contain the answer to the question. The inclusion of real user questions, and the requirement that solutions read an entire page to find the answer, make NQ a more realistic and challenging task than prior QA datasets. This version includes additional paragraphs (obtained using the DPR retrieval engine) to augment each question.
Download size: 319.22 MiB
Dataset size: 322.91 MiB
Auto-cached (documentation): No
Splits:

Split	Examples
`'train'`	96,676
`'validation'`	10,693
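
As a rough illustration of how a dense retriever such as DPR picks paragraphs for a question, the sketch below ranks passages by the inner product between a question vector and passage vectors. The three-dimensional vectors are made up for this example, and `dot` and `top_k` are hypothetical helpers; in DPR the vectors come from trained neural encoders, which are not reproduced here.

```python
from typing import Dict, List, Sequence


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def top_k(question_vec: Sequence[float],
          passage_vecs: Dict[str, Sequence[float]],
          k: int = 2) -> List[str]:
    """Return the ids of the k passages with the highest inner-product score."""
    ranked = sorted(
        passage_vecs,
        key=lambda pid: dot(question_vec, passage_vecs[pid]),
        reverse=True,
    )
    return ranked[:k]


# Hypothetical embeddings, invented for illustration only.
QUESTION_VEC = [1.0, 0.0, 1.0]
PASSAGE_VECS = {
    "p1": [0.9, 0.1, 0.8],  # score 1.7
    "p2": [0.0, 1.0, 0.1],  # score 0.1
    "p3": [0.5, 0.5, 0.5],  # score 1.0
}

if __name__ == "__main__":
    print(top_k(QUESTION_VEC, PASSAGE_VECS, k=2))
```

The retrieved paragraphs are what this config attaches to each question; how they are joined to the question text is not specified on this page.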

Examples (tfds.as_dataframe):

Citation:

@article{kwiatkowski-etal-2019-natural,
    title = "Natural Questions: A Benchmark for Question Answering Research",
    author = "Kwiatkowski, Tom  and
      Palomaki, Jennimaria  and
      Redfield, Olivia  and
      Collins, Michael  and
      Parikh, Ankur  and
      Alberti, Chris  and
      Epstein, Danielle  and
      Polosukhin, Illia  and
      Devlin, Jacob  and
      Lee, Kenton  and
      Toutanova, Kristina  and
      Jones, Llion  and
      Kelcey, Matthew  and
      Chang, Ming-Wei  and
      Dai, Andrew M.  and
      Uszkoreit, Jakob  and
      Le, Quoc  and
      Petrov, Slav",
    journal = "Transactions of the Association for Computational Linguistics",
    volume = "7",
    year = "2019",
    address = "Cambridge, MA",
    publisher = "MIT Press",
    url = "https://aclanthology.org/Q19-1026",
    doi = "10.1162/tacl_a_00276",
    pages = "452--466",
}

@inproceedings{khashabi-etal-2020-unifiedqa,
    title = "{UNIFIEDQA}: Crossing Format Boundaries with a Single {QA} System",
    author = "Khashabi, Daniel  and
      Min, Sewon  and
      Khot, Tushar  and
      Sabharwal, Ashish  and
      Tafjord, Oyvind  and
      Clark, Peter  and
      Hajishirzi, Hannaneh",
    booktitle = "Findings of the Association for Computational Linguistics: EMNLP 2020",
    month = nov,
    year = "2020",
    address = "Online",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2020.findings-emnlp.171",
    doi = "10.18653/v1/2020.findings-emnlp.171",
    pages = "1896--1907",
}

Note that each UnifiedQA dataset has its own citation. Please see the source to
see the correct citation for each contained dataset.

unified_qa/natural_questions_with_dpr_para_test

Config description: The NQ corpus contains questions from real users, and it requires QA systems to read and comprehend an entire Wikipedia article that may or may not contain the answer to the question. The inclusion of real user questions, and the requirement that solutions read an entire page to find the answer, make NQ a more realistic and challenging task than prior QA datasets. This version includes additional paragraphs (obtained using the DPR retrieval engine) to augment each question.
Download size: 306.94 MiB
Dataset size: 310.48 MiB
Auto-cached (documentation): No
Splits:

Split	Examples
`'test'`	6,468
`'train'`	96,676

Examples (tfds.as_dataframe):

Citation:

@article{kwiatkowski-etal-2019-natural,
    title = "Natural Questions: A Benchmark for Question Answering Research",
    author = "Kwiatkowski, Tom  and
      Palomaki, Jennimaria  and
      Redfield, Olivia  and
      Collins, Michael  and
      Parikh, Ankur  and
      Alberti, Chris  and
      Epstein, Danielle  and
      Polosukhin, Illia  and
      Devlin, Jacob  and
      Lee, Kenton  and
      Toutanova, Kristina  and
      Jones, Llion  and
      Kelcey, Matthew  and
      Chang, Ming-Wei  and
      Dai, Andrew M.  and
      Uszkoreit, Jakob  and
      Le, Quoc  and
      Petrov, Slav",
    journal = "Transactions of the Association for Computational Linguistics",
    volume = "7",
    year = "2019",
    address = "Cambridge, MA",
    publisher = "MIT Press",
    url = "https://aclanthology.org/Q19-1026",
    doi = "10.1162/tacl_a_00276",
    pages = "452--466",
}

@inproceedings{khashabi-etal-2020-unifiedqa,
    title = "{UNIFIEDQA}: Crossing Format Boundaries with a Single {QA} System",
    author = "Khashabi, Daniel  and
      Min, Sewon  and
      Khot, Tushar  and
      Sabharwal, Ashish  and
      Tafjord, Oyvind  and
      Clark, Peter  and
      Hajishirzi, Hannaneh",
    booktitle = "Findings of the Association for Computational Linguistics: EMNLP 2020",
    month = nov,
    year = "2020",
    address = "Online",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2020.findings-emnlp.171",
    doi = "10.18653/v1/2020.findings-emnlp.171",
    pages = "1896--1907",
}

Note that each UnifiedQA dataset has its own citation. Please see the source to
see the correct citation for each contained dataset.

unified_qa/newsqa

Config description: NewsQA is a challenging machine comprehension dataset of human-generated question-answer pairs. Crowdworkers supply questions and answers based on a set of news articles from CNN, with answers consisting of spans of text from the corresponding articles.
Download size: 283.33 MiB
Dataset size: 285.94 MiB
Auto-cached (documentation): No
Splits:

Split	Examples
`'train'`	75,882
`'validation'`	4,309

Examples (tfds.as_dataframe):

Citation:

@inproceedings{trischler-etal-2017-newsqa,
    title = "{N}ews{QA}: A Machine Comprehension Dataset",
    author = "Trischler, Adam  and
      Wang, Tong  and
      Yuan, Xingdi  and
      Harris, Justin  and
      Sordoni, Alessandro  and
      Bachman, Philip  and
      Suleman, Kaheer",
    booktitle = "Proceedings of the 2nd Workshop on Representation Learning for {NLP}",
    month = aug,
    year = "2017",
    address = "Vancouver, Canada",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/W17-2623",
    doi = "10.18653/v1/W17-2623",
    pages = "191--200",
}

@inproceedings{khashabi-etal-2020-unifiedqa,
    title = "{UNIFIEDQA}: Crossing Format Boundaries with a Single {QA} System",
    author = "Khashabi, Daniel  and
      Min, Sewon  and
      Khot, Tushar  and
      Sabharwal, Ashish  and
      Tafjord, Oyvind  and
      Clark, Peter  and
      Hajishirzi, Hannaneh",
    booktitle = "Findings of the Association for Computational Linguistics: EMNLP 2020",
    month = nov,
    year = "2020",
    address = "Online",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2020.findings-emnlp.171",
    doi = "10.18653/v1/2020.findings-emnlp.171",
    pages = "1896--1907",
}

Note that each UnifiedQA dataset has its own citation. Please see the source to
see the correct citation for each contained dataset.

unified_qa/openbookqa

Config description: OpenBookQA aims to promote research in advanced question answering, probing a deeper understanding of both the topic (with salient facts summarized as an open book, also provided with the dataset) and the language it is expressed in. In particular, it contains questions that require multi-step reasoning, use of additional common and commonsense knowledge, and rich text comprehension. OpenBookQA is a new kind of question-answering dataset modeled after open book exams for assessing human understanding of a subject.
Download size: 942.34 KiB
Dataset size: 1.11 MiB
Auto-cached (documentation): Yes
Splits:

Split	Examples
`'test'`	500
`'train'`	4,957
`'validation'`	500

Examples (tfds.as_dataframe):

Citation:

@inproceedings{mihaylov-etal-2018-suit,
    title = "Can a Suit of Armor Conduct Electricity? A New Dataset for Open Book Question Answering",
    author = "Mihaylov, Todor  and
      Clark, Peter  and
      Khot, Tushar  and
      Sabharwal, Ashish",
    booktitle = "Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing",
    month = oct # "-" # nov,
    year = "2018",
    address = "Brussels, Belgium",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/D18-1260",
    doi = "10.18653/v1/D18-1260",
    pages = "2381--2391",
}

@inproceedings{khashabi-etal-2020-unifiedqa,
    title = "{UNIFIEDQA}: Crossing Format Boundaries with a Single {QA} System",
    author = "Khashabi, Daniel  and
      Min, Sewon  and
      Khot, Tushar  and
      Sabharwal, Ashish  and
      Tafjord, Oyvind  and
      Clark, Peter  and
      Hajishirzi, Hannaneh",
    booktitle = "Findings of the Association for Computational Linguistics: EMNLP 2020",
    month = nov,
    year = "2020",
    address = "Online",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2020.findings-emnlp.171",
    doi = "10.18653/v1/2020.findings-emnlp.171",
    pages = "1896--1907",
}

Note that each UnifiedQA dataset has its own citation. Please see the source to
see the correct citation for each contained dataset.

unified_qa/openbookqa_dev

Config description: OpenBookQA aims to promote research in advanced question answering, probing a deeper understanding of both the topic (with salient facts summarized as an open book, also provided with the dataset) and the language it is expressed in. In particular, it contains questions that require multi-step reasoning, use of additional common and commonsense knowledge, and rich text comprehension. OpenBookQA is a new kind of question-answering dataset modeled after open book exams for assessing human understanding of a subject.
Download size: 942.34 KiB
Dataset size: 1.11 MiB
Auto-cached (documentation): Yes
Splits:

Split	Examples
`'test'`	500
`'train'`	4,957
`'validation'`	500

Examples (tfds.as_dataframe):

Citation:

@inproceedings{mihaylov-etal-2018-suit,
    title = "Can a Suit of Armor Conduct Electricity? A New Dataset for Open Book Question Answering",
    author = "Mihaylov, Todor  and
      Clark, Peter  and
      Khot, Tushar  and
      Sabharwal, Ashish",
    booktitle = "Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing",
    month = oct # "-" # nov,
    year = "2018",
    address = "Brussels, Belgium",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/D18-1260",
    doi = "10.18653/v1/D18-1260",
    pages = "2381--2391",
}

@inproceedings{khashabi-etal-2020-unifiedqa,
    title = "{UNIFIEDQA}: Crossing Format Boundaries with a Single {QA} System",
    author = "Khashabi, Daniel  and
      Min, Sewon  and
      Khot, Tushar  and
      Sabharwal, Ashish  and
      Tafjord, Oyvind  and
      Clark, Peter  and
      Hajishirzi, Hannaneh",
    booktitle = "Findings of the Association for Computational Linguistics: EMNLP 2020",
    month = nov,
    year = "2020",
    address = "Online",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2020.findings-emnlp.171",
    doi = "10.18653/v1/2020.findings-emnlp.171",
    pages = "1896--1907",
}

Note that each UnifiedQA dataset has its own citation. Please see the source for
the correct citation for each contained dataset.

unified_qa/openbookqa_with_ir

Config description: OpenBookQA aims to promote research in advanced question answering, probing a deeper understanding of both the topic (with salient facts summarized as an open book, also provided with the dataset) and the language it is expressed in. It contains questions that require multi-step reasoning, use of additional common and commonsense knowledge, and rich text comprehension. OpenBookQA is a new kind of question-answering dataset modeled after open-book exams for assessing human understanding of a subject. This version includes paragraphs fetched via an information retrieval system as additional evidence.
Download size: 6.08 MiB
Dataset size: 6.28 MiB
Auto-cached (documentation): Yes
Splits:

Split	Examples
`'test'`	500
`'train'`	4,957
`'validation'`	500

Examples (tfds.as_dataframe):

Citation:

@inproceedings{mihaylov-etal-2018-suit,
    title = "Can a Suit of Armor Conduct Electricity? A New Dataset for Open Book Question Answering",
    author = "Mihaylov, Todor  and
      Clark, Peter  and
      Khot, Tushar  and
      Sabharwal, Ashish",
    booktitle = "Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing",
    month = oct # "-" # nov,
    year = "2018",
    address = "Brussels, Belgium",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/D18-1260",
    doi = "10.18653/v1/D18-1260",
    pages = "2381--2391",
}

@inproceedings{khashabi-etal-2020-unifiedqa,
    title = "{UNIFIEDQA}: Crossing Format Boundaries with a Single {QA} System",
    author = "Khashabi, Daniel  and
      Min, Sewon  and
      Khot, Tushar  and
      Sabharwal, Ashish  and
      Tafjord, Oyvind  and
      Clark, Peter  and
      Hajishirzi, Hannaneh",
    booktitle = "Findings of the Association for Computational Linguistics: EMNLP 2020",
    month = nov,
    year = "2020",
    address = "Online",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2020.findings-emnlp.171",
    doi = "10.18653/v1/2020.findings-emnlp.171",
    pages = "1896--1907",
}

Note that each UnifiedQA dataset has its own citation. Please see the source for
the correct citation for each contained dataset.

unified_qa/openbookqa_with_ir_dev

Config description: OpenBookQA aims to promote research in advanced question answering, probing a deeper understanding of both the topic (with salient facts summarized as an open book, also provided with the dataset) and the language it is expressed in. It contains questions that require multi-step reasoning, use of additional common and commonsense knowledge, and rich text comprehension. OpenBookQA is a new kind of question-answering dataset modeled after open-book exams for assessing human understanding of a subject. This version includes paragraphs fetched via an information retrieval system as additional evidence.
Download size: 6.08 MiB
Dataset size: 6.28 MiB
Auto-cached (documentation): Yes
Splits:

Split	Examples
`'test'`	500
`'train'`	4,957
`'validation'`	500

Examples (tfds.as_dataframe):

Citation:

@inproceedings{mihaylov-etal-2018-suit,
    title = "Can a Suit of Armor Conduct Electricity? A New Dataset for Open Book Question Answering",
    author = "Mihaylov, Todor  and
      Clark, Peter  and
      Khot, Tushar  and
      Sabharwal, Ashish",
    booktitle = "Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing",
    month = oct # "-" # nov,
    year = "2018",
    address = "Brussels, Belgium",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/D18-1260",
    doi = "10.18653/v1/D18-1260",
    pages = "2381--2391",
}

@inproceedings{khashabi-etal-2020-unifiedqa,
    title = "{UNIFIEDQA}: Crossing Format Boundaries with a Single {QA} System",
    author = "Khashabi, Daniel  and
      Min, Sewon  and
      Khot, Tushar  and
      Sabharwal, Ashish  and
      Tafjord, Oyvind  and
      Clark, Peter  and
      Hajishirzi, Hannaneh",
    booktitle = "Findings of the Association for Computational Linguistics: EMNLP 2020",
    month = nov,
    year = "2020",
    address = "Online",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2020.findings-emnlp.171",
    doi = "10.18653/v1/2020.findings-emnlp.171",
    pages = "1896--1907",
}

Note that each UnifiedQA dataset has its own citation. Please see the source for
the correct citation for each contained dataset.

unified_qa/physical_iqa

Config description: This is a dataset for benchmarking progress in physical commonsense understanding. The underlying task is multiple-choice question answering: given a question q and two possible solutions s1, s2, a model or a human must choose the most appropriate solution, one of which is correct. The dataset focuses on everyday situations with a preference for atypical solutions. It is inspired by instructables.com, which provides users with instructions on how to build, craft, bake, or manipulate objects using everyday materials. Annotators are asked to provide semantic perturbations or alternative approaches that are otherwise syntactically and topically similar, to ensure that physical knowledge is targeted. The dataset is further cleaned of basic artifacts using the AFLite algorithm.
Download size: 6.01 MiB
Dataset size: 6.59 MiB
Auto-cached (documentation): Yes
Splits:

Split	Examples
`'train'`	16,113
`'validation'`	1,838

Examples (tfds.as_dataframe):

Citation:

@inproceedings{bisk2020piqa,
    title={Piqa: Reasoning about physical commonsense in natural language},
    author={Bisk, Yonatan and Zellers, Rowan and Gao, Jianfeng and Choi, Yejin and others},
    booktitle={Proceedings of the AAAI Conference on Artificial Intelligence},
    volume={34},
    number={05},
    pages={7432--7439},
    year={2020}
}

@inproceedings{khashabi-etal-2020-unifiedqa,
    title = "{UNIFIEDQA}: Crossing Format Boundaries with a Single {QA} System",
    author = "Khashabi, Daniel  and
      Min, Sewon  and
      Khot, Tushar  and
      Sabharwal, Ashish  and
      Tafjord, Oyvind  and
      Clark, Peter  and
      Hajishirzi, Hannaneh",
    booktitle = "Findings of the Association for Computational Linguistics: EMNLP 2020",
    month = nov,
    year = "2020",
    address = "Online",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2020.findings-emnlp.171",
    doi = "10.18653/v1/2020.findings-emnlp.171",
    pages = "1896--1907",
}

Note that each UnifiedQA dataset has its own citation. Please see the source for
the correct citation for each contained dataset.

unified_qa/qasc

Config description: QASC is a question-answering dataset with a focus on sentence composition. It consists of 8-way multiple-choice questions about grade school science, and comes with a corpus of 17M sentences.
Download size: 1.75 MiB
Dataset size: 2.09 MiB
Auto-cached (documentation): Yes
Splits:

Split	Examples
`'test'`	920
`'train'`	8,134
`'validation'`	926

Examples (tfds.as_dataframe):

Citation:

@inproceedings{khot2020qasc,
    title={Qasc: A dataset for question answering via sentence composition},
    author={Khot, Tushar and Clark, Peter and Guerquin, Michal and Jansen, Peter and Sabharwal, Ashish},
    booktitle={Proceedings of the AAAI Conference on Artificial Intelligence},
    volume={34},
    number={05},
    pages={8082--8090},
    year={2020}
}

@inproceedings{khashabi-etal-2020-unifiedqa,
    title = "{UNIFIEDQA}: Crossing Format Boundaries with a Single {QA} System",
    author = "Khashabi, Daniel  and
      Min, Sewon  and
      Khot, Tushar  and
      Sabharwal, Ashish  and
      Tafjord, Oyvind  and
      Clark, Peter  and
      Hajishirzi, Hannaneh",
    booktitle = "Findings of the Association for Computational Linguistics: EMNLP 2020",
    month = nov,
    year = "2020",
    address = "Online",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2020.findings-emnlp.171",
    doi = "10.18653/v1/2020.findings-emnlp.171",
    pages = "1896--1907",
}

Note that each UnifiedQA dataset has its own citation. Please see the source for
the correct citation for each contained dataset.

unified_qa/qasc_test

Config description: QASC is a question-answering dataset with a focus on sentence composition. It consists of 8-way multiple-choice questions about grade school science, and comes with a corpus of 17M sentences.
Download size: 1.75 MiB
Dataset size: 2.09 MiB
Auto-cached (documentation): Yes
Splits:

Split	Examples
`'test'`	920
`'train'`	8,134
`'validation'`	926

Examples (tfds.as_dataframe):

Citation:

@inproceedings{khot2020qasc,
    title={Qasc: A dataset for question answering via sentence composition},
    author={Khot, Tushar and Clark, Peter and Guerquin, Michal and Jansen, Peter and Sabharwal, Ashish},
    booktitle={Proceedings of the AAAI Conference on Artificial Intelligence},
    volume={34},
    number={05},
    pages={8082--8090},
    year={2020}
}

@inproceedings{khashabi-etal-2020-unifiedqa,
    title = "{UNIFIEDQA}: Crossing Format Boundaries with a Single {QA} System",
    author = "Khashabi, Daniel  and
      Min, Sewon  and
      Khot, Tushar  and
      Sabharwal, Ashish  and
      Tafjord, Oyvind  and
      Clark, Peter  and
      Hajishirzi, Hannaneh",
    booktitle = "Findings of the Association for Computational Linguistics: EMNLP 2020",
    month = nov,
    year = "2020",
    address = "Online",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2020.findings-emnlp.171",
    doi = "10.18653/v1/2020.findings-emnlp.171",
    pages = "1896--1907",
}

Note that each UnifiedQA dataset has its own citation. Please see the source for
the correct citation for each contained dataset.

unified_qa/qasc_with_ir

Config description: QASC is a question-answering dataset with a focus on sentence composition. It consists of 8-way multiple-choice questions about grade school science, and comes with a corpus of 17M sentences. This version includes paragraphs fetched via an information retrieval system as additional evidence.
Download size: 16.95 MiB
Dataset size: 17.30 MiB
Auto-cached (documentation): Yes
Splits:

Split	Examples
`'test'`	920
`'train'`	8,134
`'validation'`	926

Examples (tfds.as_dataframe):

Citation:

@inproceedings{khot2020qasc,
    title={Qasc: A dataset for question answering via sentence composition},
    author={Khot, Tushar and Clark, Peter and Guerquin, Michal and Jansen, Peter and Sabharwal, Ashish},
    booktitle={Proceedings of the AAAI Conference on Artificial Intelligence},
    volume={34},
    number={05},
    pages={8082--8090},
    year={2020}
}

@inproceedings{khashabi-etal-2020-unifiedqa,
    title = "{UNIFIEDQA}: Crossing Format Boundaries with a Single {QA} System",
    author = "Khashabi, Daniel  and
      Min, Sewon  and
      Khot, Tushar  and
      Sabharwal, Ashish  and
      Tafjord, Oyvind  and
      Clark, Peter  and
      Hajishirzi, Hannaneh",
    booktitle = "Findings of the Association for Computational Linguistics: EMNLP 2020",
    month = nov,
    year = "2020",
    address = "Online",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2020.findings-emnlp.171",
    doi = "10.18653/v1/2020.findings-emnlp.171",
    pages = "1896--1907",
}

Note that each UnifiedQA dataset has its own citation. Please see the source for
the correct citation for each contained dataset.

unified_qa/qasc_with_ir_test

Config description: QASC is a question-answering dataset with a focus on sentence composition. It consists of 8-way multiple-choice questions about grade school science, and comes with a corpus of 17M sentences. This version includes paragraphs fetched via an information retrieval system as additional evidence.
Download size: 16.95 MiB
Dataset size: 17.30 MiB
Auto-cached (documentation): Yes
Splits:

Split	Examples
`'test'`	920
`'train'`	8,134
`'validation'`	926

Examples (tfds.as_dataframe):

Citation:

@inproceedings{khot2020qasc,
    title={Qasc: A dataset for question answering via sentence composition},
    author={Khot, Tushar and Clark, Peter and Guerquin, Michal and Jansen, Peter and Sabharwal, Ashish},
    booktitle={Proceedings of the AAAI Conference on Artificial Intelligence},
    volume={34},
    number={05},
    pages={8082--8090},
    year={2020}
}

@inproceedings{khashabi-etal-2020-unifiedqa,
    title = "{UNIFIEDQA}: Crossing Format Boundaries with a Single {QA} System",
    author = "Khashabi, Daniel  and
      Min, Sewon  and
      Khot, Tushar  and
      Sabharwal, Ashish  and
      Tafjord, Oyvind  and
      Clark, Peter  and
      Hajishirzi, Hannaneh",
    booktitle = "Findings of the Association for Computational Linguistics: EMNLP 2020",
    month = nov,
    year = "2020",
    address = "Online",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2020.findings-emnlp.171",
    doi = "10.18653/v1/2020.findings-emnlp.171",
    pages = "1896--1907",
}

Note that each UnifiedQA dataset has its own citation. Please see the source for
the correct citation for each contained dataset.

unified_qa/quoref

Config description: This dataset tests the coreferential reasoning capability of reading comprehension systems. In this span-selection benchmark containing questions over paragraphs from Wikipedia, a system must resolve hard coreferences before selecting the appropriate span in the paragraphs for answering questions.
Download size: 51.43 MiB
Dataset size: 52.29 MiB
Auto-cached (documentation): Yes
Splits:

Split	Examples
`'train'`	22,265
`'validation'`	2,768

Examples (tfds.as_dataframe):

Citation:

@inproceedings{dasigi-etal-2019-quoref,
    title = "{Q}uoref: A Reading Comprehension Dataset with Questions Requiring Coreferential Reasoning",
    author = "Dasigi, Pradeep  and
      Liu, Nelson F.  and
      Marasovi{\'c}, Ana  and
      Smith, Noah A.  and
      Gardner, Matt",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)",
    month = nov,
    year = "2019",
    address = "Hong Kong, China",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/D19-1606",
    doi = "10.18653/v1/D19-1606",
    pages = "5925--5932",
}

@inproceedings{khashabi-etal-2020-unifiedqa,
    title = "{UNIFIEDQA}: Crossing Format Boundaries with a Single {QA} System",
    author = "Khashabi, Daniel  and
      Min, Sewon  and
      Khot, Tushar  and
      Sabharwal, Ashish  and
      Tafjord, Oyvind  and
      Clark, Peter  and
      Hajishirzi, Hannaneh",
    booktitle = "Findings of the Association for Computational Linguistics: EMNLP 2020",
    month = nov,
    year = "2020",
    address = "Online",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2020.findings-emnlp.171",
    doi = "10.18653/v1/2020.findings-emnlp.171",
    pages = "1896--1907",
}

Note that each UnifiedQA dataset has its own citation. Please see the source for
the correct citation for each contained dataset.

unified_qa/race_string

Config description: RACE is a large-scale reading comprehension dataset. The dataset was collected from English examinations in China designed for middle school and high school students. It can serve as training and test sets for machine comprehension.
Download size: 167.97 MiB
Dataset size: 171.23 MiB
Auto-cached (documentation): Yes (test, validation), only when shuffle_files=False (train)
Splits:

Split	Examples
`'test'`	4,934
`'train'`	87,863
`'validation'`	4,887

Examples (tfds.as_dataframe):

Citation:

@inproceedings{lai-etal-2017-race,
    title = "{RACE}: Large-scale {R}e{A}ding Comprehension Dataset From Examinations",
    author = "Lai, Guokun  and
      Xie, Qizhe  and
      Liu, Hanxiao  and
      Yang, Yiming  and
      Hovy, Eduard",
    booktitle = "Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing",
    month = sep,
    year = "2017",
    address = "Copenhagen, Denmark",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/D17-1082",
    doi = "10.18653/v1/D17-1082",
    pages = "785--794",
}

@inproceedings{khashabi-etal-2020-unifiedqa,
    title = "{UNIFIEDQA}: Crossing Format Boundaries with a Single {QA} System",
    author = "Khashabi, Daniel  and
      Min, Sewon  and
      Khot, Tushar  and
      Sabharwal, Ashish  and
      Tafjord, Oyvind  and
      Clark, Peter  and
      Hajishirzi, Hannaneh",
    booktitle = "Findings of the Association for Computational Linguistics: EMNLP 2020",
    month = nov,
    year = "2020",
    address = "Online",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2020.findings-emnlp.171",
    doi = "10.18653/v1/2020.findings-emnlp.171",
    pages = "1896--1907",
}

Note that each UnifiedQA dataset has its own citation. Please see the source for
the correct citation for each contained dataset.

unified_qa/race_string_dev

Config description: RACE is a large-scale reading comprehension dataset. The dataset was collected from English examinations in China designed for middle school and high school students. It can serve as training and test sets for machine comprehension.
Download size: 167.97 MiB
Dataset size: 171.23 MiB
Auto-cached (documentation): Yes (test, validation), only when shuffle_files=False (train)
Splits:

Split	Examples
`'test'`	4,934
`'train'`	87,863
`'validation'`	4,887

Examples (tfds.as_dataframe):

Citation:

@inproceedings{lai-etal-2017-race,
    title = "{RACE}: Large-scale {R}e{A}ding Comprehension Dataset From Examinations",
    author = "Lai, Guokun  and
      Xie, Qizhe  and
      Liu, Hanxiao  and
      Yang, Yiming  and
      Hovy, Eduard",
    booktitle = "Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing",
    month = sep,
    year = "2017",
    address = "Copenhagen, Denmark",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/D17-1082",
    doi = "10.18653/v1/D17-1082",
    pages = "785--794",
}

@inproceedings{khashabi-etal-2020-unifiedqa,
    title = "{UNIFIEDQA}: Crossing Format Boundaries with a Single {QA} System",
    author = "Khashabi, Daniel  and
      Min, Sewon  and
      Khot, Tushar  and
      Sabharwal, Ashish  and
      Tafjord, Oyvind  and
      Clark, Peter  and
      Hajishirzi, Hannaneh",
    booktitle = "Findings of the Association for Computational Linguistics: EMNLP 2020",
    month = nov,
    year = "2020",
    address = "Online",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2020.findings-emnlp.171",
    doi = "10.18653/v1/2020.findings-emnlp.171",
    pages = "1896--1907",
}

Note that each UnifiedQA dataset has its own citation. Please see the source for
the correct citation for each contained dataset.

unified_qa/ropes

Config description: This dataset tests a system's ability to apply knowledge from a passage of text to a new situation. A system is presented with a background passage containing one or more causal or qualitative relations (e.g., "animal pollinators increase efficiency of fertilization in flowers"), a novel situation that uses this background, and questions that require reasoning about the effects of the relationships in the background passage in the context of the situation.
Download size: 12.91 MiB
Dataset size: 13.35 MiB
Auto-cached (documentation): Yes
Splits:

Split	Examples
`'train'`	10,924
`'validation'`	1,688

Examples (tfds.as_dataframe):

Citation:

@inproceedings{lin-etal-2019-reasoning,
    title = "Reasoning Over Paragraph Effects in Situations",
    author = "Lin, Kevin  and
      Tafjord, Oyvind  and
      Clark, Peter  and
      Gardner, Matt",
    booktitle = "Proceedings of the 2nd Workshop on Machine Reading for Question Answering",
    month = nov,
    year = "2019",
    address = "Hong Kong, China",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/D19-5808",
    doi = "10.18653/v1/D19-5808",
    pages = "58--62",
}

@inproceedings{khashabi-etal-2020-unifiedqa,
    title = "{UNIFIEDQA}: Crossing Format Boundaries with a Single {QA} System",
    author = "Khashabi, Daniel  and
      Min, Sewon  and
      Khot, Tushar  and
      Sabharwal, Ashish  and
      Tafjord, Oyvind  and
      Clark, Peter  and
      Hajishirzi, Hannaneh",
    booktitle = "Findings of the Association for Computational Linguistics: EMNLP 2020",
    month = nov,
    year = "2020",
    address = "Online",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2020.findings-emnlp.171",
    doi = "10.18653/v1/2020.findings-emnlp.171",
    pages = "1896--1907",
}

Note that each UnifiedQA dataset has its own citation. Please see the source for
the correct citation for each contained dataset.

unified_qa/social_iqa

Config description: This is a large-scale benchmark for commonsense reasoning about social situations. Social IQa contains multiple-choice questions for probing emotional and social intelligence in a variety of everyday situations. Through crowdsourcing, commonsense questions along with correct and incorrect answers about social interactions are collected, using a new framework that mitigates stylistic artifacts in incorrect answers by asking workers to provide the right answer to a different but related question.
Download size: 7.08 MiB
Dataset size: 8.22 MiB
Auto-cached (documentation): Yes
Splits:

Split	Examples
`'train'`	33,410
`'validation'`	1,954

Examples (tfds.as_dataframe):

Citation:

@inproceedings{sap-etal-2019-social,
    title = "Social {IQ}a: Commonsense Reasoning about Social Interactions",
    author = "Sap, Maarten  and
      Rashkin, Hannah  and
      Chen, Derek  and
      Le Bras, Ronan  and
      Choi, Yejin",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)",
    month = nov,
    year = "2019",
    address = "Hong Kong, China",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/D19-1454",
    doi = "10.18653/v1/D19-1454",
    pages = "4463--4473",
}

@inproceedings{khashabi-etal-2020-unifiedqa,
    title = "{UNIFIEDQA}: Crossing Format Boundaries with a Single {QA} System",
    author = "Khashabi, Daniel  and
      Min, Sewon  and
      Khot, Tushar  and
      Sabharwal, Ashish  and
      Tafjord, Oyvind  and
      Clark, Peter  and
      Hajishirzi, Hannaneh",
    booktitle = "Findings of the Association for Computational Linguistics: EMNLP 2020",
    month = nov,
    year = "2020",
    address = "Online",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2020.findings-emnlp.171",
    doi = "10.18653/v1/2020.findings-emnlp.171",
    pages = "1896--1907",
}

Note that each UnifiedQA dataset has its own citation. Please see the source for
the correct citation for each contained dataset.

unified_qa/squad1_1

Config description: This is a reading comprehension dataset consisting of questions posed by crowdworkers on a set of Wikipedia articles, where the answer to every question is a segment of text from the corresponding reading passage.
Download size: 80.62 MiB
Dataset size: 83.99 MiB
Auto-cached (documentation): Yes
Splits:

Split	Examples
`'train'`	87,514
`'validation'`	10,570

Examples (tfds.as_dataframe):

Citation:

@inproceedings{rajpurkar-etal-2016-squad,
    title = "{SQ}u{AD}: 100,000+ Questions for Machine Comprehension of Text",
    author = "Rajpurkar, Pranav  and
      Zhang, Jian  and
      Lopyrev, Konstantin  and
      Liang, Percy",
    booktitle = "Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing",
    month = nov,
    year = "2016",
    address = "Austin, Texas",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/D16-1264",
    doi = "10.18653/v1/D16-1264",
    pages = "2383--2392",
}

@inproceedings{khashabi-etal-2020-unifiedqa,
    title = "{UNIFIEDQA}: Crossing Format Boundaries with a Single {QA} System",
    author = "Khashabi, Daniel  and
      Min, Sewon  and
      Khot, Tushar  and
      Sabharwal, Ashish  and
      Tafjord, Oyvind  and
      Clark, Peter  and
      Hajishirzi, Hannaneh",
    booktitle = "Findings of the Association for Computational Linguistics: EMNLP 2020",
    month = nov,
    year = "2020",
    address = "Online",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2020.findings-emnlp.171",
    doi = "10.18653/v1/2020.findings-emnlp.171",
    pages = "1896--1907",
}

Note that each UnifiedQA dataset has its own citation. Please see the source for
the correct citation for each contained dataset.

unified_qa/squad2

Config description: This dataset combines the original Stanford Question Answering Dataset (SQuAD) with unanswerable questions written adversarially by crowdworkers to look similar to answerable ones.
Download size: 116.56 MiB
Dataset size: 121.43 MiB
Auto-cached (documentation): Yes
Splits:

Split	Examples
`'train'`	130,149
`'validation'`	11,873

Examples (tfds.as_dataframe):

Citation:

@inproceedings{rajpurkar-etal-2018-know,
    title = "Know What You Don{'}t Know: Unanswerable Questions for {SQ}u{AD}",
    author = "Rajpurkar, Pranav  and
      Jia, Robin  and
      Liang, Percy",
    booktitle = "Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)",
    month = jul,
    year = "2018",
    address = "Melbourne, Australia",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/P18-2124",
    doi = "10.18653/v1/P18-2124",
    pages = "784--789",
}

@inproceedings{khashabi-etal-2020-unifiedqa,
    title = "{UNIFIEDQA}: Crossing Format Boundaries with a Single {QA} System",
    author = "Khashabi, Daniel  and
      Min, Sewon  and
      Khot, Tushar  and
      Sabharwal, Ashish  and
      Tafjord, Oyvind  and
      Clark, Peter  and
      Hajishirzi, Hannaneh",
    booktitle = "Findings of the Association for Computational Linguistics: EMNLP 2020",
    month = nov,
    year = "2020",
    address = "Online",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2020.findings-emnlp.171",
    doi = "10.18653/v1/2020.findings-emnlp.171",
    pages = "1896--1907",
}

Note that each UnifiedQA dataset has its own citation. Please see the source for
the correct citation for each contained dataset.

unified_qa/winogrande_l

Config description: This dataset is inspired by the original Winograd Schema Challenge design, but adjusted to improve both the scale and the hardness of the dataset. The key steps of the dataset construction consist of (1) a carefully designed crowdsourcing procedure, followed by (2) systematic bias reduction using a novel AfLite algorithm that generalizes human-detectable word associations to machine-detectable embedding associations. Training sets of different sizes are provided. This set corresponds to size l.
Download size: 1.49 MiB
Dataset size: 1.83 MiB
Auto-cached (documentation): Yes
Splits:

Split	Examples
`'train'`	10,234
`'validation'`	1,267

Examples (tfds.as_dataframe):

Citation:

@inproceedings{sakaguchi2020winogrande,
  title={Winogrande: An adversarial winograd schema challenge at scale},
  author={Sakaguchi, Keisuke and Le Bras, Ronan and Bhagavatula, Chandra and Choi, Yejin},
  booktitle={Proceedings of the AAAI Conference on Artificial Intelligence},
  volume={34},
  number={05},
  pages={8732--8740},
  year={2020}
}

@inproceedings{khashabi-etal-2020-unifiedqa,
    title = "{UNIFIEDQA}: Crossing Format Boundaries with a Single {QA} System",
    author = "Khashabi, Daniel  and
      Min, Sewon  and
      Khot, Tushar  and
      Sabharwal, Ashish  and
      Tafjord, Oyvind  and
      Clark, Peter  and
      Hajishirzi, Hannaneh",
    booktitle = "Findings of the Association for Computational Linguistics: EMNLP 2020",
    month = nov,
    year = "2020",
    address = "Online",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2020.findings-emnlp.171",
    doi = "10.18653/v1/2020.findings-emnlp.171",
    pages = "1896--1907",
}

Note that each UnifiedQA dataset has its own citation. Please see the source for
the correct citation for each contained dataset.

unified_qa/winogrande_m

Config description: This dataset is inspired by the original Winograd Schema Challenge design, but adjusted to improve both the scale and the hardness of the dataset. The key steps of the dataset construction consist of (1) a carefully designed crowdsourcing procedure, followed by (2) systematic bias reduction using a novel AfLite algorithm that generalizes human-detectable word associations to machine-detectable embedding associations. Training sets of different sizes are provided. This set corresponds to size m.
Download size: 507.46 KiB
Dataset size: 623.15 KiB
Auto-cached (documentation): Yes
Splits:

Split	Examples
`'train'`	2,558
`'validation'`	1,267

Examples (tfds.as_dataframe):

Citation:

@inproceedings{sakaguchi2020winogrande,
  title={Winogrande: An adversarial winograd schema challenge at scale},
  author={Sakaguchi, Keisuke and Le Bras, Ronan and Bhagavatula, Chandra and Choi, Yejin},
  booktitle={Proceedings of the AAAI Conference on Artificial Intelligence},
  volume={34},
  number={05},
  pages={8732--8740},
  year={2020}
}

@inproceedings{khashabi-etal-2020-unifiedqa,
    title = "{UNIFIEDQA}: Crossing Format Boundaries with a Single {QA} System",
    author = "Khashabi, Daniel  and
      Min, Sewon  and
      Khot, Tushar  and
      Sabharwal, Ashish  and
      Tafjord, Oyvind  and
      Clark, Peter  and
      Hajishirzi, Hannaneh",
    booktitle = "Findings of the Association for Computational Linguistics: EMNLP 2020",
    month = nov,
    year = "2020",
    address = "Online",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2020.findings-emnlp.171",
    doi = "10.18653/v1/2020.findings-emnlp.171",
    pages = "1896--1907",
}

Note that each UnifiedQA dataset has its own citation. Please see the source for
the correct citation for each contained dataset.

unified_qa/winogrande_s

Config description: This dataset is inspired by the original Winograd Schema Challenge design, but adjusted to improve both the scale and the hardness of the dataset. The key steps of the dataset construction consist of (1) a carefully designed crowdsourcing procedure, followed by (2) systematic bias reduction using a novel AfLite algorithm that generalizes human-detectable word associations to machine-detectable embedding associations. Training sets of different sizes are provided. This set corresponds to size s.
Download size: 479.24 KiB
Dataset size: 590.47 KiB
Auto-cached (documentation): Yes
Splits:

Split	Examples
`'test'`	1,767
`'train'`	640
`'validation'`	1,267
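
Because the Winogrande configs differ mainly in training-set size, a small lookup can help pick one for a given example budget. The sketch below uses only the three train counts listed in this section (s, m, l); the function name `largest_config_within` is hypothetical and not part of TFDS.

```python
from typing import Optional

# Train split sizes as listed in the splits tables of this section.
WINOGRANDE_TRAIN_SIZES = {
    "winogrande_s": 640,
    "winogrande_m": 2558,
    "winogrande_l": 10234,
}


def largest_config_within(max_train_examples: int) -> Optional[str]:
    """Return the largest listed config whose train split fits the budget.

    Returns None when even the smallest listed config is too large.
    """
    fitting = [
        (size, name)
        for name, size in WINOGRANDE_TRAIN_SIZES.items()
        if size <= max_train_examples
    ]
    if not fitting:
        return None
    # max() compares by size first, so the biggest fitting config wins.
    return max(fitting)[1]
```

The returned name can be appended to `unified_qa/` to form the config string for loading.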

Examples (tfds.as_dataframe):

Citation:

@inproceedings{sakaguchi2020winogrande,
  title={Winogrande: An adversarial winograd schema challenge at scale},
  author={Sakaguchi, Keisuke and Le Bras, Ronan and Bhagavatula, Chandra and Choi, Yejin},
  booktitle={Proceedings of the AAAI Conference on Artificial Intelligence},
  volume={34},
  number={05},
  pages={8732--8740},
  year={2020}
}

@inproceedings{khashabi-etal-2020-unifiedqa,
    title = "{UNIFIEDQA}: Crossing Format Boundaries with a Single {QA} System",
    author = "Khashabi, Daniel  and
      Min, Sewon  and
      Khot, Tushar  and
      Sabharwal, Ashish  and
      Tafjord, Oyvind  and
      Clark, Peter  and
      Hajishirzi, Hannaneh",
    booktitle = "Findings of the Association for Computational Linguistics: EMNLP 2020",
    month = nov,
    year = "2020",
    address = "Online",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2020.findings-emnlp.171",
    doi = "10.18653/v1/2020.findings-emnlp.171",
    pages = "1896--1907",
}

Note that each UnifiedQA dataset has its own citation. Please see the source for
the correct citation for each contained dataset.
