- Description :
The UnifiedQA benchmark consists of 20 main question answering (QA) datasets (each of which may have multiple versions) that target different formats as well as various complex linguistic phenomena. These datasets are grouped into several formats/categories, including extractive QA, abstractive QA, multiple-choice QA, and yes/no QA. Additionally, contrast sets are used for several datasets (denoted with "contrast_sets_"). These evaluation sets are expert-generated perturbations that deviate from the patterns common in the original dataset. For several datasets that do not come with evidence paragraphs, two variants are included: one where the dataset is used as-is and another that uses paragraphs fetched via an information retrieval system as additional evidence, indicated with "_ir" tags.
More information can be found at: https://github.com/allenai/unifiedqa
Homepage : https://github.com/allenai/unifiedqa
Source code :
tfds.text.unifiedqa.UnifiedQA
Versions :
- 1.0.0 (default): Initial release.
- Feature structure :
FeaturesDict({
'input': string,
'output': string,
})
- Feature documentation :
Feature | Class | Shape | Dtype | Description |
---|---|---|---|---|
 | FeaturesDict | | | |
input | Tensor | | string | |
output | Tensor | | string | |
Supervised keys (See as_supervised doc): None
Figure ( tfds.show_examples ): Not supported.
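As a minimal sketch of how one of the configs on this page might be loaded, the snippet below assumes the `tensorflow_datasets` package is installed and that the dataset is registered under the name `unified_qa`, as the config headings on this page suggest. The helper names `config_name` and `load_pairs` are illustrative and not part of TFDS. Because the supervised keys are `None`, the sketch reads the `input` and `output` features from each example dictionary instead of relying on `as_supervised=True`.

```python
def config_name(config: str) -> str:
    """Build the full TFDS name for a config, e.g. 'unified_qa/boolq'."""
    return f"unified_qa/{config}"


def load_pairs(config: str, split: str = "train", limit: int = 3):
    """Return up to `limit` (input, output) string pairs from one split.

    The import is deferred so that `config_name` stays usable even when
    `tensorflow_datasets` is not installed.
    """
    import tensorflow_datasets as tfds  # assumed to be installed

    ds = tfds.load(config_name(config), split=split)
    pairs = []
    # String features arrive as bytes once converted to NumPy.
    for example in tfds.as_numpy(ds.take(limit)):
        pairs.append(
            (example["input"].decode("utf-8"), example["output"].decode("utf-8"))
        )
    return pairs
```

A call such as `load_pairs("ai2_science_elementary", split="validation")` would download and prepare the data on first use, so it needs network access and some disk space.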
unified_qa/ai2_science_elementary (default config)
Config description : The AI2 Science Questions dataset consists of questions used in student assessments in the United States across elementary and middle school grade levels. Each question is in 4-way multiple-choice format and may or may not include a diagram element. This set consists of questions used for elementary school grade levels.
Download size :
345.59 KiB
Dataset size :
390.02 KiB
Auto-cached ( documentation ): Yes
Splits :
Split | Examples |
---|---|
'test' | 542 |
'train' | 623 |
'validation' | 123 |
- Examples ( tfds.as_dataframe ):
- Citation :
http://data.allenai.org/ai2-science-questions
@inproceedings{khashabi-etal-2020-unifiedqa,
title = "{UNIFIEDQA}: Crossing Format Boundaries with a Single {QA} System",
author = "Khashabi, Daniel and
Min, Sewon and
Khot, Tushar and
Sabharwal, Ashish and
Tafjord, Oyvind and
Clark, Peter and
Hajishirzi, Hannaneh",
booktitle = "Findings of the Association for Computational Linguistics: EMNLP 2020",
month = nov,
year = "2020",
address = "Online",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/2020.findings-emnlp.171",
doi = "10.18653/v1/2020.findings-emnlp.171",
pages = "1896--1907",
}
Note that each UnifiedQA dataset has its own citation. Please see the source to
see the correct citation for each contained dataset.
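The split sizes listed for `unified_qa/ai2_science_elementary` can be turned into proportions with a few lines of arithmetic. The sketch below uses only the counts from the table above; the names `SPLIT_SIZES` and `split_fractions` are illustrative.

```python
# Split sizes copied from the table for unified_qa/ai2_science_elementary.
SPLIT_SIZES = {"test": 542, "train": 623, "validation": 123}


def split_fractions(sizes: dict) -> dict:
    """Return each split's share of the total number of examples."""
    total = sum(sizes.values())
    return {name: count / total for name, count in sizes.items()}


if __name__ == "__main__":
    for name, share in split_fractions(SPLIT_SIZES).items():
        print(f"{name}: {share:.1%}")
```

For this config the test split holds roughly 42% of the 1,288 examples, which is worth keeping in mind when comparing results across configs with very different split ratios.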
unified_qa/ai2_science_middle
Config description : The AI2 Science Questions dataset consists of questions used in student assessments in the United States across elementary and middle school grade levels. Each question is in 4-way multiple-choice format and may or may not include a diagram element. This set consists of questions used for middle school grade levels.
Download size :
428.41 KiB
Dataset size :
477.40 KiB
Auto-cached ( documentation ): Yes
Splits :
Split | Examples |
---|---|
'test' | 679 |
'train' | 605 |
'validation' | 125 |
- Examples ( tfds.as_dataframe ):
- Citation :
http://data.allenai.org/ai2-science-questions
@inproceedings{khashabi-etal-2020-unifiedqa,
title = "{UNIFIEDQA}: Crossing Format Boundaries with a Single {QA} System",
author = "Khashabi, Daniel and
Min, Sewon and
Khot, Tushar and
Sabharwal, Ashish and
Tafjord, Oyvind and
Clark, Peter and
Hajishirzi, Hannaneh",
booktitle = "Findings of the Association for Computational Linguistics: EMNLP 2020",
month = nov,
year = "2020",
address = "Online",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/2020.findings-emnlp.171",
doi = "10.18653/v1/2020.findings-emnlp.171",
pages = "1896--1907",
}
Note that each UnifiedQA dataset has its own citation. Please see the source to
see the correct citation for each contained dataset.
unified_qa/ambigqa
Config description : AmbigQA is an open-domain question answering task that involves finding every plausible answer and then rewriting the question for each one to resolve the ambiguity.
Download size :
2.27 MiB
Dataset size :
3.04 MiB
Auto-cached ( documentation ): Yes
Splits :
Split | Examples |
---|---|
'train' | 19,806 |
'validation' | 5,674 |
- Examples ( tfds.as_dataframe ):
- Citation :
@inproceedings{min-etal-2020-ambigqa,
title = "{A}mbig{QA}: Answering Ambiguous Open-domain Questions",
author = "Min, Sewon and
Michael, Julian and
Hajishirzi, Hannaneh and
Zettlemoyer, Luke",
booktitle = "Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)",
month = nov,
year = "2020",
address = "Online",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/2020.emnlp-main.466",
doi = "10.18653/v1/2020.emnlp-main.466",
pages = "5783--5797",
}
@inproceedings{khashabi-etal-2020-unifiedqa,
title = "{UNIFIEDQA}: Crossing Format Boundaries with a Single {QA} System",
author = "Khashabi, Daniel and
Min, Sewon and
Khot, Tushar and
Sabharwal, Ashish and
Tafjord, Oyvind and
Clark, Peter and
Hajishirzi, Hannaneh",
booktitle = "Findings of the Association for Computational Linguistics: EMNLP 2020",
month = nov,
year = "2020",
address = "Online",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/2020.findings-emnlp.171",
doi = "10.18653/v1/2020.findings-emnlp.171",
pages = "1896--1907",
}
Note that each UnifiedQA dataset has its own citation. Please see the source to
see the correct citation for each contained dataset.
unified_qa/arc_easy
Config description : This dataset consists of genuine grade-school level, multiple-choice science questions, assembled to encourage research in advanced question answering. The dataset is partitioned into a Challenge Set and an Easy Set, where the former contains only questions answered incorrectly by both a retrieval-based algorithm and a word co-occurrence algorithm. This set consists of "easy" questions.
Download size :
1.24 MiB
Dataset size :
1.42 MiB
Auto-cached ( documentation ): Yes
Splits :
Split | Examples |
---|---|
'test' | 2,376 |
'train' | 2,251 |
'validation' | 570 |
- Examples ( tfds.as_dataframe ):
- Citation :
@article{clark2018think,
title={Think you have solved question answering? try arc, the ai2 reasoning challenge},
author={Clark, Peter and Cowhey, Isaac and Etzioni, Oren and Khot, Tushar and Sabharwal, Ashish and Schoenick, Carissa and Tafjord, Oyvind},
journal={arXiv preprint arXiv:1803.05457},
year={2018}
}
@inproceedings{khashabi-etal-2020-unifiedqa,
title = "{UNIFIEDQA}: Crossing Format Boundaries with a Single {QA} System",
author = "Khashabi, Daniel and
Min, Sewon and
Khot, Tushar and
Sabharwal, Ashish and
Tafjord, Oyvind and
Clark, Peter and
Hajishirzi, Hannaneh",
booktitle = "Findings of the Association for Computational Linguistics: EMNLP 2020",
month = nov,
year = "2020",
address = "Online",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/2020.findings-emnlp.171",
doi = "10.18653/v1/2020.findings-emnlp.171",
pages = "1896--1907",
}
Note that each UnifiedQA dataset has its own citation. Please see the source to
see the correct citation for each contained dataset.
unified_qa/arc_easy_dev
Config description : This dataset consists of genuine grade-school level, multiple-choice science questions, assembled to encourage research in advanced question answering. The dataset is partitioned into a Challenge Set and an Easy Set, where the former contains only questions answered incorrectly by both a retrieval-based algorithm and a word co-occurrence algorithm. This set consists of "easy" questions.
Download size :
1.24 MiB
Dataset size :
1.42 MiB
Auto-cached ( documentation ): Yes
Splits :
Split | Examples |
---|---|
'test' | 2,376 |
'train' | 2,251 |
'validation' | 570 |
- Examples ( tfds.as_dataframe ):
- Citation :
@article{clark2018think,
title={Think you have solved question answering? try arc, the ai2 reasoning challenge},
author={Clark, Peter and Cowhey, Isaac and Etzioni, Oren and Khot, Tushar and Sabharwal, Ashish and Schoenick, Carissa and Tafjord, Oyvind},
journal={arXiv preprint arXiv:1803.05457},
year={2018}
}
@inproceedings{khashabi-etal-2020-unifiedqa,
title = "{UNIFIEDQA}: Crossing Format Boundaries with a Single {QA} System",
author = "Khashabi, Daniel and
Min, Sewon and
Khot, Tushar and
Sabharwal, Ashish and
Tafjord, Oyvind and
Clark, Peter and
Hajishirzi, Hannaneh",
booktitle = "Findings of the Association for Computational Linguistics: EMNLP 2020",
month = nov,
year = "2020",
address = "Online",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/2020.findings-emnlp.171",
doi = "10.18653/v1/2020.findings-emnlp.171",
pages = "1896--1907",
}
Note that each UnifiedQA dataset has its own citation. Please see the source to
see the correct citation for each contained dataset.
unified_qa/arc_easy_with_ir
Config description : This dataset consists of genuine grade-school level, multiple-choice science questions, assembled to encourage research in advanced question answering. The dataset is partitioned into a Challenge Set and an Easy Set, where the former contains only questions answered incorrectly by both a retrieval-based algorithm and a word co-occurrence algorithm. This set consists of "easy" questions. This version includes paragraphs fetched via an information retrieval system as additional evidence.
Download size :
7.00 MiB
Dataset size :
7.17 MiB
Auto-cached ( documentation ): Yes
Splits :
Split | Examples |
---|---|
'test' | 2,376 |
'train' | 2,251 |
'validation' | 570 |
- Examples ( tfds.as_dataframe ):
- Citation :
@article{clark2018think,
title={Think you have solved question answering? try arc, the ai2 reasoning challenge},
author={Clark, Peter and Cowhey, Isaac and Etzioni, Oren and Khot, Tushar and Sabharwal, Ashish and Schoenick, Carissa and Tafjord, Oyvind},
journal={arXiv preprint arXiv:1803.05457},
year={2018}
}
@inproceedings{khashabi-etal-2020-unifiedqa,
title = "{UNIFIEDQA}: Crossing Format Boundaries with a Single {QA} System",
author = "Khashabi, Daniel and
Min, Sewon and
Khot, Tushar and
Sabharwal, Ashish and
Tafjord, Oyvind and
Clark, Peter and
Hajishirzi, Hannaneh",
booktitle = "Findings of the Association for Computational Linguistics: EMNLP 2020",
month = nov,
year = "2020",
address = "Online",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/2020.findings-emnlp.171",
doi = "10.18653/v1/2020.findings-emnlp.171",
pages = "1896--1907",
}
Note that each UnifiedQA dataset has its own citation. Please see the source to
see the correct citation for each contained dataset.
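The ARC configs on this page appear under four related names: a base name, a `_with_ir` variant with retrieved paragraphs, and a `_dev` counterpart of each. The sketch below generates those names from a base such as `arc_easy`. The function `variant_names` is hypothetical, and the suffix pattern is only an observation from the ARC headings here; other datasets may not follow it (for example, `commonsenseqa` has a `_test` config instead).

```python
def variant_names(base: str, with_ir: bool = True, with_dev: bool = True) -> list:
    """List config names following the suffix pattern seen for the ARC configs."""
    names = [base]
    if with_ir:
        # Variant that adds paragraphs from an information retrieval system.
        names.append(f"{base}_with_ir")
    if with_dev:
        # One "_dev" counterpart for every name collected so far.
        names.extend([f"{name}_dev" for name in names])
    return [f"unified_qa/{name}" for name in names]


if __name__ == "__main__":
    for name in variant_names("arc_easy"):
        print(name)
```

Checking a generated name against the catalog before loading it is advisable, since a name that fits the pattern is not guaranteed to exist.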
unified_qa/arc_easy_with_ir_dev
Config description : This dataset consists of genuine grade-school level, multiple-choice science questions, assembled to encourage research in advanced question answering. The dataset is partitioned into a Challenge Set and an Easy Set, where the former contains only questions answered incorrectly by both a retrieval-based algorithm and a word co-occurrence algorithm. This set consists of "easy" questions. This version includes paragraphs fetched via an information retrieval system as additional evidence.
Download size :
7.00 MiB
Dataset size :
7.17 MiB
Auto-cached ( documentation ): Yes
Splits :
Split | Examples |
---|---|
'test' | 2,376 |
'train' | 2,251 |
'validation' | 570 |
- Examples ( tfds.as_dataframe ):
- Citation :
@article{clark2018think,
title={Think you have solved question answering? try arc, the ai2 reasoning challenge},
author={Clark, Peter and Cowhey, Isaac and Etzioni, Oren and Khot, Tushar and Sabharwal, Ashish and Schoenick, Carissa and Tafjord, Oyvind},
journal={arXiv preprint arXiv:1803.05457},
year={2018}
}
@inproceedings{khashabi-etal-2020-unifiedqa,
title = "{UNIFIEDQA}: Crossing Format Boundaries with a Single {QA} System",
author = "Khashabi, Daniel and
Min, Sewon and
Khot, Tushar and
Sabharwal, Ashish and
Tafjord, Oyvind and
Clark, Peter and
Hajishirzi, Hannaneh",
booktitle = "Findings of the Association for Computational Linguistics: EMNLP 2020",
month = nov,
year = "2020",
address = "Online",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/2020.findings-emnlp.171",
doi = "10.18653/v1/2020.findings-emnlp.171",
pages = "1896--1907",
}
Note that each UnifiedQA dataset has its own citation. Please see the source to
see the correct citation for each contained dataset.
unified_qa/arc_hard
Config description : This dataset consists of genuine grade-school level, multiple-choice science questions, assembled to encourage research in advanced question answering. The dataset is partitioned into a Challenge Set and an Easy Set, where the former contains only questions answered incorrectly by both a retrieval-based algorithm and a word co-occurrence algorithm. This set consists of "hard" questions.
Download size :
758.03 KiB
Dataset size :
848.28 KiB
Auto-cached ( documentation ): Yes
Splits :
Split | Examples |
---|---|
'test' | 1,172 |
'train' | 1,119 |
'validation' | 299 |
- Examples ( tfds.as_dataframe ):
- Citation :
@article{clark2018think,
title={Think you have solved question answering? try arc, the ai2 reasoning challenge},
author={Clark, Peter and Cowhey, Isaac and Etzioni, Oren and Khot, Tushar and Sabharwal, Ashish and Schoenick, Carissa and Tafjord, Oyvind},
journal={arXiv preprint arXiv:1803.05457},
year={2018}
}
@inproceedings{khashabi-etal-2020-unifiedqa,
title = "{UNIFIEDQA}: Crossing Format Boundaries with a Single {QA} System",
author = "Khashabi, Daniel and
Min, Sewon and
Khot, Tushar and
Sabharwal, Ashish and
Tafjord, Oyvind and
Clark, Peter and
Hajishirzi, Hannaneh",
booktitle = "Findings of the Association for Computational Linguistics: EMNLP 2020",
month = nov,
year = "2020",
address = "Online",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/2020.findings-emnlp.171",
doi = "10.18653/v1/2020.findings-emnlp.171",
pages = "1896--1907",
}
Note that each UnifiedQA dataset has its own citation. Please see the source to
see the correct citation for each contained dataset.
unified_qa/arc_hard_dev
Config description : This dataset consists of genuine grade-school level, multiple-choice science questions, assembled to encourage research in advanced question answering. The dataset is partitioned into a Challenge Set and an Easy Set, where the former contains only questions answered incorrectly by both a retrieval-based algorithm and a word co-occurrence algorithm. This set consists of "hard" questions.
Download size :
758.03 KiB
Dataset size :
848.28 KiB
Auto-cached ( documentation ): Yes
Splits :
Split | Examples |
---|---|
'test' | 1,172 |
'train' | 1,119 |
'validation' | 299 |
- Examples ( tfds.as_dataframe ):
- Citation :
@article{clark2018think,
title={Think you have solved question answering? try arc, the ai2 reasoning challenge},
author={Clark, Peter and Cowhey, Isaac and Etzioni, Oren and Khot, Tushar and Sabharwal, Ashish and Schoenick, Carissa and Tafjord, Oyvind},
journal={arXiv preprint arXiv:1803.05457},
year={2018}
}
@inproceedings{khashabi-etal-2020-unifiedqa,
title = "{UNIFIEDQA}: Crossing Format Boundaries with a Single {QA} System",
author = "Khashabi, Daniel and
Min, Sewon and
Khot, Tushar and
Sabharwal, Ashish and
Tafjord, Oyvind and
Clark, Peter and
Hajishirzi, Hannaneh",
booktitle = "Findings of the Association for Computational Linguistics: EMNLP 2020",
month = nov,
year = "2020",
address = "Online",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/2020.findings-emnlp.171",
doi = "10.18653/v1/2020.findings-emnlp.171",
pages = "1896--1907",
}
Note that each UnifiedQA dataset has its own citation. Please see the source to
see the correct citation for each contained dataset.
unified_qa/arc_hard_with_ir
Config description : This dataset consists of genuine grade-school level, multiple-choice science questions, assembled to encourage research in advanced question answering. The dataset is partitioned into a Challenge Set and an Easy Set, where the former contains only questions answered incorrectly by both a retrieval-based algorithm and a word co-occurrence algorithm. This set consists of "hard" questions. This version includes paragraphs fetched via an information retrieval system as additional evidence.
Download size :
3.53 MiB
Dataset size :
3.62 MiB
Auto-cached ( documentation ): Yes
Splits :
Split | Examples |
---|---|
'test' | 1,172 |
'train' | 1,119 |
'validation' | 299 |
- Examples ( tfds.as_dataframe ):
- Citation :
@article{clark2018think,
title={Think you have solved question answering? try arc, the ai2 reasoning challenge},
author={Clark, Peter and Cowhey, Isaac and Etzioni, Oren and Khot, Tushar and Sabharwal, Ashish and Schoenick, Carissa and Tafjord, Oyvind},
journal={arXiv preprint arXiv:1803.05457},
year={2018}
}
@inproceedings{khashabi-etal-2020-unifiedqa,
title = "{UNIFIEDQA}: Crossing Format Boundaries with a Single {QA} System",
author = "Khashabi, Daniel and
Min, Sewon and
Khot, Tushar and
Sabharwal, Ashish and
Tafjord, Oyvind and
Clark, Peter and
Hajishirzi, Hannaneh",
booktitle = "Findings of the Association for Computational Linguistics: EMNLP 2020",
month = nov,
year = "2020",
address = "Online",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/2020.findings-emnlp.171",
doi = "10.18653/v1/2020.findings-emnlp.171",
pages = "1896--1907",
}
Note that each UnifiedQA dataset has its own citation. Please see the source to
see the correct citation for each contained dataset.
unified_qa/arc_hard_with_ir_dev
Config description : This dataset consists of genuine grade-school level, multiple-choice science questions, assembled to encourage research in advanced question answering. The dataset is partitioned into a Challenge Set and an Easy Set, where the former contains only questions answered incorrectly by both a retrieval-based algorithm and a word co-occurrence algorithm. This set consists of "hard" questions. This version includes paragraphs fetched via an information retrieval system as additional evidence.
Download size :
3.53 MiB
Dataset size :
3.62 MiB
Auto-cached ( documentation ): Yes
Splits :
Split | Examples |
---|---|
'test' | 1,172 |
'train' | 1,119 |
'validation' | 299 |
- Examples ( tfds.as_dataframe ):
- Citation :
@article{clark2018think,
title={Think you have solved question answering? try arc, the ai2 reasoning challenge},
author={Clark, Peter and Cowhey, Isaac and Etzioni, Oren and Khot, Tushar and Sabharwal, Ashish and Schoenick, Carissa and Tafjord, Oyvind},
journal={arXiv preprint arXiv:1803.05457},
year={2018}
}
@inproceedings{khashabi-etal-2020-unifiedqa,
title = "{UNIFIEDQA}: Crossing Format Boundaries with a Single {QA} System",
author = "Khashabi, Daniel and
Min, Sewon and
Khot, Tushar and
Sabharwal, Ashish and
Tafjord, Oyvind and
Clark, Peter and
Hajishirzi, Hannaneh",
booktitle = "Findings of the Association for Computational Linguistics: EMNLP 2020",
month = nov,
year = "2020",
address = "Online",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/2020.findings-emnlp.171",
doi = "10.18653/v1/2020.findings-emnlp.171",
pages = "1896--1907",
}
Note that each UnifiedQA dataset has its own citation. Please see the source to
see the correct citation for each contained dataset.
unified_qa/boolq
Config description : BoolQ is a question answering dataset for yes/no questions. These questions are naturally occurring: they are generated in unprompted and unconstrained settings. Each example is a triplet of (question, passage, answer), with the title of the page as optional additional context. The text-pair classification setup is similar to existing natural language inference tasks.
Download size :
7.77 MiB
Dataset size :
8.20 MiB
Auto-cached ( documentation ): Yes
Splits :
Split | Examples |
---|---|
'train' | 9,427 |
'validation' | 3,270 |
- Examples ( tfds.as_dataframe ):
- Citation :
@inproceedings{clark-etal-2019-boolq,
title = "{B}ool{Q}: Exploring the Surprising Difficulty of Natural Yes/No Questions",
author = "Clark, Christopher and
Lee, Kenton and
Chang, Ming-Wei and
Kwiatkowski, Tom and
Collins, Michael and
Toutanova, Kristina",
booktitle = "Proceedings of the 2019 Conference of the North {A}merican Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers)",
month = jun,
year = "2019",
address = "Minneapolis, Minnesota",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/N19-1300",
doi = "10.18653/v1/N19-1300",
pages = "2924--2936",
}
@inproceedings{khashabi-etal-2020-unifiedqa,
title = "{UNIFIEDQA}: Crossing Format Boundaries with a Single {QA} System",
author = "Khashabi, Daniel and
Min, Sewon and
Khot, Tushar and
Sabharwal, Ashish and
Tafjord, Oyvind and
Clark, Peter and
Hajishirzi, Hannaneh",
booktitle = "Findings of the Association for Computational Linguistics: EMNLP 2020",
month = nov,
year = "2020",
address = "Online",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/2020.findings-emnlp.171",
doi = "10.18653/v1/2020.findings-emnlp.171",
pages = "1896--1907",
}
Note that each UnifiedQA dataset has its own citation. Please see the source to
see the correct citation for each contained dataset.
unified_qa/boolq_np
Config description : BoolQ is a question answering dataset for yes/no questions. These questions are naturally occurring: they are generated in unprompted and unconstrained settings. Each example is a triplet of (question, passage, answer), with the title of the page as optional additional context. The text-pair classification setup is similar to existing natural language inference tasks. This version adds natural perturbations to the original version.
Download size :
10.80 MiB
Dataset size :
11.40 MiB
Auto-cached ( documentation ): Yes
Splits :
Split | Examples |
---|---|
'train' | 9,727 |
'validation' | 7,596 |
- Examples ( tfds.as_dataframe ):
- Citation :
@inproceedings{khashabi-etal-2020-bang,
title = "More Bang for Your Buck: Natural Perturbation for Robust Question Answering",
author = "Khashabi, Daniel and
Khot, Tushar and
Sabharwal, Ashish",
booktitle = "Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)",
month = nov,
year = "2020",
address = "Online",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/2020.emnlp-main.12",
doi = "10.18653/v1/2020.emnlp-main.12",
pages = "163--170",
}
@inproceedings{khashabi-etal-2020-unifiedqa,
title = "{UNIFIEDQA}: Crossing Format Boundaries with a Single {QA} System",
author = "Khashabi, Daniel and
Min, Sewon and
Khot, Tushar and
Sabharwal, Ashish and
Tafjord, Oyvind and
Clark, Peter and
Hajishirzi, Hannaneh",
booktitle = "Findings of the Association for Computational Linguistics: EMNLP 2020",
month = nov,
year = "2020",
address = "Online",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/2020.findings-emnlp.171",
doi = "10.18653/v1/2020.findings-emnlp.171",
pages = "1896--1907",
}
Note that each UnifiedQA dataset has its own citation. Please see the source to
see the correct citation for each contained dataset.
unified_qa/commonsenseqa
Config description : CommonsenseQA is a new multiple-choice question answering dataset that requires different types of commonsense knowledge to predict the correct answers. It contains questions with one correct answer and four distractor answers.
Download size :
1.79 MiB
Dataset size :
2.19 MiB
Auto-cached ( documentation ): Yes
Splits :
Split | Examples |
---|---|
'test' | 1,140 |
'train' | 9,741 |
'validation' | 1,221 |
- Examples ( tfds.as_dataframe ):
- Citation :
@inproceedings{talmor-etal-2019-commonsenseqa,
title = "{C}ommonsense{QA}: A Question Answering Challenge Targeting Commonsense Knowledge",
author = "Talmor, Alon and
Herzig, Jonathan and
Lourie, Nicholas and
Berant, Jonathan",
booktitle = "Proceedings of the 2019 Conference of the North {A}merican Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers)",
month = jun,
year = "2019",
address = "Minneapolis, Minnesota",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/N19-1421",
doi = "10.18653/v1/N19-1421",
pages = "4149--4158",
}
@inproceedings{khashabi-etal-2020-unifiedqa,
title = "{UNIFIEDQA}: Crossing Format Boundaries with a Single {QA} System",
author = "Khashabi, Daniel and
Min, Sewon and
Khot, Tushar and
Sabharwal, Ashish and
Tafjord, Oyvind and
Clark, Peter and
Hajishirzi, Hannaneh",
booktitle = "Findings of the Association for Computational Linguistics: EMNLP 2020",
month = nov,
year = "2020",
address = "Online",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/2020.findings-emnlp.171",
doi = "10.18653/v1/2020.findings-emnlp.171",
pages = "1896--1907",
}
Note that each UnifiedQA dataset has its own citation. Please see the source to
see the correct citation for each contained dataset.
unified_qa/commonsenseqa_test
Config description : CommonsenseQA is a new multiple-choice question answering dataset that requires different types of commonsense knowledge to predict the correct answers. It contains questions with one correct answer and four distractor answers.
Download size :
1.79 MiB
Dataset size :
2.19 MiB
Auto-cached ( documentation ): Yes
Splits :
Split | Examples |
---|---|
'test' | 1,140 |
'train' | 9,741 |
'validation' | 1,221 |
- Examples ( tfds.as_dataframe ):
- Citation :
@inproceedings{talmor-etal-2019-commonsenseqa,
title = "{C}ommonsense{QA}: A Question Answering Challenge Targeting Commonsense Knowledge",
author = "Talmor, Alon and
Herzig, Jonathan and
Lourie, Nicholas and
Berant, Jonathan",
booktitle = "Proceedings of the 2019 Conference of the North {A}merican Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers)",
month = jun,
year = "2019",
address = "Minneapolis, Minnesota",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/N19-1421",
doi = "10.18653/v1/N19-1421",
pages = "4149--4158",
}
@inproceedings{khashabi-etal-2020-unifiedqa,
title = "{UNIFIEDQA}: Crossing Format Boundaries with a Single {QA} System",
author = "Khashabi, Daniel and
Min, Sewon and
Khot, Tushar and
Sabharwal, Ashish and
Tafjord, Oyvind and
Clark, Peter and
Hajishirzi, Hannaneh",
booktitle = "Findings of the Association for Computational Linguistics: EMNLP 2020",
month = nov,
year = "2020",
address = "Online",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/2020.findings-emnlp.171",
doi = "10.18653/v1/2020.findings-emnlp.171",
pages = "1896--1907",
}
Note that each UnifiedQA dataset has its own citation. Please see the source to
see the correct citation for each contained dataset.
unified_qa/contrast_sets_boolq
Config description : BoolQ is a question answering dataset for yes/no questions. These questions are naturally occurring: they are generated in unprompted and unconstrained settings. Each example is a triplet of (question, passage, answer), with the title of the page as optional additional context. The text-pair classification setup is similar to existing natural language inference tasks. This version uses contrast sets. These evaluation sets are expert-generated perturbations that deviate from the patterns common in the original dataset.
Download size :
438.51 KiB
Dataset size :
462.35 KiB
Auto-cached ( documentation ): Yes
Splits :
Split | Examples |
---|---|
'train' | 340 |
'validation' | 340 |
- Examples ( tfds.as_dataframe ):
- Citation :
@inproceedings{clark-etal-2019-boolq,
title = "{B}ool{Q}: Exploring the Surprising Difficulty of Natural Yes/No Questions",
author = "Clark, Christopher and
Lee, Kenton and
Chang, Ming-Wei and
Kwiatkowski, Tom and
Collins, Michael and
Toutanova, Kristina",
booktitle = "Proceedings of the 2019 Conference of the North {A}merican Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers)",
month = jun,
year = "2019",
address = "Minneapolis, Minnesota",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/N19-1300",
doi = "10.18653/v1/N19-1300",
pages = "2924--2936",
}
@inproceedings{khashabi-etal-2020-unifiedqa,
title = "{UNIFIEDQA}: Crossing Format Boundaries with a Single {QA} System",
author = "Khashabi, Daniel and
Min, Sewon and
Khot, Tushar and
Sabharwal, Ashish and
Tafjord, Oyvind and
Clark, Peter and
Hajishirzi, Hannaneh",
booktitle = "Findings of the Association for Computational Linguistics: EMNLP 2020",
month = nov,
year = "2020",
address = "Online",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/2020.findings-emnlp.171",
doi = "10.18653/v1/2020.findings-emnlp.171",
pages = "1896--1907",
}
Note that each UnifiedQA dataset has its own citation. Please see the source to
see the correct citation for each contained dataset.
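Since contrast sets are meant for evaluation, one simple way to use them is to compare model predictions with the `output` strings. The sketch below computes a normalized exact-match score. The metric and the helper names `normalize` and `exact_match` are assumptions chosen for illustration, not necessarily the scoring used by the UnifiedQA authors.

```python
def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so trivial differences are ignored."""
    return " ".join(text.lower().split())


def exact_match(predictions, references) -> float:
    """Fraction of predictions equal to their reference after normalization."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not references:
        return 0.0
    hits = sum(normalize(p) == normalize(r) for p, r in zip(predictions, references))
    return hits / len(references)


if __name__ == "__main__":
    # Toy yes/no answers standing in for model outputs and gold labels.
    print(exact_match(["Yes", "no"], ["yes", "yes"]))
```

For yes/no data such as BoolQ this reduces to plain accuracy; free-form answers would usually call for a softer metric.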
unified_qa/contrast_sets_drop
Config description : DROP is a crowdsourced QA benchmark in which a system must resolve references in a question, perhaps to multiple input positions, and perform discrete operations over them (such as addition, counting, or sorting). These operations require a much more comprehensive understanding of the content of paragraphs than was necessary for prior datasets. This version uses contrast sets. These evaluation sets are expert-generated perturbations that deviate from the patterns common in the original dataset.
Download size :
2.20 MiB
Dataset size :
2.26 MiB
Auto-cached ( documentation ): Yes
Splits :
Split | Examples |
---|---|
'train' | 947 |
'validation' | 947 |
- Examples ( tfds.as_dataframe ):
- Citation :
@inproceedings{dua-etal-2019-drop,
title = "{DROP}: A Reading Comprehension Benchmark Requiring Discrete Reasoning Over Paragraphs",
author = "Dua, Dheeru and
Wang, Yizhong and
Dasigi, Pradeep and
Stanovsky, Gabriel and
Singh, Sameer and
Gardner, Matt",
booktitle = "Proceedings of the 2019 Conference of the North {A}merican Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers)",
month = jun,
year = "2019",
address = "Minneapolis, Minnesota",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/N19-1246",
doi = "10.18653/v1/N19-1246",
pages = "2368--2378",
}
@inproceedings{khashabi-etal-2020-unifiedqa,
title = "{UNIFIEDQA}: Crossing Format Boundaries with a Single {QA} System",
author = "Khashabi, Daniel and
Min, Sewon and
Khot, Tushar and
Sabharwal, Ashish and
Tafjord, Oyvind and
Clark, Peter and
Hajishirzi, Hannaneh",
booktitle = "Findings of the Association for Computational Linguistics: EMNLP 2020",
month = nov,
year = "2020",
address = "Online",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/2020.findings-emnlp.171",
doi = "10.18653/v1/2020.findings-emnlp.171",
pages = "1896--1907",
}
Note that each UnifiedQA dataset has its own citation. Please see the source to
see the correct citation for each contained dataset."
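A minimal sketch of loading the `unified_qa/contrast_sets_drop` config with TensorFlow Datasets, assuming the `tensorflow-datasets` package is installed and the data can be downloaded. The helper names `config_name` and `preview` are hypothetical and exist only for illustration; `tfds.load`, `tfds.as_numpy`, and the `input`/`output` string features are the parts taken from the dataset's documented structure.

```python
def config_name(variant: str, builder: str = "unified_qa") -> str:
    """Build a TFDS name of the form '<builder>/<config>'."""
    return f"{builder}/{variant}"


def preview(variant: str, split: str = "validation", n: int = 1) -> None:
    """Print the first n examples of one split (hypothetical helper)."""
    # Imported lazily so the module can be imported without TFDS installed.
    import tensorflow_datasets as tfds

    ds = tfds.load(config_name(variant), split=split)
    # Each example is a dict with two string features: 'input' and 'output'.
    for example in tfds.as_numpy(ds.take(n)):
        print("input: ", example["input"].decode("utf-8"))
        print("output:", example["output"].decode("utf-8"))
```

Calling `preview("contrast_sets_drop")` would download the config on first use and print one question with its answer. The same call should work for any other config name listed on this page.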
unified_qa/contrast_sets_quoref
Config description: This dataset tests the coreferential reasoning capability of reading comprehension systems. In this span-selection benchmark containing questions over paragraphs from Wikipedia, a system must resolve hard coreferences before selecting the appropriate span in the paragraph to answer the question. This version uses contrast sets. These evaluation sets are expert-generated perturbations that deviate from the patterns common in the original dataset.
Download size:
2.60 MiB
Dataset size:
2.65 MiB
Auto-cached (documentation): Yes
Splits:
Split | Examples |
---|---|
'train' | 700 |
'validation' | 700 |
- Examples (tfds.as_dataframe):
- Citation:
@inproceedings{dasigi-etal-2019-quoref,
title = "{Q}uoref: A Reading Comprehension Dataset with Questions Requiring Coreferential Reasoning",
author = "Dasigi, Pradeep and
Liu, Nelson F. and
Marasovi{\'c}, Ana and
Smith, Noah A. and
Gardner, Matt",
booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)",
month = nov,
year = "2019",
address = "Hong Kong, China",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/D19-1606",
doi = "10.18653/v1/D19-1606",
pages = "5925--5932",
}
@inproceedings{khashabi-etal-2020-unifiedqa,
title = "{UNIFIEDQA}: Crossing Format Boundaries with a Single {QA} System",
author = "Khashabi, Daniel and
Min, Sewon and
Khot, Tushar and
Sabharwal, Ashish and
Tafjord, Oyvind and
Clark, Peter and
Hajishirzi, Hannaneh",
booktitle = "Findings of the Association for Computational Linguistics: EMNLP 2020",
month = nov,
year = "2020",
address = "Online",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/2020.findings-emnlp.171",
doi = "10.18653/v1/2020.findings-emnlp.171",
pages = "1896--1907",
}
Note that each UnifiedQA dataset has its own citation. Please see the source to
see the correct citation for each contained dataset."
unified_qa/contrast_sets_ropes
Config description: This dataset tests a system's ability to apply knowledge from a passage of text to a new situation. A system is presented with a background passage containing causal or qualitative relations (e.g., "animal pollinators increase efficiency of fertilization in flowers"), a novel situation that uses this background, and questions that require reasoning about the effects of the relationships in the background passage in the context of the situation. This version uses contrast sets. These evaluation sets are expert-generated perturbations that deviate from the patterns common in the original dataset.
Download size:
1.97 MiB
Dataset size:
2.04 MiB
Auto-cached (documentation): Yes
Splits:
Split | Examples |
---|---|
'train' | 974 |
'validation' | 974 |
- Examples (tfds.as_dataframe):
- Citation:
@inproceedings{lin-etal-2019-reasoning,
title = "Reasoning Over Paragraph Effects in Situations",
author = "Lin, Kevin and
Tafjord, Oyvind and
Clark, Peter and
Gardner, Matt",
booktitle = "Proceedings of the 2nd Workshop on Machine Reading for Question Answering",
month = nov,
year = "2019",
address = "Hong Kong, China",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/D19-5808",
doi = "10.18653/v1/D19-5808",
pages = "58--62",
}
@inproceedings{khashabi-etal-2020-unifiedqa,
title = "{UNIFIEDQA}: Crossing Format Boundaries with a Single {QA} System",
author = "Khashabi, Daniel and
Min, Sewon and
Khot, Tushar and
Sabharwal, Ashish and
Tafjord, Oyvind and
Clark, Peter and
Hajishirzi, Hannaneh",
booktitle = "Findings of the Association for Computational Linguistics: EMNLP 2020",
month = nov,
year = "2020",
address = "Online",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/2020.findings-emnlp.171",
doi = "10.18653/v1/2020.findings-emnlp.171",
pages = "1896--1907",
}
Note that each UnifiedQA dataset has its own citation. Please see the source to
see the correct citation for each contained dataset."
unified_qa/drop
Config description: DROP is a crowdsourced QA benchmark in which a system must resolve references in a question, perhaps to multiple input positions, and perform discrete operations over them (such as addition, counting, or sorting). These operations require a much more comprehensive understanding of the content of paragraphs than was necessary for prior datasets.
Download size:
105.18 MiB
Dataset size:
108.16 MiB
Auto-cached (documentation): Yes
Splits:
Split | Examples |
---|---|
'train' | 77,399 |
'validation' | 9,536 |
- Examples (tfds.as_dataframe):
- Citation:
@inproceedings{dua-etal-2019-drop,
title = "{DROP}: A Reading Comprehension Benchmark Requiring Discrete Reasoning Over Paragraphs",
author = "Dua, Dheeru and
Wang, Yizhong and
Dasigi, Pradeep and
Stanovsky, Gabriel and
Singh, Sameer and
Gardner, Matt",
booktitle = "Proceedings of the 2019 Conference of the North {A}merican Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers)",
month = jun,
year = "2019",
address = "Minneapolis, Minnesota",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/N19-1246",
doi = "10.18653/v1/N19-1246",
pages = "2368--2378",
}
@inproceedings{khashabi-etal-2020-unifiedqa,
title = "{UNIFIEDQA}: Crossing Format Boundaries with a Single {QA} System",
author = "Khashabi, Daniel and
Min, Sewon and
Khot, Tushar and
Sabharwal, Ashish and
Tafjord, Oyvind and
Clark, Peter and
Hajishirzi, Hannaneh",
booktitle = "Findings of the Association for Computational Linguistics: EMNLP 2020",
month = nov,
year = "2020",
address = "Online",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/2020.findings-emnlp.171",
doi = "10.18653/v1/2020.findings-emnlp.171",
pages = "1896--1907",
}
Note that each UnifiedQA dataset has its own citation. Please see the source to
see the correct citation for each contained dataset."
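To make the "discrete operations" in the DROP description concrete, here is a toy sketch of counting, addition, and sorting over numbers referenced in a passage. The passage is invented for illustration and is not taken from DROP, and the helpers `extract_numbers` and `answer` are hypothetical. A real system must also decide which spans a question refers to, which this sketch skips.

```python
import re
from typing import List, Union

# Hypothetical passage, written for this example (not a DROP record).
PASSAGE = (
    "The home team scored field goals of 23, 41 and 35 yards, "
    "while the visitors managed a single 52-yard field goal."
)


def extract_numbers(text: str) -> List[int]:
    """Collect every integer mentioned in the text, in reading order."""
    return [int(token) for token in re.findall(r"\d+", text)]


def answer(operation: str, numbers: List[int]) -> Union[int, List[int]]:
    """Apply one discrete operation of the kind named in the description."""
    if operation == "count":
        return len(numbers)
    if operation == "add":
        return sum(numbers)
    if operation == "sort":
        return sorted(numbers)
    raise ValueError(f"unsupported operation: {operation}")


numbers = extract_numbers(PASSAGE)
print(answer("count", numbers))  # how many field goals were mentioned
print(answer("add", numbers))    # total yards across all field goals
print(answer("sort", numbers))   # field goals from shortest to longest
```

The point of the sketch is that the answer (for example, a total) need not appear anywhere in the passage, which is why span extraction alone is not enough for this benchmark.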
unified_qa/mctest
Config description: MCTest requires machines to answer multiple-choice reading comprehension questions about fictional stories, directly tackling the high-level goal of open-domain machine comprehension. Reading comprehension can test advanced abilities such as causal reasoning and understanding the world, yet, by being multiple-choice, it still provides a clear metric. Because the stories are fictional, the answer can typically be found only in the story itself. The stories and questions are also carefully limited to those a young child would understand, reducing the world knowledge required for the task.
Download size:
2.14 MiB
Dataset size:
2.20 MiB
Auto-cached (documentation): Yes
Splits:
Split | Examples |
---|---|
'train' | 1,480 |
'validation' | 320 |
- Examples (tfds.as_dataframe):
- Citation:
@inproceedings{richardson-etal-2013-mctest,
title = "{MCT}est: A Challenge Dataset for the Open-Domain Machine Comprehension of Text",
author = "Richardson, Matthew and
Burges, Christopher J.C. and
Renshaw, Erin",
booktitle = "Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing",
month = oct,
year = "2013",
address = "Seattle, Washington, USA",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/D13-1020",
pages = "193--203",
}
@inproceedings{khashabi-etal-2020-unifiedqa,
title = "{UNIFIEDQA}: Crossing Format Boundaries with a Single {QA} System",
author = "Khashabi, Daniel and
Min, Sewon and
Khot, Tushar and
Sabharwal, Ashish and
Tafjord, Oyvind and
Clark, Peter and
Hajishirzi, Hannaneh",
booktitle = "Findings of the Association for Computational Linguistics: EMNLP 2020",
month = nov,
year = "2020",
address = "Online",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/2020.findings-emnlp.171",
doi = "10.18653/v1/2020.findings-emnlp.171",
pages = "1896--1907",
}
Note that each UnifiedQA dataset has its own citation. Please see the source to
see the correct citation for each contained dataset."
unified_qa/mctest_corrected_the_separator
Config description: MCTest requires machines to answer multiple-choice reading comprehension questions about fictional stories, directly tackling the high-level goal of open-domain machine comprehension. Reading comprehension can test advanced abilities such as causal reasoning and understanding the world, yet, by being multiple-choice, it still provides a clear metric. Because the stories are fictional, the answer can typically be found only in the story itself. The stories and questions are also carefully limited to those a young child would understand, reducing the world knowledge required for the task.
Download size:
2.15 MiB
Dataset size:
2.21 MiB
Auto-cached (documentation): Yes
Splits:
Split | Examples |
---|---|
'train' | 1,480 |
'validation' | 320 |
- Examples (tfds.as_dataframe):
- Citation:
@inproceedings{richardson-etal-2013-mctest,
title = "{MCT}est: A Challenge Dataset for the Open-Domain Machine Comprehension of Text",
author = "Richardson, Matthew and
Burges, Christopher J.C. and
Renshaw, Erin",
booktitle = "Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing",
month = oct,
year = "2013",
address = "Seattle, Washington, USA",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/D13-1020",
pages = "193--203",
}
@inproceedings{khashabi-etal-2020-unifiedqa,
title = "{UNIFIEDQA}: Crossing Format Boundaries with a Single {QA} System",
author = "Khashabi, Daniel and
Min, Sewon and
Khot, Tushar and
Sabharwal, Ashish and
Tafjord, Oyvind and
Clark, Peter and
Hajishirzi, Hannaneh",
booktitle = "Findings of the Association for Computational Linguistics: EMNLP 2020",
month = nov,
year = "2020",
address = "Online",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/2020.findings-emnlp.171",
doi = "10.18653/v1/2020.findings-emnlp.171",
pages = "1896--1907",
}
Note that each UnifiedQA dataset has its own citation. Please see the source to
see the correct citation for each contained dataset."
unified_qa/multirc
Config description: MultiRC is a reading comprehension challenge in which questions can only be answered by taking into account information from multiple sentences. Questions and answers for this challenge were solicited and verified through a 4-step crowdsourcing experiment. The dataset contains questions for paragraphs across 7 different domains (elementary school science, news, travel guides, fiction stories, etc.), bringing linguistic diversity to the texts and to the wording of the questions.
Download size:
897.09 KiB
Dataset size:
918.42 KiB
Auto-cached (documentation): Yes
Splits:
Split | Examples |
---|---|
'train' | 312 |
'validation' | 312 |
- Examples (tfds.as_dataframe):
- Citation:
@inproceedings{khashabi-etal-2018-looking,
title = "Looking Beyond the Surface: A Challenge Set for Reading Comprehension over Multiple Sentences",
author = "Khashabi, Daniel and
Chaturvedi, Snigdha and
Roth, Michael and
Upadhyay, Shyam and
Roth, Dan",
booktitle = "Proceedings of the 2018 Conference of the North {A}merican Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers)",
month = jun,
year = "2018",
address = "New Orleans, Louisiana",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/N18-1023",
doi = "10.18653/v1/N18-1023",
pages = "252--262",
}
@inproceedings{khashabi-etal-2020-unifiedqa,
title = "{UNIFIEDQA}: Crossing Format Boundaries with a Single {QA} System",
author = "Khashabi, Daniel and
Min, Sewon and
Khot, Tushar and
Sabharwal, Ashish and
Tafjord, Oyvind and
Clark, Peter and
Hajishirzi, Hannaneh",
booktitle = "Findings of the Association for Computational Linguistics: EMNLP 2020",
month = nov,
year = "2020",
address = "Online",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/2020.findings-emnlp.171",
doi = "10.18653/v1/2020.findings-emnlp.171",
pages = "1896--1907",
}
Note that each UnifiedQA dataset has its own citation. Please see the source to
see the correct citation for each contained dataset."
unified_qa/narrativeqa
Config description: NarrativeQA is an English-language dataset of stories and corresponding questions designed to test reading comprehension, especially on long documents.
Download size:
308.28 MiB
Dataset size:
311.22 MiB
Auto-cached (documentation): No
Splits:
Split | Examples |
---|---|
'test' | 21,114 |
'train' | 65,494 |
'validation' | 6,922 |
- Examples (tfds.as_dataframe):
- Citation:
@article{kocisky-etal-2018-narrativeqa,
title = "The {N}arrative{QA} Reading Comprehension Challenge",
author = "Ko{\v{c}}isk{\'y}, Tom{\'a}{\v{s}} and
Schwarz, Jonathan and
Blunsom, Phil and
Dyer, Chris and
Hermann, Karl Moritz and
Melis, G{\'a}bor and
Grefenstette, Edward",
journal = "Transactions of the Association for Computational Linguistics",
volume = "6",
year = "2018",
address = "Cambridge, MA",
publisher = "MIT Press",
url = "https://aclanthology.org/Q18-1023",
doi = "10.1162/tacl_a_00023",
pages = "317--328",
}
@inproceedings{khashabi-etal-2020-unifiedqa,
title = "{UNIFIEDQA}: Crossing Format Boundaries with a Single {QA} System",
author = "Khashabi, Daniel and
Min, Sewon and
Khot, Tushar and
Sabharwal, Ashish and
Tafjord, Oyvind and
Clark, Peter and
Hajishirzi, Hannaneh",
booktitle = "Findings of the Association for Computational Linguistics: EMNLP 2020",
month = nov,
year = "2020",
address = "Online",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/2020.findings-emnlp.171",
doi = "10.18653/v1/2020.findings-emnlp.171",
pages = "1896--1907",
}
Note that each UnifiedQA dataset has its own citation. Please see the source to
see the correct citation for each contained dataset."
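The `unified_qa/narrativeqa` config is one of the entries on this page that is not auto-cached. The sketch below checks the reported dataset sizes against a size threshold. The 250 MiB value is an assumption based on the TFDS performance documentation, which also lists other conditions (such as shuffling and sharding) that this sketch ignores. The function `is_auto_cached` is a hypothetical helper, not part of TFDS.

```python
# Assumed auto-cache threshold in MiB (taken from TFDS performance tips,
# not from this page); other TFDS conditions are ignored here.
AUTO_CACHE_LIMIT_MIB = 250.0

# Dataset sizes in MiB and auto-cached flags as reported on this page.
REPORTED = {
    "contrast_sets_drop": (2.26, True),
    "drop": (108.16, True),
    "narrativeqa": (311.22, False),
    "natural_questions_with_dpr_para": (322.91, False),
    "newsqa": (285.94, False),
}


def is_auto_cached(size_mib: float, limit: float = AUTO_CACHE_LIMIT_MIB) -> bool:
    """Return True when the dataset size falls below the assumed limit."""
    return size_mib < limit


for name, (size_mib, reported_flag) in REPORTED.items():
    predicted = is_auto_cached(size_mib)
    status = "matches" if predicted == reported_flag else "differs from"
    print(f"{name}: predicted {predicted}, {status} the page")
```

For configs above the limit, a caller who has enough memory can still cache explicitly by calling `.cache()` on the `tf.data.Dataset` returned by `tfds.load`.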
unified_qa/narrativeqa_dev
Config description: NarrativeQA is an English-language dataset of stories and corresponding questions designed to test reading comprehension, especially on long documents.
Download size:
308.28 MiB
Dataset size:
311.22 MiB
Auto-cached (documentation): No
Splits:
Split | Examples |
---|---|
'test' | 21,114 |
'train' | 65,494 |
'validation' | 6,922 |
- Examples (tfds.as_dataframe):
- Citation:
@article{kocisky-etal-2018-narrativeqa,
title = "The {N}arrative{QA} Reading Comprehension Challenge",
author = "Ko{\v{c}}isk{\'y}, Tom{\'a}{\v{s}} and
Schwarz, Jonathan and
Blunsom, Phil and
Dyer, Chris and
Hermann, Karl Moritz and
Melis, G{\'a}bor and
Grefenstette, Edward",
journal = "Transactions of the Association for Computational Linguistics",
volume = "6",
year = "2018",
address = "Cambridge, MA",
publisher = "MIT Press",
url = "https://aclanthology.org/Q18-1023",
doi = "10.1162/tacl_a_00023",
pages = "317--328",
}
@inproceedings{khashabi-etal-2020-unifiedqa,
title = "{UNIFIEDQA}: Crossing Format Boundaries with a Single {QA} System",
author = "Khashabi, Daniel and
Min, Sewon and
Khot, Tushar and
Sabharwal, Ashish and
Tafjord, Oyvind and
Clark, Peter and
Hajishirzi, Hannaneh",
booktitle = "Findings of the Association for Computational Linguistics: EMNLP 2020",
month = nov,
year = "2020",
address = "Online",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/2020.findings-emnlp.171",
doi = "10.18653/v1/2020.findings-emnlp.171",
pages = "1896--1907",
}
Note that each UnifiedQA dataset has its own citation. Please see the source to
see the correct citation for each contained dataset."
unified_qa/natural_questions
Config description: The NQ corpus contains questions from real users, and it requires QA systems to read and comprehend an entire Wikipedia article that may or may not contain the answer to the question. The inclusion of real user questions, and the requirement that solutions read an entire page to find the answer, make NQ a more realistic and challenging task than prior QA datasets.
Download size:
6.95 MiB
Dataset size:
9.88 MiB
Auto-cached (documentation): Yes
Splits:
Split | Examples |
---|---|
'train' | 96,075 |
'validation' | 2,295 |
- Examples (tfds.as_dataframe):
- Citation:
@article{kwiatkowski-etal-2019-natural,
title = "Natural Questions: A Benchmark for Question Answering Research",
author = "Kwiatkowski, Tom and
Palomaki, Jennimaria and
Redfield, Olivia and
Collins, Michael and
Parikh, Ankur and
Alberti, Chris and
Epstein, Danielle and
Polosukhin, Illia and
Devlin, Jacob and
Lee, Kenton and
Toutanova, Kristina and
Jones, Llion and
Kelcey, Matthew and
Chang, Ming-Wei and
Dai, Andrew M. and
Uszkoreit, Jakob and
Le, Quoc and
Petrov, Slav",
journal = "Transactions of the Association for Computational Linguistics",
volume = "7",
year = "2019",
address = "Cambridge, MA",
publisher = "MIT Press",
url = "https://aclanthology.org/Q19-1026",
doi = "10.1162/tacl_a_00276",
pages = "452--466",
}
@inproceedings{khashabi-etal-2020-unifiedqa,
title = "{UNIFIEDQA}: Crossing Format Boundaries with a Single {QA} System",
author = "Khashabi, Daniel and
Min, Sewon and
Khot, Tushar and
Sabharwal, Ashish and
Tafjord, Oyvind and
Clark, Peter and
Hajishirzi, Hannaneh",
booktitle = "Findings of the Association for Computational Linguistics: EMNLP 2020",
month = nov,
year = "2020",
address = "Online",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/2020.findings-emnlp.171",
doi = "10.18653/v1/2020.findings-emnlp.171",
pages = "1896--1907",
}
Note that each UnifiedQA dataset has its own citation. Please see the source to
see the correct citation for each contained dataset."
unified_qa/natural_questions_direct_ans
Config description: The NQ corpus contains questions from real users, and it requires QA systems to read and comprehend an entire Wikipedia article that may or may not contain the answer to the question. The inclusion of real user questions, and the requirement that solutions read an entire page to find the answer, make NQ a more realistic and challenging task than prior QA datasets. This version consists of direct-answer questions.
Download size:
6.82 MiB
Dataset size:
10.19 MiB
Auto-cached (documentation): Yes
Splits:
Split | Examples |
---|---|
'test' | 6,468 |
'train' | 96,676 |
'validation' | 10,693 |
- Examples (tfds.as_dataframe):
- Citation:
@article{kwiatkowski-etal-2019-natural,
title = "Natural Questions: A Benchmark for Question Answering Research",
author = "Kwiatkowski, Tom and
Palomaki, Jennimaria and
Redfield, Olivia and
Collins, Michael and
Parikh, Ankur and
Alberti, Chris and
Epstein, Danielle and
Polosukhin, Illia and
Devlin, Jacob and
Lee, Kenton and
Toutanova, Kristina and
Jones, Llion and
Kelcey, Matthew and
Chang, Ming-Wei and
Dai, Andrew M. and
Uszkoreit, Jakob and
Le, Quoc and
Petrov, Slav",
journal = "Transactions of the Association for Computational Linguistics",
volume = "7",
year = "2019",
address = "Cambridge, MA",
publisher = "MIT Press",
url = "https://aclanthology.org/Q19-1026",
doi = "10.1162/tacl_a_00276",
pages = "452--466",
}
@inproceedings{khashabi-etal-2020-unifiedqa,
title = "{UNIFIEDQA}: Crossing Format Boundaries with a Single {QA} System",
author = "Khashabi, Daniel and
Min, Sewon and
Khot, Tushar and
Sabharwal, Ashish and
Tafjord, Oyvind and
Clark, Peter and
Hajishirzi, Hannaneh",
booktitle = "Findings of the Association for Computational Linguistics: EMNLP 2020",
month = nov,
year = "2020",
address = "Online",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/2020.findings-emnlp.171",
doi = "10.18653/v1/2020.findings-emnlp.171",
pages = "1896--1907",
}
Note that each UnifiedQA dataset has its own citation. Please see the source to
see the correct citation for each contained dataset."
unified_qa/natural_questions_direct_ans_test
Config description: The NQ corpus contains questions from real users, and it requires QA systems to read and comprehend an entire Wikipedia article that may or may not contain the answer to the question. The inclusion of real user questions, and the requirement that solutions read an entire page to find the answer, make NQ a more realistic and challenging task than prior QA datasets. This version consists of direct-answer questions.
Download size:
6.82 MiB
Dataset size:
10.19 MiB
Auto-cached (documentation): Yes
Splits:
Split | Examples |
---|---|
'test' | 6,468 |
'train' | 96,676 |
'validation' | 10,693 |
- Examples (tfds.as_dataframe):
- Citation:
@article{kwiatkowski-etal-2019-natural,
title = "Natural Questions: A Benchmark for Question Answering Research",
author = "Kwiatkowski, Tom and
Palomaki, Jennimaria and
Redfield, Olivia and
Collins, Michael and
Parikh, Ankur and
Alberti, Chris and
Epstein, Danielle and
Polosukhin, Illia and
Devlin, Jacob and
Lee, Kenton and
Toutanova, Kristina and
Jones, Llion and
Kelcey, Matthew and
Chang, Ming-Wei and
Dai, Andrew M. and
Uszkoreit, Jakob and
Le, Quoc and
Petrov, Slav",
journal = "Transactions of the Association for Computational Linguistics",
volume = "7",
year = "2019",
address = "Cambridge, MA",
publisher = "MIT Press",
url = "https://aclanthology.org/Q19-1026",
doi = "10.1162/tacl_a_00276",
pages = "452--466",
}
@inproceedings{khashabi-etal-2020-unifiedqa,
title = "{UNIFIEDQA}: Crossing Format Boundaries with a Single {QA} System",
author = "Khashabi, Daniel and
Min, Sewon and
Khot, Tushar and
Sabharwal, Ashish and
Tafjord, Oyvind and
Clark, Peter and
Hajishirzi, Hannaneh",
booktitle = "Findings of the Association for Computational Linguistics: EMNLP 2020",
month = nov,
year = "2020",
address = "Online",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/2020.findings-emnlp.171",
doi = "10.18653/v1/2020.findings-emnlp.171",
pages = "1896--1907",
}
Note that each UnifiedQA dataset has its own citation. Please see the source to
see the correct citation for each contained dataset."
unified_qa/natural_questions_with_dpr_para
Config description: The NQ corpus contains questions from real users, and it requires QA systems to read and comprehend an entire Wikipedia article that may or may not contain the answer to the question. The inclusion of real user questions, and the requirement that solutions read an entire page to find the answer, make NQ a more realistic and challenging task than prior QA datasets. This version includes additional paragraphs (obtained using the DPR retrieval engine) to augment each question.
Download size:
319.22 MiB
Dataset size:
322.91 MiB
Auto-cached (documentation): No
Splits:
Split | Examples |
---|---|
'train' | 96,676 |
'validation' | 10,693 |
- Examples (tfds.as_dataframe):
- Citation:
@article{kwiatkowski-etal-2019-natural,
title = "Natural Questions: A Benchmark for Question Answering Research",
author = "Kwiatkowski, Tom and
Palomaki, Jennimaria and
Redfield, Olivia and
Collins, Michael and
Parikh, Ankur and
Alberti, Chris and
Epstein, Danielle and
Polosukhin, Illia and
Devlin, Jacob and
Lee, Kenton and
Toutanova, Kristina and
Jones, Llion and
Kelcey, Matthew and
Chang, Ming-Wei and
Dai, Andrew M. and
Uszkoreit, Jakob and
Le, Quoc and
Petrov, Slav",
journal = "Transactions of the Association for Computational Linguistics",
volume = "7",
year = "2019",
address = "Cambridge, MA",
publisher = "MIT Press",
url = "https://aclanthology.org/Q19-1026",
doi = "10.1162/tacl_a_00276",
pages = "452--466",
}
@inproceedings{khashabi-etal-2020-unifiedqa,
title = "{UNIFIEDQA}: Crossing Format Boundaries with a Single {QA} System",
author = "Khashabi, Daniel and
Min, Sewon and
Khot, Tushar and
Sabharwal, Ashish and
Tafjord, Oyvind and
Clark, Peter and
Hajishirzi, Hannaneh",
booktitle = "Findings of the Association for Computational Linguistics: EMNLP 2020",
month = nov,
year = "2020",
address = "Online",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/2020.findings-emnlp.171",
doi = "10.18653/v1/2020.findings-emnlp.171",
pages = "1896--1907",
}
Note that each UnifiedQA dataset has its own citation. Please see the source to
see the correct citation for each contained dataset."
unified_qa/natural_questions_with_dpr_para_test
Config description: The NQ corpus contains questions from real users, and it requires QA systems to read and comprehend an entire Wikipedia article that may or may not contain the answer to the question. The inclusion of real user questions, and the requirement that solutions read an entire page to find the answer, make NQ a more realistic and challenging task than prior QA datasets. This version includes additional paragraphs (obtained using the DPR retrieval engine) to augment each question.
Download size:
306.94 MiB
Dataset size:
310.48 MiB
Auto-cached (documentation): No
Splits:
Split | Examples |
---|---|
'test' | 6,468 |
'train' | 96,676 |
- Examples (tfds.as_dataframe):
- Citation:
@article{kwiatkowski-etal-2019-natural,
title = "Natural Questions: A Benchmark for Question Answering Research",
author = "Kwiatkowski, Tom and
Palomaki, Jennimaria and
Redfield, Olivia and
Collins, Michael and
Parikh, Ankur and
Alberti, Chris and
Epstein, Danielle and
Polosukhin, Illia and
Devlin, Jacob and
Lee, Kenton and
Toutanova, Kristina and
Jones, Llion and
Kelcey, Matthew and
Chang, Ming-Wei and
Dai, Andrew M. and
Uszkoreit, Jakob and
Le, Quoc and
Petrov, Slav",
journal = "Transactions of the Association for Computational Linguistics",
volume = "7",
year = "2019",
address = "Cambridge, MA",
publisher = "MIT Press",
url = "https://aclanthology.org/Q19-1026",
doi = "10.1162/tacl_a_00276",
pages = "452--466",
}
@inproceedings{khashabi-etal-2020-unifiedqa,
title = "{UNIFIEDQA}: Crossing Format Boundaries with a Single {QA} System",
author = "Khashabi, Daniel and
Min, Sewon and
Khot, Tushar and
Sabharwal, Ashish and
Tafjord, Oyvind and
Clark, Peter and
Hajishirzi, Hannaneh",
booktitle = "Findings of the Association for Computational Linguistics: EMNLP 2020",
month = nov,
year = "2020",
address = "Online",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/2020.findings-emnlp.171",
doi = "10.18653/v1/2020.findings-emnlp.171",
pages = "1896--1907",
}
Note that each UnifiedQA dataset has its own citation. Please see the source to
see the correct citation for each contained dataset."
unified_qa/newsqa
Config description: NewsQA is a challenging machine comprehension dataset of human-generated question-answer pairs. Crowdworkers supply questions and answers based on a set of news articles from CNN, with answers consisting of spans of text from the corresponding articles.
Download size:
283.33 MiB
Dataset size:
285.94 MiB
Auto-cached (documentation): No
Splits:
Split | Examples |
---|---|
'train' | 75,882 |
'validation' | 4,309 |
- Examples (tfds.as_dataframe):
- Citation:
@inproceedings{trischler-etal-2017-newsqa,
title = "{N}ews{QA}: A Machine Comprehension Dataset",
author = "Trischler, Adam and
Wang, Tong and
Yuan, Xingdi and
Harris, Justin and
Sordoni, Alessandro and
Bachman, Philip and
Suleman, Kaheer",
booktitle = "Proceedings of the 2nd Workshop on Representation Learning for {NLP}",
month = aug,
year = "2017",
address = "Vancouver, Canada",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/W17-2623",
doi = "10.18653/v1/W17-2623",
pages = "191--200",
}
@inproceedings{khashabi-etal-2020-unifiedqa,
title = "{UNIFIEDQA}: Crossing Format Boundaries with a Single {QA} System",
author = "Khashabi, Daniel and
Min, Sewon and
Khot, Tushar and
Sabharwal, Ashish and
Tafjord, Oyvind and
Clark, Peter and
Hajishirzi, Hannaneh",
booktitle = "Findings of the Association for Computational Linguistics: EMNLP 2020",
month = nov,
year = "2020",
address = "Online",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/2020.findings-emnlp.171",
doi = "10.18653/v1/2020.findings-emnlp.171",
pages = "1896--1907",
}
Note that each UnifiedQA dataset has its own citation. Please see the source to
see the correct citation for each contained dataset."
unified_qa/openbookqa
Config description: OpenBookQA aims to promote research in advanced question answering, probing a deeper understanding of both the topic (with salient facts summarized as an open book, also provided with the dataset) and the language it is expressed in. In particular, it contains questions that require multi-step reasoning, use of additional common and commonsense knowledge, and rich text comprehension. OpenBookQA is a new kind of question-answering dataset modeled after open-book exams for assessing human understanding of a subject.
Download size:
942.34 KiB
Dataset size:
1.11 MiB
Auto-cached (documentation): Yes
Splits:
Split | Examples |
---|---|
'test' | 500 |
'train' | 4,957 |
'validation' | 500 |
- Examples (tfds.as_dataframe):
- Citation:
@inproceedings{mihaylov-etal-2018-suit,
title = "Can a Suit of Armor Conduct Electricity? A New Dataset for Open Book Question Answering",
author = "Mihaylov, Todor and
Clark, Peter and
Khot, Tushar and
Sabharwal, Ashish",
booktitle = "Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing",
month = oct # "-" # nov,
year = "2018",
address = "Brussels, Belgium",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/D18-1260",
doi = "10.18653/v1/D18-1260",
pages = "2381--2391",
}
@inproceedings{khashabi-etal-2020-unifiedqa,
title = "{UNIFIEDQA}: Crossing Format Boundaries with a Single {QA} System",
author = "Khashabi, Daniel and
Min, Sewon and
Khot, Tushar and
Sabharwal, Ashish and
Tafjord, Oyvind and
Clark, Peter and
Hajishirzi, Hannaneh",
booktitle = "Findings of the Association for Computational Linguistics: EMNLP 2020",
month = nov,
year = "2020",
address = "Online",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/2020.findings-emnlp.171",
doi = "10.18653/v1/2020.findings-emnlp.171",
pages = "1896--1907",
}
Note that each UnifiedQA dataset has its own citation. Please see the source to
see the correct citation for each contained dataset."
unified_qa/openbookqa_dev
Config description: OpenBookQA aims to promote research in advanced question answering, probing a deeper understanding of both the topic (with salient facts summarized as an open book, also provided with the dataset) and the language it is expressed in. In particular, it contains questions that require multi-step reasoning, use of additional common and commonsense knowledge, and rich text comprehension. OpenBookQA is a new kind of question-answering dataset modeled after open-book exams for assessing human understanding of a subject.
Download size:
942.34 KiB
Dataset size:
1.11 MiB
Auto-cached (documentation): Yes
Splits:
Split | Examples |
---|---|
'test' | 500 |
'train' | 4,957 |
'validation' | 500 |
- Examples (tfds.as_dataframe):
- Citation:
@inproceedings{mihaylov-etal-2018-suit,
title = "Can a Suit of Armor Conduct Electricity? A New Dataset for Open Book Question Answering",
author = "Mihaylov, Todor and
Clark, Peter and
Khot, Tushar and
Sabharwal, Ashish",
booktitle = "Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing",
month = oct # "-" # nov,
year = "2018",
address = "Brussels, Belgium",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/D18-1260",
doi = "10.18653/v1/D18-1260",
pages = "2381--2391",
}
@inproceedings{khashabi-etal-2020-unifiedqa,
title = "{UNIFIEDQA}: Crossing Format Boundaries with a Single {QA} System",
author = "Khashabi, Daniel and
Min, Sewon and
Khot, Tushar and
Sabharwal, Ashish and
Tafjord, Oyvind and
Clark, Peter and
Hajishirzi, Hannaneh",
booktitle = "Findings of the Association for Computational Linguistics: EMNLP 2020",
month = nov,
year = "2020",
address = "Online",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/2020.findings-emnlp.171",
doi = "10.18653/v1/2020.findings-emnlp.171",
pages = "1896--1907",
}
Note that each UnifiedQA dataset has its own citation. Please see the source to
see the correct citation for each contained dataset.
unified_qa/openbookqa_with_ir
Config description: OpenBookQA aims to promote research in advanced question answering, probing a deeper understanding of both the topic (with salient facts summarized as an open book, also provided with the dataset) and the language it is expressed in. In particular, it contains questions that require multi-step reasoning, use of additional common and commonsense knowledge, and rich text comprehension. OpenBookQA is a new kind of question-answering dataset modeled after open book exams for assessing human understanding of a subject. This version includes paragraphs fetched via an information retrieval system as additional evidence.
Download size:
6.08 MiB
Dataset size:
6.28 MiB
Auto-cached (documentation): Yes
Splits:
Split | Examples |
---|---|
'test' | 500 |
'train' | 4,957 |
'validation' | 500 |
- Examples (tfds.as_dataframe):
- Citation:
@inproceedings{mihaylov-etal-2018-suit,
title = "Can a Suit of Armor Conduct Electricity? A New Dataset for Open Book Question Answering",
author = "Mihaylov, Todor and
Clark, Peter and
Khot, Tushar and
Sabharwal, Ashish",
booktitle = "Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing",
month = oct # "-" # nov,
year = "2018",
address = "Brussels, Belgium",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/D18-1260",
doi = "10.18653/v1/D18-1260",
pages = "2381--2391",
}
@inproceedings{khashabi-etal-2020-unifiedqa,
title = "{UNIFIEDQA}: Crossing Format Boundaries with a Single {QA} System",
author = "Khashabi, Daniel and
Min, Sewon and
Khot, Tushar and
Sabharwal, Ashish and
Tafjord, Oyvind and
Clark, Peter and
Hajishirzi, Hannaneh",
booktitle = "Findings of the Association for Computational Linguistics: EMNLP 2020",
month = nov,
year = "2020",
address = "Online",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/2020.findings-emnlp.171",
doi = "10.18653/v1/2020.findings-emnlp.171",
pages = "1896--1907",
}
Note that each UnifiedQA dataset has its own citation. Please see the source to
see the correct citation for each contained dataset.
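The `_with_ir` configs pair each question with a paragraph fetched by an information retrieval system. As a rough illustration of how such an `input` string might be assembled, the sketch below joins a question, lettered answer options, and an optional paragraph. The helper `build_input`, the `FIELD_SEP` separator, and the sample question are assumptions made for illustration and are not taken from the dataset; consult the UnifiedQA source repository for the actual encoding.

```python
from typing import Optional, Sequence

# Assumed field separator: a literal backslash-n marker surrounded by spaces.
# This is a hypothetical choice for the sketch, not a confirmed dataset detail.
FIELD_SEP = " \\n "


def build_input(
    question: str,
    options: Sequence[str],
    paragraph: Optional[str] = None,
) -> str:
    """Join a question, lettered options, and an optional evidence paragraph."""
    letters = "ABCDEFGH"
    if len(options) > len(letters):
        raise ValueError("this sketch supports at most 8 options")
    lettered = " ".join(
        f"({letters[i]}) {option}" for i, option in enumerate(options)
    )
    parts = [question, lettered]
    if paragraph:
        # The retrieved paragraph, when present, is appended as the last field.
        parts.append(paragraph)
    return FIELD_SEP.join(parts)


# Made-up example, without and with a retrieved paragraph.
plain = build_input(
    "Which object conducts electricity?",
    ["a suit of armor", "a wooden spoon"],
)
with_ir = build_input(
    "Which object conducts electricity?",
    ["a suit of armor", "a wooden spoon"],
    paragraph="Metals are good conductors of electricity.",
)
```

Under these assumptions the two variants differ only in the trailing paragraph field, which mirrors the relationship between a base config and its `_with_ir` counterpart.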
unified_qa/openbookqa_with_ir_dev
Config description: OpenBookQA aims to promote research in advanced question answering, probing a deeper understanding of both the topic (with salient facts summarized as an open book, also provided with the dataset) and the language it is expressed in. In particular, it contains questions that require multi-step reasoning, use of additional common and commonsense knowledge, and rich text comprehension. OpenBookQA is a new kind of question-answering dataset modeled after open book exams for assessing human understanding of a subject. This version includes paragraphs fetched via an information retrieval system as additional evidence.
Download size:
6.08 MiB
Dataset size:
6.28 MiB
Auto-cached (documentation): Yes
Splits:
Split | Examples |
---|---|
'test' | 500 |
'train' | 4,957 |
'validation' | 500 |
- Examples (tfds.as_dataframe):
- Citation:
@inproceedings{mihaylov-etal-2018-suit,
title = "Can a Suit of Armor Conduct Electricity? A New Dataset for Open Book Question Answering",
author = "Mihaylov, Todor and
Clark, Peter and
Khot, Tushar and
Sabharwal, Ashish",
booktitle = "Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing",
month = oct # "-" # nov,
year = "2018",
address = "Brussels, Belgium",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/D18-1260",
doi = "10.18653/v1/D18-1260",
pages = "2381--2391",
}
@inproceedings{khashabi-etal-2020-unifiedqa,
title = "{UNIFIEDQA}: Crossing Format Boundaries with a Single {QA} System",
author = "Khashabi, Daniel and
Min, Sewon and
Khot, Tushar and
Sabharwal, Ashish and
Tafjord, Oyvind and
Clark, Peter and
Hajishirzi, Hannaneh",
booktitle = "Findings of the Association for Computational Linguistics: EMNLP 2020",
month = nov,
year = "2020",
address = "Online",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/2020.findings-emnlp.171",
doi = "10.18653/v1/2020.findings-emnlp.171",
pages = "1896--1907",
}
Note that each UnifiedQA dataset has its own citation. Please see the source to
see the correct citation for each contained dataset.
unified_qa/physical_iqa
Config description: This is a dataset for benchmarking progress in physical commonsense understanding. The underlying task is multiple-choice question answering: given a question q and two possible solutions s1 and s2, a model or a human must choose the most appropriate solution, of which exactly one is correct. The dataset focuses on everyday situations with a preference for atypical solutions. It is inspired by instructables.com, which provides users with instructions on how to build, craft, bake, or manipulate objects using everyday materials. Annotators are asked to provide semantic perturbations or alternative approaches that are otherwise syntactically and topically similar, to ensure that physical knowledge is targeted. The dataset is further cleaned of basic artifacts using the AFLite algorithm.
Download size:
6.01 MiB
Dataset size:
6.59 MiB
Auto-cached (documentation): Yes
Splits:
Split | Examples |
---|---|
'train' | 16,113 |
'validation' | 1,838 |
- Examples (tfds.as_dataframe):
- Citation:
@inproceedings{bisk2020piqa,
title={Piqa: Reasoning about physical commonsense in natural language},
author={Bisk, Yonatan and Zellers, Rowan and Gao, Jianfeng and Choi, Yejin and others},
booktitle={Proceedings of the AAAI Conference on Artificial Intelligence},
volume={34},
number={05},
pages={7432--7439},
year={2020}
}
@inproceedings{khashabi-etal-2020-unifiedqa,
title = "{UNIFIEDQA}: Crossing Format Boundaries with a Single {QA} System",
author = "Khashabi, Daniel and
Min, Sewon and
Khot, Tushar and
Sabharwal, Ashish and
Tafjord, Oyvind and
Clark, Peter and
Hajishirzi, Hannaneh",
booktitle = "Findings of the Association for Computational Linguistics: EMNLP 2020",
month = nov,
year = "2020",
address = "Online",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/2020.findings-emnlp.171",
doi = "10.18653/v1/2020.findings-emnlp.171",
pages = "1896--1907",
}
Note that each UnifiedQA dataset has its own citation. Please see the source to
see the correct citation for each contained dataset.
unified_qa/qasc
Config description: QASC is a question-answering dataset with a focus on sentence composition. It consists of 8-way multiple-choice questions about grade school science and comes with a corpus of 17 million sentences.
Download size:
1.75 MiB
Dataset size:
2.09 MiB
Auto-cached (documentation): Yes
Splits:
Split | Examples |
---|---|
'test' | 920 |
'train' | 8,134 |
'validation' | 926 |
- Examples (tfds.as_dataframe):
- Citation:
@inproceedings{khot2020qasc,
title={Qasc: A dataset for question answering via sentence composition},
author={Khot, Tushar and Clark, Peter and Guerquin, Michal and Jansen, Peter and Sabharwal, Ashish},
booktitle={Proceedings of the AAAI Conference on Artificial Intelligence},
volume={34},
number={05},
pages={8082--8090},
year={2020}
}
@inproceedings{khashabi-etal-2020-unifiedqa,
title = "{UNIFIEDQA}: Crossing Format Boundaries with a Single {QA} System",
author = "Khashabi, Daniel and
Min, Sewon and
Khot, Tushar and
Sabharwal, Ashish and
Tafjord, Oyvind and
Clark, Peter and
Hajishirzi, Hannaneh",
booktitle = "Findings of the Association for Computational Linguistics: EMNLP 2020",
month = nov,
year = "2020",
address = "Online",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/2020.findings-emnlp.171",
doi = "10.18653/v1/2020.findings-emnlp.171",
pages = "1896--1907",
}
Note that each UnifiedQA dataset has its own citation. Please see the source to
see the correct citation for each contained dataset.
unified_qa/qasc_test
Config description: QASC is a question-answering dataset with a focus on sentence composition. It consists of 8-way multiple-choice questions about grade school science and comes with a corpus of 17 million sentences.
Download size:
1.75 MiB
Dataset size:
2.09 MiB
Auto-cached (documentation): Yes
Splits:
Split | Examples |
---|---|
'test' | 920 |
'train' | 8,134 |
'validation' | 926 |
- Examples (tfds.as_dataframe):
- Citation:
@inproceedings{khot2020qasc,
title={Qasc: A dataset for question answering via sentence composition},
author={Khot, Tushar and Clark, Peter and Guerquin, Michal and Jansen, Peter and Sabharwal, Ashish},
booktitle={Proceedings of the AAAI Conference on Artificial Intelligence},
volume={34},
number={05},
pages={8082--8090},
year={2020}
}
@inproceedings{khashabi-etal-2020-unifiedqa,
title = "{UNIFIEDQA}: Crossing Format Boundaries with a Single {QA} System",
author = "Khashabi, Daniel and
Min, Sewon and
Khot, Tushar and
Sabharwal, Ashish and
Tafjord, Oyvind and
Clark, Peter and
Hajishirzi, Hannaneh",
booktitle = "Findings of the Association for Computational Linguistics: EMNLP 2020",
month = nov,
year = "2020",
address = "Online",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/2020.findings-emnlp.171",
doi = "10.18653/v1/2020.findings-emnlp.171",
pages = "1896--1907",
}
Note that each UnifiedQA dataset has its own citation. Please see the source to
see the correct citation for each contained dataset.
unified_qa/qasc_with_ir
Config description: QASC is a question-answering dataset with a focus on sentence composition. It consists of 8-way multiple-choice questions about grade school science and comes with a corpus of 17 million sentences. This version includes paragraphs fetched via an information retrieval system as additional evidence.
Download size:
16.95 MiB
Dataset size:
17.30 MiB
Auto-cached (documentation): Yes
Splits:
Split | Examples |
---|---|
'test' | 920 |
'train' | 8,134 |
'validation' | 926 |
- Examples (tfds.as_dataframe):
- Citation:
@inproceedings{khot2020qasc,
title={Qasc: A dataset for question answering via sentence composition},
author={Khot, Tushar and Clark, Peter and Guerquin, Michal and Jansen, Peter and Sabharwal, Ashish},
booktitle={Proceedings of the AAAI Conference on Artificial Intelligence},
volume={34},
number={05},
pages={8082--8090},
year={2020}
}
@inproceedings{khashabi-etal-2020-unifiedqa,
title = "{UNIFIEDQA}: Crossing Format Boundaries with a Single {QA} System",
author = "Khashabi, Daniel and
Min, Sewon and
Khot, Tushar and
Sabharwal, Ashish and
Tafjord, Oyvind and
Clark, Peter and
Hajishirzi, Hannaneh",
booktitle = "Findings of the Association for Computational Linguistics: EMNLP 2020",
month = nov,
year = "2020",
address = "Online",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/2020.findings-emnlp.171",
doi = "10.18653/v1/2020.findings-emnlp.171",
pages = "1896--1907",
}
Note that each UnifiedQA dataset has its own citation. Please see the source to
see the correct citation for each contained dataset.
unified_qa/qasc_with_ir_test
Config description: QASC is a question-answering dataset with a focus on sentence composition. It consists of 8-way multiple-choice questions about grade school science and comes with a corpus of 17 million sentences. This version includes paragraphs fetched via an information retrieval system as additional evidence.
Download size:
16.95 MiB
Dataset size:
17.30 MiB
Auto-cached (documentation): Yes
Splits:
Split | Examples |
---|---|
'test' | 920 |
'train' | 8,134 |
'validation' | 926 |
- Examples (tfds.as_dataframe):
- Citation:
@inproceedings{khot2020qasc,
title={Qasc: A dataset for question answering via sentence composition},
author={Khot, Tushar and Clark, Peter and Guerquin, Michal and Jansen, Peter and Sabharwal, Ashish},
booktitle={Proceedings of the AAAI Conference on Artificial Intelligence},
volume={34},
number={05},
pages={8082--8090},
year={2020}
}
@inproceedings{khashabi-etal-2020-unifiedqa,
title = "{UNIFIEDQA}: Crossing Format Boundaries with a Single {QA} System",
author = "Khashabi, Daniel and
Min, Sewon and
Khot, Tushar and
Sabharwal, Ashish and
Tafjord, Oyvind and
Clark, Peter and
Hajishirzi, Hannaneh",
booktitle = "Findings of the Association for Computational Linguistics: EMNLP 2020",
month = nov,
year = "2020",
address = "Online",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/2020.findings-emnlp.171",
doi = "10.18653/v1/2020.findings-emnlp.171",
pages = "1896--1907",
}
Note that each UnifiedQA dataset has its own citation. Please see the source to
see the correct citation for each contained dataset.
unified_qa/quoref
Config description: This dataset tests the coreferential reasoning capability of reading comprehension systems. In this span-selection benchmark, which contains questions over paragraphs from Wikipedia, a system must resolve hard coreferences before selecting the appropriate span(s) in the paragraphs to answer the questions.
Download size:
51.43 MiB
Dataset size:
52.29 MiB
Auto-cached (documentation): Yes
Splits:
Split | Examples |
---|---|
'train' | 22,265 |
'validation' | 2,768 |
- Examples (tfds.as_dataframe):
- Citation:
@inproceedings{dasigi-etal-2019-quoref,
title = "{Q}uoref: A Reading Comprehension Dataset with Questions Requiring Coreferential Reasoning",
author = "Dasigi, Pradeep and
Liu, Nelson F. and
Marasovi{\'c}, Ana and
Smith, Noah A. and
Gardner, Matt",
booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)",
month = nov,
year = "2019",
address = "Hong Kong, China",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/D19-1606",
doi = "10.18653/v1/D19-1606",
pages = "5925--5932",
}
@inproceedings{khashabi-etal-2020-unifiedqa,
title = "{UNIFIEDQA}: Crossing Format Boundaries with a Single {QA} System",
author = "Khashabi, Daniel and
Min, Sewon and
Khot, Tushar and
Sabharwal, Ashish and
Tafjord, Oyvind and
Clark, Peter and
Hajishirzi, Hannaneh",
booktitle = "Findings of the Association for Computational Linguistics: EMNLP 2020",
month = nov,
year = "2020",
address = "Online",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/2020.findings-emnlp.171",
doi = "10.18653/v1/2020.findings-emnlp.171",
pages = "1896--1907",
}
Note that each UnifiedQA dataset has its own citation. Please see the source to
see the correct citation for each contained dataset.
unified_qa/race_string
Config description: RACE is a large-scale reading comprehension dataset. The dataset is collected from English examinations in China, which are designed for middle school and high school students. The dataset can serve as the training and test sets for machine comprehension.
Download size:
167.97 MiB
Dataset size:
171.23 MiB
Auto-cached (documentation): Yes (test, validation), only when shuffle_files=False (train)
Splits:
Split | Examples |
---|---|
'test' | 4,934 |
'train' | 87,863 |
'validation' | 4,887 |
- Examples (tfds.as_dataframe):
- Citation:
@inproceedings{lai-etal-2017-race,
title = "{RACE}: Large-scale {R}e{A}ding Comprehension Dataset From Examinations",
author = "Lai, Guokun and
Xie, Qizhe and
Liu, Hanxiao and
Yang, Yiming and
Hovy, Eduard",
booktitle = "Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing",
month = sep,
year = "2017",
address = "Copenhagen, Denmark",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/D17-1082",
doi = "10.18653/v1/D17-1082",
pages = "785--794",
}
@inproceedings{khashabi-etal-2020-unifiedqa,
title = "{UNIFIEDQA}: Crossing Format Boundaries with a Single {QA} System",
author = "Khashabi, Daniel and
Min, Sewon and
Khot, Tushar and
Sabharwal, Ashish and
Tafjord, Oyvind and
Clark, Peter and
Hajishirzi, Hannaneh",
booktitle = "Findings of the Association for Computational Linguistics: EMNLP 2020",
month = nov,
year = "2020",
address = "Online",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/2020.findings-emnlp.171",
doi = "10.18653/v1/2020.findings-emnlp.171",
pages = "1896--1907",
}
Note that each UnifiedQA dataset has its own citation. Please see the source to
see the correct citation for each contained dataset.
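The auto-caching entry for this config says the train split is cached only when `shuffle_files=False`. A minimal sketch of loading a split with that flag set explicitly is given below. The helpers `config_name` and `load_split` are hypothetical wrappers written for this example; `tfds.load` and its `split` and `shuffle_files` arguments are part of TensorFlow Datasets. The import is deferred into the function so the snippet can be read and parsed without the package installed.

```python
def config_name(variant: str) -> str:
    """Build the TFDS name of a unified_qa config, e.g. 'unified_qa/race_string'."""
    return f"unified_qa/{variant}"


def load_split(variant: str, split: str, shuffle_files: bool = False):
    """Load one split of a unified_qa config as a tf.data.Dataset."""
    # Deferred import: requires the tensorflow-datasets package at call time.
    import tensorflow_datasets as tfds

    return tfds.load(
        config_name(variant),
        split=split,
        shuffle_files=shuffle_files,
    )


# Usage (downloads data on first call, so it is left commented out):
# train = load_split("race_string", "train", shuffle_files=False)
# for example in train.take(1):
#     print(example["input"], example["output"])
```

Keeping `shuffle_files=False` trades file-level shuffling for the caching behavior described above; shuffling can still be applied afterwards on the returned dataset.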
unified_qa/race_string_dev
Config description: RACE is a large-scale reading comprehension dataset. The dataset is collected from English examinations in China, which are designed for middle school and high school students. The dataset can serve as the training and test sets for machine comprehension.
Download size:
167.97 MiB
Dataset size:
171.23 MiB
Auto-cached (documentation): Yes (test, validation), only when shuffle_files=False (train)
Splits:
Split | Examples |
---|---|
'test' | 4,934 |
'train' | 87,863 |
'validation' | 4,887 |
- Examples (tfds.as_dataframe):
- Citation:
@inproceedings{lai-etal-2017-race,
title = "{RACE}: Large-scale {R}e{A}ding Comprehension Dataset From Examinations",
author = "Lai, Guokun and
Xie, Qizhe and
Liu, Hanxiao and
Yang, Yiming and
Hovy, Eduard",
booktitle = "Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing",
month = sep,
year = "2017",
address = "Copenhagen, Denmark",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/D17-1082",
doi = "10.18653/v1/D17-1082",
pages = "785--794",
}
@inproceedings{khashabi-etal-2020-unifiedqa,
title = "{UNIFIEDQA}: Crossing Format Boundaries with a Single {QA} System",
author = "Khashabi, Daniel and
Min, Sewon and
Khot, Tushar and
Sabharwal, Ashish and
Tafjord, Oyvind and
Clark, Peter and
Hajishirzi, Hannaneh",
booktitle = "Findings of the Association for Computational Linguistics: EMNLP 2020",
month = nov,
year = "2020",
address = "Online",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/2020.findings-emnlp.171",
doi = "10.18653/v1/2020.findings-emnlp.171",
pages = "1896--1907",
}
Note that each UnifiedQA dataset has its own citation. Please see the source to
see the correct citation for each contained dataset.
unified_qa/ropes
Config description: This dataset tests a system's ability to apply knowledge from a passage of text to a new situation. A system is presented with a background passage containing one or more causal or qualitative relations (e.g., "animal pollinators increase efficiency of fertilization in flowers"), a novel situation that uses this background, and questions that require reasoning about the effects of the relations in the background passage in the context of the situation.
Download size:
12.91 MiB
Dataset size:
13.35 MiB
Auto-cached (documentation): Yes
Splits:
Split | Examples |
---|---|
'train' | 10,924 |
'validation' | 1,688 |
- Examples (tfds.as_dataframe):
- Citation:
@inproceedings{lin-etal-2019-reasoning,
title = "Reasoning Over Paragraph Effects in Situations",
author = "Lin, Kevin and
Tafjord, Oyvind and
Clark, Peter and
Gardner, Matt",
booktitle = "Proceedings of the 2nd Workshop on Machine Reading for Question Answering",
month = nov,
year = "2019",
address = "Hong Kong, China",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/D19-5808",
doi = "10.18653/v1/D19-5808",
pages = "58--62",
}
@inproceedings{khashabi-etal-2020-unifiedqa,
title = "{UNIFIEDQA}: Crossing Format Boundaries with a Single {QA} System",
author = "Khashabi, Daniel and
Min, Sewon and
Khot, Tushar and
Sabharwal, Ashish and
Tafjord, Oyvind and
Clark, Peter and
Hajishirzi, Hannaneh",
booktitle = "Findings of the Association for Computational Linguistics: EMNLP 2020",
month = nov,
year = "2020",
address = "Online",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/2020.findings-emnlp.171",
doi = "10.18653/v1/2020.findings-emnlp.171",
pages = "1896--1907",
}
Note that each UnifiedQA dataset has its own citation. Please see the source to
see the correct citation for each contained dataset.
unified_qa/social_iqa
Config description: This is a large-scale benchmark for commonsense reasoning about social situations. Social IQa contains multiple-choice questions for probing emotional and social intelligence in a variety of everyday situations. Through crowdsourcing, commonsense questions along with correct and incorrect answers about social interactions are collected, using a new framework that mitigates stylistic artifacts in incorrect answers by asking workers to provide the right answer to a different but related question.
Download size:
7.08 MiB
Dataset size:
8.22 MiB
Auto-cached (documentation): Yes
Splits:
Split | Examples |
---|---|
'train' | 33,410 |
'validation' | 1,954 |
- Examples (tfds.as_dataframe):
- Citation:
@inproceedings{sap-etal-2019-social,
title = "Social {IQ}a: Commonsense Reasoning about Social Interactions",
author = "Sap, Maarten and
Rashkin, Hannah and
Chen, Derek and
Le Bras, Ronan and
Choi, Yejin",
booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)",
month = nov,
year = "2019",
address = "Hong Kong, China",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/D19-1454",
doi = "10.18653/v1/D19-1454",
pages = "4463--4473",
}
@inproceedings{khashabi-etal-2020-unifiedqa,
title = "{UNIFIEDQA}: Crossing Format Boundaries with a Single {QA} System",
author = "Khashabi, Daniel and
Min, Sewon and
Khot, Tushar and
Sabharwal, Ashish and
Tafjord, Oyvind and
Clark, Peter and
Hajishirzi, Hannaneh",
booktitle = "Findings of the Association for Computational Linguistics: EMNLP 2020",
month = nov,
year = "2020",
address = "Online",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/2020.findings-emnlp.171",
doi = "10.18653/v1/2020.findings-emnlp.171",
pages = "1896--1907",
}
Note that each UnifiedQA dataset has its own citation. Please see the source to
see the correct citation for each contained dataset.
unified_qa/squad1_1
Config description: This is a reading comprehension dataset consisting of questions posed by crowdworkers on a set of Wikipedia articles, where the answer to every question is a segment of text from the corresponding reading passage.
Download size:
80.62 MiB
Dataset size:
83.99 MiB
Auto-cached (documentation): Yes
Splits:
Split | Examples |
---|---|
'train' | 87,514 |
'validation' | 10,570 |
- Examples (tfds.as_dataframe):
- Citation:
@inproceedings{rajpurkar-etal-2016-squad,
title = "{SQ}u{AD}: 100,000+ Questions for Machine Comprehension of Text",
author = "Rajpurkar, Pranav and
Zhang, Jian and
Lopyrev, Konstantin and
Liang, Percy",
booktitle = "Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing",
month = nov,
year = "2016",
address = "Austin, Texas",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/D16-1264",
doi = "10.18653/v1/D16-1264",
pages = "2383--2392",
}
@inproceedings{khashabi-etal-2020-unifiedqa,
title = "{UNIFIEDQA}: Crossing Format Boundaries with a Single {QA} System",
author = "Khashabi, Daniel and
Min, Sewon and
Khot, Tushar and
Sabharwal, Ashish and
Tafjord, Oyvind and
Clark, Peter and
Hajishirzi, Hannaneh",
booktitle = "Findings of the Association for Computational Linguistics: EMNLP 2020",
month = nov,
year = "2020",
address = "Online",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/2020.findings-emnlp.171",
doi = "10.18653/v1/2020.findings-emnlp.171",
pages = "1896--1907",
}
Note that each UnifiedQA dataset has its own citation. Please see the source to
see the correct citation for each contained dataset.
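Because every answer in this config is described as a segment of text from the corresponding passage, a simple sanity check is to locate the answer inside the passage. The sketch below shows one way to do this with exact, case-sensitive matching. The function `find_answer_span` and the sample passage are hypothetical and written for illustration; how the passage is separated from the question inside the `input` field is not covered here.

```python
from typing import Optional, Tuple


def find_answer_span(passage: str, answer: str) -> Optional[Tuple[int, int]]:
    """Return (start, end) character offsets of the first exact match, or None."""
    if not answer:
        return None
    start = passage.find(answer)
    if start == -1:
        # The answer is not a literal segment of this passage.
        return None
    return start, start + len(answer)


# Made-up example passage and answers.
passage = "The Eiffel Tower is in Paris."
span = find_answer_span(passage, "Paris")
missing = find_answer_span(passage, "London")
```

A `None` result flags an answer that is not a literal segment of the passage, for example because of case or whitespace normalization, which this exact-match sketch does not handle.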
unified_qa/squad2
Config description: This dataset combines the original Stanford Question Answering Dataset (SQuAD) with unanswerable questions written adversarially by crowdworkers to look similar to answerable ones.
Download size:
116.56 MiB
Dataset size:
121.43 MiB
Auto-cached (documentation): Yes
Splits:
Split | Examples |
---|---|
'train' | 130,149 |
'validation' | 11,873 |
- Examples (tfds.as_dataframe):
- Citation:
@inproceedings{rajpurkar-etal-2018-know,
title = "Know What You Don{'}t Know: Unanswerable Questions for {SQ}u{AD}",
author = "Rajpurkar, Pranav and
Jia, Robin and
Liang, Percy",
booktitle = "Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)",
month = jul,
year = "2018",
address = "Melbourne, Australia",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/P18-2124",
doi = "10.18653/v1/P18-2124",
pages = "784--789",
}
@inproceedings{khashabi-etal-2020-unifiedqa,
title = "{UNIFIEDQA}: Crossing Format Boundaries with a Single {QA} System",
author = "Khashabi, Daniel and
Min, Sewon and
Khot, Tushar and
Sabharwal, Ashish and
Tafjord, Oyvind and
Clark, Peter and
Hajishirzi, Hannaneh",
booktitle = "Findings of the Association for Computational Linguistics: EMNLP 2020",
month = nov,
year = "2020",
address = "Online",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/2020.findings-emnlp.171",
doi = "10.18653/v1/2020.findings-emnlp.171",
pages = "1896--1907",
}
Note that each UnifiedQA dataset has its own citation. Please see the source to
see the correct citation for each contained dataset.
unified_qa/winogrande_l
Config description: This dataset is inspired by the original Winograd Schema Challenge design, but adjusted to improve both the scale and the hardness of the dataset. The key steps of the dataset construction consist of (1) a carefully designed crowdsourcing procedure, followed by (2) systematic bias reduction using a novel AfLite algorithm that generalizes human-detectable word associations to machine-detectable embedding associations. Training sets with differing sizes are provided. This set corresponds to size l.
Download size:
1.49 MiB
Dataset size:
1.83 MiB
Auto-cached (documentation): Yes
Splits:
Split | Examples |
---|---|
'train' | 10,234 |
'validation' | 1,267 |
- Examples (tfds.as_dataframe):
- Citation:
@inproceedings{sakaguchi2020winogrande,
title={Winogrande: An adversarial winograd schema challenge at scale},
author={Sakaguchi, Keisuke and Le Bras, Ronan and Bhagavatula, Chandra and Choi, Yejin},
booktitle={Proceedings of the AAAI Conference on Artificial Intelligence},
volume={34},
number={05},
pages={8732--8740},
year={2020}
}
@inproceedings{khashabi-etal-2020-unifiedqa,
title = "{UNIFIEDQA}: Crossing Format Boundaries with a Single {QA} System",
author = "Khashabi, Daniel and
Min, Sewon and
Khot, Tushar and
Sabharwal, Ashish and
Tafjord, Oyvind and
Clark, Peter and
Hajishirzi, Hannaneh",
booktitle = "Findings of the Association for Computational Linguistics: EMNLP 2020",
month = nov,
year = "2020",
address = "Online",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/2020.findings-emnlp.171",
doi = "10.18653/v1/2020.findings-emnlp.171",
pages = "1896--1907",
}
Note that each UnifiedQA dataset has its own citation. Please see the source to
see the correct citation for each contained dataset.
unified_qa/winogrande_m
Config description: This dataset is inspired by the original Winograd Schema Challenge design, but adjusted to improve both the scale and the hardness of the dataset. The key steps of the dataset construction consist of (1) a carefully designed crowdsourcing procedure, followed by (2) systematic bias reduction using a novel AfLite algorithm that generalizes human-detectable word associations to machine-detectable embedding associations. Training sets with differing sizes are provided. This set corresponds to size m.
Download size:
507.46 KiB
Dataset size:
623.15 KiB
Auto-cached (documentation): Yes
Splits:
Split | Examples |
---|---|
'train' | 2,558 |
'validation' | 1,267 |
- Examples (tfds.as_dataframe):
- Citation:
@inproceedings{sakaguchi2020winogrande,
title={Winogrande: An adversarial winograd schema challenge at scale},
author={Sakaguchi, Keisuke and Le Bras, Ronan and Bhagavatula, Chandra and Choi, Yejin},
booktitle={Proceedings of the AAAI Conference on Artificial Intelligence},
volume={34},
number={05},
pages={8732--8740},
year={2020}
}
@inproceedings{khashabi-etal-2020-unifiedqa,
title = "{UNIFIEDQA}: Crossing Format Boundaries with a Single {QA} System",
author = "Khashabi, Daniel and
Min, Sewon and
Khot, Tushar and
Sabharwal, Ashish and
Tafjord, Oyvind and
Clark, Peter and
Hajishirzi, Hannaneh",
booktitle = "Findings of the Association for Computational Linguistics: EMNLP 2020",
month = nov,
year = "2020",
address = "Online",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/2020.findings-emnlp.171",
doi = "10.18653/v1/2020.findings-emnlp.171",
pages = "1896--1907",
}
Note that each UnifiedQA dataset has its own citation. Please see the source to
see the correct citation for each contained dataset.
unified_qa/winogrande_s
Config description: This dataset is inspired by the original Winograd Schema Challenge design, but adjusted to improve both the scale and the hardness of the dataset. The key steps of the dataset construction consist of (1) a carefully designed crowdsourcing procedure, followed by (2) systematic bias reduction using a novel AfLite algorithm that generalizes human-detectable word associations to machine-detectable embedding associations. Training sets with differing sizes are provided. This set corresponds to size s.
Download size:
479.24 KiB
Dataset size:
590.47 KiB
Auto-cached (documentation): Yes
Splits:
Split | Examples |
---|---|
'test' | 1,767 |
'train' | 640 |
'validation' | 1,267 |
- Examples (tfds.as_dataframe):
- Citation:
@inproceedings{sakaguchi2020winogrande,
title={Winogrande: An adversarial winograd schema challenge at scale},
author={Sakaguchi, Keisuke and Le Bras, Ronan and Bhagavatula, Chandra and Choi, Yejin},
booktitle={Proceedings of the AAAI Conference on Artificial Intelligence},
volume={34},
number={05},
pages={8732--8740},
year={2020}
}
@inproceedings{khashabi-etal-2020-unifiedqa,
title = "{UNIFIEDQA}: Crossing Format Boundaries with a Single {QA} System",
author = "Khashabi, Daniel and
Min, Sewon and
Khot, Tushar and
Sabharwal, Ashish and
Tafjord, Oyvind and
Clark, Peter and
Hajishirzi, Hannaneh",
booktitle = "Findings of the Association for Computational Linguistics: EMNLP 2020",
month = nov,
year = "2020",
address = "Online",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/2020.findings-emnlp.171",
doi = "10.18653/v1/2020.findings-emnlp.171",
pages = "1896--1907",
}
Note that each UnifiedQA dataset has its own citation. Please see the source to
see the correct citation for each contained dataset.
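The three Winogrande configs differ mainly in training-set size. As a small illustration of choosing among them, the sketch below picks the largest config whose train split fits a given example budget. The train sizes are copied from the split tables of `winogrande_s`, `winogrande_m`, and `winogrande_l`; the helper `largest_config_within` and the notion of an example budget are assumptions introduced for this example.

```python
from typing import Optional

# Train split sizes as listed in the split tables of the three configs.
WINOGRANDE_TRAIN_SIZES = {
    "winogrande_s": 640,
    "winogrande_m": 2558,
    "winogrande_l": 10234,
}


def largest_config_within(budget: int) -> Optional[str]:
    """Return the config with the most train examples not exceeding budget."""
    fitting = {
        name: size
        for name, size in WINOGRANDE_TRAIN_SIZES.items()
        if size <= budget
    }
    if not fitting:
        # No config is small enough for this budget.
        return None
    return max(fitting, key=fitting.get)


# Hypothetical budget of 3,000 training examples.
choice = largest_config_within(3000)
```

All three configs list the same number of validation examples (1,267), so results across sizes can be compared on a validation split of equal size.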