TFDS obsługuje teraz format Croissant 🥐 ! Przeczytaj dokumentację , aby dowiedzieć się więcej.

Ta strona została przetłumaczona przez Cloud Translation API.

klejnot

opis :

GEM jest środowiskiem wzorcowym dla generowania języka naturalnego z naciskiem na jego ocenę, zarówno poprzez adnotacje ludzkie, jak i zautomatyzowane metryki.

GEM ma na celu: (1) pomiar postępu NLG w 13 zestawach danych obejmujących wiele zadań i języków NLG. (2) zapewniają dogłębną analizę danych i modeli przedstawionych za pomocą zestawień danych i zestawów wyzwań. (3) opracować standardy oceny generowanego tekstu przy użyciu zarówno metryk automatycznych, jak i ludzkich.

Więcej informacji można znaleźć na stronie https://gem-benchmark.com .

Dodatkowa dokumentacja : Przeglądaj dokumenty z kodem na
Strona główna : https://gem-benchmark.com
Kod źródłowy : tfds.text.gem.Gem
Wersje :
- 1.0.0 : Wersja początkowa
- 1.0.1 : Zaktualizuj filtr złych linków dla MLSum
- 1.1.0 (domyślnie): wydanie zestawów wyzwań
Klucze nadzorowane (Zobacz dokument as_supervised ): None
Rysunek ( tfds.show_examples ): Nieobsługiwany.

gem/common_gen (domyślna konfiguracja)

Opis konfiguracji: CommonGen to zadanie generowania tekstu z ograniczeniami, powiązane z zestawem danych porównawczych, w celu jawnego testowania maszyn pod kątem zdolności generatywnego rozumowania zdroworozsądkowego. Biorąc pod uwagę zestaw wspólnych pojęć; zadaniem jest wygenerowanie spójnego zdania opisującego codzienny scenariusz z wykorzystaniem tych pojęć.
Rozmiar pliku do pobrania : 1.84 MiB
Rozmiar zestawu danych : 16.84 MiB
Automatyczne buforowanie ( dokumentacja ): Tak
Podziały :

Rozdzielać	Przykłady
`'challenge_test_scramble'`	500
`'challenge_train_sample'`	500
`'challenge_validation_sample'`	500
`'test'`	1497
`'train'`	67389
`'validation'`	993

Struktura funkcji :

FeaturesDict({
    'concept_set_id': int32,
    'concepts': Sequence(string),
    'gem_id': string,
    'gem_parent_id': string,
    'references': Sequence(string),
    'target': string,
})

Dokumentacja funkcji :

Funkcja	Klasa	Kształt	Typ D
	FunkcjeDict
identyfikator_zestawu koncepcji	Napinacz		int32
pojęcia	Sekwencja (Tensor)	(Nic,)	strunowy
klejnot_id	Napinacz		strunowy
identyfikator_nadrzędnego klejnotu	Napinacz		strunowy
Bibliografia	Sekwencja (Tensor)	(Nic,)	strunowy
cel	Napinacz		strunowy

Przykłady ( tfds.as_dataframe ):

Cytat :

@inproceedings{lin2020commongen,
  title = "CommonGen: A Constrained Text Generation Challenge for Generative Commonsense Reasoning",
  author = "Lin, Bill Yuchen  and
    Zhou, Wangchunshu  and
    Shen, Ming  and
    Zhou, Pei  and
    Bhagavatula, Chandra  and
    Choi, Yejin  and
    Ren, Xiang",
  booktitle = "Findings of the Association for Computational Linguistics: EMNLP 2020",
  month = nov,
  year = "2020",
  address = "Online",
  publisher = "Association for Computational Linguistics",
  url = "https://www.aclweb.org/anthology/2020.findings-emnlp.165",
  pages = "1823--1840",
}
@article{gehrmann2021gem,
  author    = {Sebastian Gehrmann and
               Tosin P. Adewumi and
               Karmanya Aggarwal and
               Pawan Sasanka Ammanamanchi and
               Aremu Anuoluwapo and
               Antoine Bosselut and
               Khyathi Raghavi Chandu and
               Miruna{-}Adriana Clinciu and
               Dipanjan Das and
               Kaustubh D. Dhole and
               Wanyu Du and
               Esin Durmus and
               Ondrej Dusek and
               Chris Emezue and
               Varun Gangal and
               Cristina Garbacea and
               Tatsunori Hashimoto and
               Yufang Hou and
               Yacine Jernite and
               Harsh Jhamtani and
               Yangfeng Ji and
               Shailza Jolly and
               Dhruv Kumar and
               Faisal Ladhak and
               Aman Madaan and
               Mounica Maddela and
               Khyati Mahajan and
               Saad Mahamood and
               Bodhisattwa Prasad Majumder and
               Pedro Henrique Martins and
               Angelina McMillan{-}Major and
               Simon Mille and
               Emiel van Miltenburg and
               Moin Nadeem and
               Shashi Narayan and
               Vitaly Nikolaev and
               Rubungo Andre Niyongabo and
               Salomey Osei and
               Ankur P. Parikh and
               Laura Perez{-}Beltrachini and
               Niranjan Ramesh Rao and
               Vikas Raunak and
               Juan Diego Rodriguez and
               Sashank Santhanam and
               Jo{\~{a} }o Sedoc and
               Thibault Sellam and
               Samira Shaikh and
               Anastasia Shimorina and
               Marco Antonio Sobrevilla Cabezudo and
               Hendrik Strobelt and
               Nishant Subramani and
               Wei Xu and
               Diyi Yang and
               Akhila Yerukola and
               Jiawei Zhou},
  title     = {The {GEM} Benchmark: Natural Language Generation, its Evaluation and
               Metrics},
  journal   = {CoRR},
  volume    = {abs/2102.01672},
  year      = {2021},
  url       = {https://arxiv.org/abs/2102.01672},
  archivePrefix = {arXiv},
  eprint    = {2102.01672}
}

Note that each GEM dataset has its own citation. Please see the source to see
the correct citation for each contained dataset."

klejnot/cs_restauracje

Opis konfiguracji : Zadaniem jest generowanie odpowiedzi w kontekście (hipotetycznego) systemu dialogowego, który dostarcza informacji o restauracjach. Dane wejściowe to podstawowy typ aktu intencji/dialogu oraz lista slotów (atrybutów) i ich wartości. Wynikiem jest zdanie w języku naturalnym.
Rozmiar pliku do pobrania : 1.46 MiB
Rozmiar zestawu danych : 2.71 MiB
Automatyczne buforowanie ( dokumentacja ): Tak
Podziały :

Rozdzielać	Przykłady
`'challenge_test_scramble'`	500
`'challenge_train_sample'`	500
`'challenge_validation_sample'`	500
`'test'`	842
`'train'`	3569
`'validation'`	781

Struktura funkcji :

FeaturesDict({
    'dialog_act': string,
    'dialog_act_delexicalized': string,
    'gem_id': string,
    'gem_parent_id': string,
    'references': Sequence(string),
    'target': string,
    'target_delexicalized': string,
})

Dokumentacja funkcji :

Funkcja	Klasa	Kształt	Typ D
	FunkcjeDict
dialog_akt	Napinacz		strunowy
dialog_act_delexicalized	Napinacz		strunowy
klejnot_id	Napinacz		strunowy
identyfikator_nadrzędnego klejnotu	Napinacz		strunowy
Bibliografia	Sekwencja (Tensor)	(Nic,)	strunowy
cel	Napinacz		strunowy
target_delexicalized	Napinacz		strunowy

Przykłady ( tfds.as_dataframe ):

Cytat :

@inproceedings{cs_restaurants,
  address = {Tokyo, Japan},
  title = {Neural {Generation} for {Czech}: {Data} and {Baselines} },
  shorttitle = {Neural {Generation} for {Czech} },
  url = {https://www.aclweb.org/anthology/W19-8670/},
  urldate = {2019-10-18},
  booktitle = {Proceedings of the 12th {International} {Conference} on {Natural} {Language} {Generation} ({INLG} 2019)},
  author = {Dušek, Ondřej and Jurčíček, Filip},
  month = oct,
  year = {2019},
  pages = {563--574}
}
@article{gehrmann2021gem,
  author    = {Sebastian Gehrmann and
               Tosin P. Adewumi and
               Karmanya Aggarwal and
               Pawan Sasanka Ammanamanchi and
               Aremu Anuoluwapo and
               Antoine Bosselut and
               Khyathi Raghavi Chandu and
               Miruna{-}Adriana Clinciu and
               Dipanjan Das and
               Kaustubh D. Dhole and
               Wanyu Du and
               Esin Durmus and
               Ondrej Dusek and
               Chris Emezue and
               Varun Gangal and
               Cristina Garbacea and
               Tatsunori Hashimoto and
               Yufang Hou and
               Yacine Jernite and
               Harsh Jhamtani and
               Yangfeng Ji and
               Shailza Jolly and
               Dhruv Kumar and
               Faisal Ladhak and
               Aman Madaan and
               Mounica Maddela and
               Khyati Mahajan and
               Saad Mahamood and
               Bodhisattwa Prasad Majumder and
               Pedro Henrique Martins and
               Angelina McMillan{-}Major and
               Simon Mille and
               Emiel van Miltenburg and
               Moin Nadeem and
               Shashi Narayan and
               Vitaly Nikolaev and
               Rubungo Andre Niyongabo and
               Salomey Osei and
               Ankur P. Parikh and
               Laura Perez{-}Beltrachini and
               Niranjan Ramesh Rao and
               Vikas Raunak and
               Juan Diego Rodriguez and
               Sashank Santhanam and
               Jo{\~{a} }o Sedoc and
               Thibault Sellam and
               Samira Shaikh and
               Anastasia Shimorina and
               Marco Antonio Sobrevilla Cabezudo and
               Hendrik Strobelt and
               Nishant Subramani and
               Wei Xu and
               Diyi Yang and
               Akhila Yerukola and
               Jiawei Zhou},
  title     = {The {GEM} Benchmark: Natural Language Generation, its Evaluation and
               Metrics},
  journal   = {CoRR},
  volume    = {abs/2102.01672},
  year      = {2021},
  url       = {https://arxiv.org/abs/2102.01672},
  archivePrefix = {arXiv},
  eprint    = {2102.01672}
}

Note that each GEM dataset has its own citation. Please see the source to see
the correct citation for each contained dataset."

klejnot/strzałka

Opis konfiguracji: DART to duży korpus generowania rekordów danych na tekst o otwartej domenie z wysokiej jakości adnotacjami zdań, przy czym każde wejście jest zbiorem potrójnych relacji encji zgodnie z ontologią o strukturze drzewiastej.
Rozmiar pliku do pobrania : 28.01 MiB
Rozmiar zestawu danych : 33.78 MiB
Automatyczne buforowanie ( dokumentacja ): Tak
Podziały :

Rozdzielać	Przykłady
`'test'`	6959
`'train'`	62659
`'validation'`	2768

Struktura funkcji :

FeaturesDict({
    'dart_id': int32,
    'gem_id': string,
    'gem_parent_id': string,
    'references': Sequence(string),
    'subtree_was_extended': bool,
    'target': string,
    'target_sources': Sequence(string),
    'tripleset': Sequence(string),
})

Dokumentacja funkcji :

Funkcja	Klasa	Kształt	Typ D
	FunkcjeDict
dart_id	Napinacz		int32
klejnot_id	Napinacz		strunowy
identyfikator_nadrzędnego klejnotu	Napinacz		strunowy
Bibliografia	Sekwencja (Tensor)	(Nic,)	strunowy
subtree_was_extended	Napinacz		bool
cel	Napinacz		strunowy
źródła_docelowe	Sekwencja (Tensor)	(Nic,)	strunowy
trójka	Sekwencja (Tensor)	(Nic,)	strunowy

Przykłady ( tfds.as_dataframe ):

Cytat :

@article{radev2020dart,
  title=Dart: Open-domain structured data record to text generation,
  author={Radev, Dragomir and Zhang, Rui and Rau, Amrit and Sivaprasad, Abhinand and Hsieh, Chiachun and Rajani, Nazneen Fatema and Tang, Xiangru and Vyas, Aadit and Verma, Neha and Krishna, Pranav and others},
  journal={arXiv preprint arXiv:2007.02871},
  year={2020}
}
}
@article{gehrmann2021gem,
  author    = {Sebastian Gehrmann and
               Tosin P. Adewumi and
               Karmanya Aggarwal and
               Pawan Sasanka Ammanamanchi and
               Aremu Anuoluwapo and
               Antoine Bosselut and
               Khyathi Raghavi Chandu and
               Miruna{-}Adriana Clinciu and
               Dipanjan Das and
               Kaustubh D. Dhole and
               Wanyu Du and
               Esin Durmus and
               Ondrej Dusek and
               Chris Emezue and
               Varun Gangal and
               Cristina Garbacea and
               Tatsunori Hashimoto and
               Yufang Hou and
               Yacine Jernite and
               Harsh Jhamtani and
               Yangfeng Ji and
               Shailza Jolly and
               Dhruv Kumar and
               Faisal Ladhak and
               Aman Madaan and
               Mounica Maddela and
               Khyati Mahajan and
               Saad Mahamood and
               Bodhisattwa Prasad Majumder and
               Pedro Henrique Martins and
               Angelina McMillan{-}Major and
               Simon Mille and
               Emiel van Miltenburg and
               Moin Nadeem and
               Shashi Narayan and
               Vitaly Nikolaev and
               Rubungo Andre Niyongabo and
               Salomey Osei and
               Ankur P. Parikh and
               Laura Perez{-}Beltrachini and
               Niranjan Ramesh Rao and
               Vikas Raunak and
               Juan Diego Rodriguez and
               Sashank Santhanam and
               Jo{\~{a} }o Sedoc and
               Thibault Sellam and
               Samira Shaikh and
               Anastasia Shimorina and
               Marco Antonio Sobrevilla Cabezudo and
               Hendrik Strobelt and
               Nishant Subramani and
               Wei Xu and
               Diyi Yang and
               Akhila Yerukola and
               Jiawei Zhou},
  title     = {The {GEM} Benchmark: Natural Language Generation, its Evaluation and
               Metrics},
  journal   = {CoRR},
  volume    = {abs/2102.01672},
  year      = {2021},
  url       = {https://arxiv.org/abs/2102.01672},
  archivePrefix = {arXiv},
  eprint    = {2102.01672}
}

Note that each GEM dataset has its own citation. Please see the source to see
the correct citation for each contained dataset."

gem/e2e_nlg

Opis konfiguracji: zestaw danych E2E jest przeznaczony do zadania zamiany danych na tekst w ograniczonej domenie — generowania opisów/rekomendacji restauracji na podstawie maksymalnie 8 różnych atrybutów (nazwa, obszar, przedział cenowy itp.)
Rozmiar pliku do pobrania : 13.99 MiB
Rozmiar zestawu danych : 16.92 MiB
Automatyczne buforowanie ( dokumentacja ): Tak
Podziały :

Rozdzielać	Przykłady
`'challenge_test_scramble'`	500
`'challenge_train_sample'`	500
`'challenge_validation_sample'`	500
`'test'`	4693
`'train'`	33525
`'validation'`	4299

Struktura funkcji :

FeaturesDict({
    'gem_id': string,
    'gem_parent_id': string,
    'meaning_representation': string,
    'references': Sequence(string),
    'target': string,
})

Dokumentacja funkcji :

Funkcja	Klasa	Kształt	Typ D
	FunkcjeDict
klejnot_id	Napinacz		strunowy
identyfikator_nadrzędnego klejnotu	Napinacz		strunowy
znaczenie_reprezentacja	Napinacz		strunowy
Bibliografia	Sekwencja (Tensor)	(Nic,)	strunowy
cel	Napinacz		strunowy

Przykłady ( tfds.as_dataframe ):

Cytat :

@inproceedings{e2e_cleaned,
  address = {Tokyo, Japan},
  title = {Semantic {Noise} {Matters} for {Neural} {Natural} {Language} {Generation} },
  url = {https://www.aclweb.org/anthology/W19-8652/},
  booktitle = {Proceedings of the 12th {International} {Conference} on {Natural} {Language} {Generation} ({INLG} 2019)},
  author = {Dušek, Ondřej and Howcroft, David M and Rieser, Verena},
  year = {2019},
  pages = {421--426},
}
@article{gehrmann2021gem,
  author    = {Sebastian Gehrmann and
               Tosin P. Adewumi and
               Karmanya Aggarwal and
               Pawan Sasanka Ammanamanchi and
               Aremu Anuoluwapo and
               Antoine Bosselut and
               Khyathi Raghavi Chandu and
               Miruna{-}Adriana Clinciu and
               Dipanjan Das and
               Kaustubh D. Dhole and
               Wanyu Du and
               Esin Durmus and
               Ondrej Dusek and
               Chris Emezue and
               Varun Gangal and
               Cristina Garbacea and
               Tatsunori Hashimoto and
               Yufang Hou and
               Yacine Jernite and
               Harsh Jhamtani and
               Yangfeng Ji and
               Shailza Jolly and
               Dhruv Kumar and
               Faisal Ladhak and
               Aman Madaan and
               Mounica Maddela and
               Khyati Mahajan and
               Saad Mahamood and
               Bodhisattwa Prasad Majumder and
               Pedro Henrique Martins and
               Angelina McMillan{-}Major and
               Simon Mille and
               Emiel van Miltenburg and
               Moin Nadeem and
               Shashi Narayan and
               Vitaly Nikolaev and
               Rubungo Andre Niyongabo and
               Salomey Osei and
               Ankur P. Parikh and
               Laura Perez{-}Beltrachini and
               Niranjan Ramesh Rao and
               Vikas Raunak and
               Juan Diego Rodriguez and
               Sashank Santhanam and
               Jo{\~{a} }o Sedoc and
               Thibault Sellam and
               Samira Shaikh and
               Anastasia Shimorina and
               Marco Antonio Sobrevilla Cabezudo and
               Hendrik Strobelt and
               Nishant Subramani and
               Wei Xu and
               Diyi Yang and
               Akhila Yerukola and
               Jiawei Zhou},
  title     = {The {GEM} Benchmark: Natural Language Generation, its Evaluation and
               Metrics},
  journal   = {CoRR},
  volume    = {abs/2102.01672},
  year      = {2021},
  url       = {https://arxiv.org/abs/2102.01672},
  archivePrefix = {arXiv},
  eprint    = {2102.01672}
}

Note that each GEM dataset has its own citation. Please see the source to see
the correct citation for each contained dataset."

gem/mlsum_de

Opis konfiguracji: MLSum to wielkoskalowy wielojęzyczny zbiór danych podsumowujących. Jest zbudowany z internetowych serwisów informacyjnych, ten podział koncentruje się na języku niemieckim.
Rozmiar pliku do pobrania : 345.98 MiB
Rozmiar zbioru danych : 963.60 MiB
Automatyczne buforowanie ( dokumentacja ): Nie
Podziały :

Rozdzielać	Przykłady
`'challenge_test_covid'`	5058
`'challenge_train_sample'`	500
`'challenge_validation_sample'`	500
`'test'`	10695
`'train'`	220 748
`'validation'`	11392

Struktura funkcji :

FeaturesDict({
    'date': string,
    'gem_id': string,
    'gem_parent_id': string,
    'references': Sequence(string),
    'target': string,
    'text': string,
    'title': string,
    'topic': string,
    'url': string,
})

Dokumentacja funkcji :

Funkcja	Klasa	Kształt	Typ D
	FunkcjeDict
data	Napinacz		strunowy
klejnot_id	Napinacz		strunowy
identyfikator_nadrzędnego klejnotu	Napinacz		strunowy
Bibliografia	Sekwencja (Tensor)	(Nic,)	strunowy
cel	Napinacz		strunowy
tekst	Napinacz		strunowy
tytuł	Napinacz		strunowy
temat	Napinacz		strunowy
adres URL	Napinacz		strunowy

Przykłady ( tfds.as_dataframe ):

Cytat :

@inproceedings{scialom-etal-2020-mlsum,
    title = "{MLSUM}: The Multilingual Summarization Corpus",
    author = {Scialom, Thomas  and Dray, Paul-Alexis  and Lamprier, Sylvain  and Piwowarski, Benjamin  and Staiano, Jacopo},
    booktitle = {Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)},
    year = {2020}
}
@article{gehrmann2021gem,
  author    = {Sebastian Gehrmann and
               Tosin P. Adewumi and
               Karmanya Aggarwal and
               Pawan Sasanka Ammanamanchi and
               Aremu Anuoluwapo and
               Antoine Bosselut and
               Khyathi Raghavi Chandu and
               Miruna{-}Adriana Clinciu and
               Dipanjan Das and
               Kaustubh D. Dhole and
               Wanyu Du and
               Esin Durmus and
               Ondrej Dusek and
               Chris Emezue and
               Varun Gangal and
               Cristina Garbacea and
               Tatsunori Hashimoto and
               Yufang Hou and
               Yacine Jernite and
               Harsh Jhamtani and
               Yangfeng Ji and
               Shailza Jolly and
               Dhruv Kumar and
               Faisal Ladhak and
               Aman Madaan and
               Mounica Maddela and
               Khyati Mahajan and
               Saad Mahamood and
               Bodhisattwa Prasad Majumder and
               Pedro Henrique Martins and
               Angelina McMillan{-}Major and
               Simon Mille and
               Emiel van Miltenburg and
               Moin Nadeem and
               Shashi Narayan and
               Vitaly Nikolaev and
               Rubungo Andre Niyongabo and
               Salomey Osei and
               Ankur P. Parikh and
               Laura Perez{-}Beltrachini and
               Niranjan Ramesh Rao and
               Vikas Raunak and
               Juan Diego Rodriguez and
               Sashank Santhanam and
               Jo{\~{a} }o Sedoc and
               Thibault Sellam and
               Samira Shaikh and
               Anastasia Shimorina and
               Marco Antonio Sobrevilla Cabezudo and
               Hendrik Strobelt and
               Nishant Subramani and
               Wei Xu and
               Diyi Yang and
               Akhila Yerukola and
               Jiawei Zhou},
  title     = {The {GEM} Benchmark: Natural Language Generation, its Evaluation and
               Metrics},
  journal   = {CoRR},
  volume    = {abs/2102.01672},
  year      = {2021},
  url       = {https://arxiv.org/abs/2102.01672},
  archivePrefix = {arXiv},
  eprint    = {2102.01672}
}

Note that each GEM dataset has its own citation. Please see the source to see
the correct citation for each contained dataset."

gem/mlsum_es

Opis konfiguracji: MLSum to wielkoskalowy wielojęzyczny zbiór danych podsumowujących. Jest zbudowany z internetowych serwisów informacyjnych, ten podział koncentruje się na języku hiszpańskim.
Rozmiar pliku do pobrania : 501.27 MiB
Rozmiar zestawu danych : 1.29 GiB
Automatyczne buforowanie ( dokumentacja ): Nie
Podziały :

Rozdzielać	Przykłady
`'challenge_test_covid'`	1938
`'challenge_train_sample'`	500
`'challenge_validation_sample'`	500
`'test'`	13366
`'train'`	259 888
`'validation'`	9977

Struktura funkcji :

FeaturesDict({
    'date': string,
    'gem_id': string,
    'gem_parent_id': string,
    'references': Sequence(string),
    'target': string,
    'text': string,
    'title': string,
    'topic': string,
    'url': string,
})

Dokumentacja funkcji :

Funkcja	Klasa	Kształt	Typ D
	FunkcjeDict
data	Napinacz		strunowy
klejnot_id	Napinacz		strunowy
identyfikator_nadrzędnego klejnotu	Napinacz		strunowy
Bibliografia	Sekwencja (Tensor)	(Nic,)	strunowy
cel	Napinacz		strunowy
tekst	Napinacz		strunowy
tytuł	Napinacz		strunowy
temat	Napinacz		strunowy
adres URL	Napinacz		strunowy

Przykłady ( tfds.as_dataframe ):

Cytat :

@inproceedings{scialom-etal-2020-mlsum,
    title = "{MLSUM}: The Multilingual Summarization Corpus",
    author = {Scialom, Thomas  and Dray, Paul-Alexis  and Lamprier, Sylvain  and Piwowarski, Benjamin  and Staiano, Jacopo},
    booktitle = {Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)},
    year = {2020}
}
@article{gehrmann2021gem,
  author    = {Sebastian Gehrmann and
               Tosin P. Adewumi and
               Karmanya Aggarwal and
               Pawan Sasanka Ammanamanchi and
               Aremu Anuoluwapo and
               Antoine Bosselut and
               Khyathi Raghavi Chandu and
               Miruna{-}Adriana Clinciu and
               Dipanjan Das and
               Kaustubh D. Dhole and
               Wanyu Du and
               Esin Durmus and
               Ondrej Dusek and
               Chris Emezue and
               Varun Gangal and
               Cristina Garbacea and
               Tatsunori Hashimoto and
               Yufang Hou and
               Yacine Jernite and
               Harsh Jhamtani and
               Yangfeng Ji and
               Shailza Jolly and
               Dhruv Kumar and
               Faisal Ladhak and
               Aman Madaan and
               Mounica Maddela and
               Khyati Mahajan and
               Saad Mahamood and
               Bodhisattwa Prasad Majumder and
               Pedro Henrique Martins and
               Angelina McMillan{-}Major and
               Simon Mille and
               Emiel van Miltenburg and
               Moin Nadeem and
               Shashi Narayan and
               Vitaly Nikolaev and
               Rubungo Andre Niyongabo and
               Salomey Osei and
               Ankur P. Parikh and
               Laura Perez{-}Beltrachini and
               Niranjan Ramesh Rao and
               Vikas Raunak and
               Juan Diego Rodriguez and
               Sashank Santhanam and
               Jo{\~{a} }o Sedoc and
               Thibault Sellam and
               Samira Shaikh and
               Anastasia Shimorina and
               Marco Antonio Sobrevilla Cabezudo and
               Hendrik Strobelt and
               Nishant Subramani and
               Wei Xu and
               Diyi Yang and
               Akhila Yerukola and
               Jiawei Zhou},
  title     = {The {GEM} Benchmark: Natural Language Generation, its Evaluation and
               Metrics},
  journal   = {CoRR},
  volume    = {abs/2102.01672},
  year      = {2021},
  url       = {https://arxiv.org/abs/2102.01672},
  archivePrefix = {arXiv},
  eprint    = {2102.01672}
}

Note that each GEM dataset has its own citation. Please see the source to see
the correct citation for each contained dataset."

gem/schema_guided_dialog

Opis konfiguracji : Zestaw danych dialogu sterowanego schematem (SGD) zawiera 18 000 wielodomenowych zorientowanych na zadania dialogów między człowiekiem a wirtualnym asystentem, które obejmują 17 domen, od banków i wydarzeń po media, kalendarz, podróże i pogodę.
Rozmiar pliku do pobrania : 17.00 MiB
Rozmiar zestawu danych : 201.19 MiB
Automatyczne buforowanie ( dokumentacja ): Tak (challenge_test_backtranslation, challenge_test_bfp02, challenge_test_bfp05, challenge_test_nopunc, challenge_test_scramble, challenge_train_sample, challenge_validation_sample, test, validation), Tylko wtedy, gdy shuffle_files=False (pociąg)
Podziały :

Rozdzielać	Przykłady
`'challenge_test_backtranslation'`	500
`'challenge_test_bfp02'`	500
`'challenge_test_bfp05'`	500
`'challenge_test_nopunc'`	500
`'challenge_test_scramble'`	500
`'challenge_train_sample'`	500
`'challenge_validation_sample'`	500
`'test'`	10 000
`'train'`	164 982
`'validation'`	10 000

Struktura funkcji :

FeaturesDict({
    'context': Sequence(string),
    'dialog_acts': Sequence({
        'act': ClassLabel(shape=(), dtype=int64, num_classes=18),
        'slot': string,
        'values': Sequence(string),
    }),
    'dialog_id': string,
    'gem_id': string,
    'gem_parent_id': string,
    'prompt': string,
    'references': Sequence(string),
    'service': string,
    'target': string,
    'turn_id': int32,
})

Dokumentacja funkcji :

Funkcja	Klasa	Kształt	Typ D
	FunkcjeDict
kontekst	Sekwencja (Tensor)	(Nic,)	strunowy
dialog_akty	Sekwencja
dialog_akty/akt	Etykieta klasy		int64
dialog_akty/szczelina	Napinacz		strunowy
dialog_działania/wartości	Sekwencja (Tensor)	(Nic,)	strunowy
identyfikator_dialogu	Napinacz		strunowy
klejnot_id	Napinacz		strunowy
identyfikator_nadrzędnego klejnotu	Napinacz		strunowy
podpowiedź	Napinacz		strunowy
Bibliografia	Sekwencja (Tensor)	(Nic,)	strunowy
usługa	Napinacz		strunowy
cel	Napinacz		strunowy
identyfikator_zwrotu	Napinacz		int32

Przykłady ( tfds.as_dataframe ):

Cytat :

@article{rastogi2019towards,
  title={Towards Scalable Multi-domain Conversational Agents: The Schema-Guided Dialogue Dataset},
  author={Rastogi, Abhinav and Zang, Xiaoxue and Sunkara, Srinivas and Gupta, Raghav and Khaitan, Pranav},
  journal={arXiv preprint arXiv:1909.05855},
  year={2019}
}
@article{gehrmann2021gem,
  author    = {Sebastian Gehrmann and
               Tosin P. Adewumi and
               Karmanya Aggarwal and
               Pawan Sasanka Ammanamanchi and
               Aremu Anuoluwapo and
               Antoine Bosselut and
               Khyathi Raghavi Chandu and
               Miruna{-}Adriana Clinciu and
               Dipanjan Das and
               Kaustubh D. Dhole and
               Wanyu Du and
               Esin Durmus and
               Ondrej Dusek and
               Chris Emezue and
               Varun Gangal and
               Cristina Garbacea and
               Tatsunori Hashimoto and
               Yufang Hou and
               Yacine Jernite and
               Harsh Jhamtani and
               Yangfeng Ji and
               Shailza Jolly and
               Dhruv Kumar and
               Faisal Ladhak and
               Aman Madaan and
               Mounica Maddela and
               Khyati Mahajan and
               Saad Mahamood and
               Bodhisattwa Prasad Majumder and
               Pedro Henrique Martins and
               Angelina McMillan{-}Major and
               Simon Mille and
               Emiel van Miltenburg and
               Moin Nadeem and
               Shashi Narayan and
               Vitaly Nikolaev and
               Rubungo Andre Niyongabo and
               Salomey Osei and
               Ankur P. Parikh and
               Laura Perez{-}Beltrachini and
               Niranjan Ramesh Rao and
               Vikas Raunak and
               Juan Diego Rodriguez and
               Sashank Santhanam and
               Jo{\~{a} }o Sedoc and
               Thibault Sellam and
               Samira Shaikh and
               Anastasia Shimorina and
               Marco Antonio Sobrevilla Cabezudo and
               Hendrik Strobelt and
               Nishant Subramani and
               Wei Xu and
               Diyi Yang and
               Akhila Yerukola and
               Jiawei Zhou},
  title     = {The {GEM} Benchmark: Natural Language Generation, its Evaluation and
               Metrics},
  journal   = {CoRR},
  volume    = {abs/2102.01672},
  year      = {2021},
  url       = {https://arxiv.org/abs/2102.01672},
  archivePrefix = {arXiv},
  eprint    = {2102.01672}
}

Note that each GEM dataset has its own citation. Please see the source to see
the correct citation for each contained dataset."

klejnot/totto

Opis konfiguracji: ToTTo to zadanie NLG typu Table-to-Text. Zadanie jest następujące: Biorąc pod uwagę tabelę Wikipedii z nazwami wierszy, nazwami kolumn i komórkami tabeli, z podświetlonym podzbiorem komórek, wygeneruj opis w języku naturalnym dla podświetlonej części tabeli.
Rozmiar pliku do pobrania : 180.75 MiB
Rozmiar zestawu danych : 645.86 MiB
Automatyczne buforowanie ( dokumentacja ): Nie
Podziały :

Rozdzielać	Przykłady
`'challenge_test_scramble'`	500
`'challenge_train_sample'`	500
`'challenge_validation_sample'`	500
`'test'`	7700
`'train'`	121153
`'validation'`	7700

Struktura funkcji :

FeaturesDict({
    'example_id': string,
    'gem_id': string,
    'gem_parent_id': string,
    'highlighted_cells': Sequence(Sequence(int32)),
    'overlap_subset': string,
    'references': Sequence(string),
    'sentence_annotations': Sequence({
        'final_sentence': string,
        'original_sentence': string,
        'sentence_after_ambiguity': string,
        'sentence_after_deletion': string,
    }),
    'table': Sequence(Sequence({
        'column_span': int32,
        'is_header': bool,
        'row_span': int32,
        'value': string,
    })),
    'table_page_title': string,
    'table_section_text': string,
    'table_section_title': string,
    'table_webpage_url': string,
    'target': string,
    'totto_id': int32,
})

Dokumentacja funkcji :

Funkcja	Klasa	Kształt	Typ D
	FunkcjeDict
identyfikator_przykładu	Napinacz		strunowy
klejnot_id	Napinacz		strunowy
identyfikator_nadrzędnego klejnotu	Napinacz		strunowy
podświetlone_komórki	Sekwencja(Sekwencja(Tensor))	(Brak, brak)	int32
nakładający się_podzbiór	Napinacz		strunowy
Bibliografia	Sekwencja (Tensor)	(Nic,)	strunowy
zdanie_adnotacje	Sekwencja
zdanie_adnotacje/zdanie_końcowe	Napinacz		strunowy
zdanie_adnotacje/oryginalne_zdanie	Napinacz		strunowy
zdanie_adnotacje/zdanie_po_niejednoznaczności	Napinacz		strunowy
zdanie_adnotacje/zdanie_po_usunięciu	Napinacz		strunowy
stół	Sekwencja
tabela/rozpiętość_kolumn	Napinacz		int32
tabela/jest_nagłówkiem	Napinacz		bool
tabela/rozpiętość_wierszy	Napinacz		int32
tabela/wartość	Napinacz		strunowy
table_page_title	Napinacz		strunowy
tekst_sekcji_tabeli	Napinacz		strunowy
tytuł_sekcji_tabeli	Napinacz		strunowy
table_webpage_url	Napinacz		strunowy
cel	Napinacz		strunowy
totto_id	Napinacz		int32

Przykłady ( tfds.as_dataframe ):

Cytat :

@inproceedings{parikh2020totto,
  title=ToTTo: A Controlled Table-To-Text Generation Dataset,
  author={Parikh, Ankur and Wang, Xuezhi and Gehrmann, Sebastian and Faruqui, Manaal and Dhingra, Bhuwan and Yang, Diyi and Das, Dipanjan},
  booktitle={Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)},
  pages={1173--1186},
  year={2020}
}
@article{gehrmann2021gem,
  author    = {Sebastian Gehrmann and
               Tosin P. Adewumi and
               Karmanya Aggarwal and
               Pawan Sasanka Ammanamanchi and
               Aremu Anuoluwapo and
               Antoine Bosselut and
               Khyathi Raghavi Chandu and
               Miruna{-}Adriana Clinciu and
               Dipanjan Das and
               Kaustubh D. Dhole and
               Wanyu Du and
               Esin Durmus and
               Ondrej Dusek and
               Chris Emezue and
               Varun Gangal and
               Cristina Garbacea and
               Tatsunori Hashimoto and
               Yufang Hou and
               Yacine Jernite and
               Harsh Jhamtani and
               Yangfeng Ji and
               Shailza Jolly and
               Dhruv Kumar and
               Faisal Ladhak and
               Aman Madaan and
               Mounica Maddela and
               Khyati Mahajan and
               Saad Mahamood and
               Bodhisattwa Prasad Majumder and
               Pedro Henrique Martins and
               Angelina McMillan{-}Major and
               Simon Mille and
               Emiel van Miltenburg and
               Moin Nadeem and
               Shashi Narayan and
               Vitaly Nikolaev and
               Rubungo Andre Niyongabo and
               Salomey Osei and
               Ankur P. Parikh and
               Laura Perez{-}Beltrachini and
               Niranjan Ramesh Rao and
               Vikas Raunak and
               Juan Diego Rodriguez and
               Sashank Santhanam and
               Jo{\~{a} }o Sedoc and
               Thibault Sellam and
               Samira Shaikh and
               Anastasia Shimorina and
               Marco Antonio Sobrevilla Cabezudo and
               Hendrik Strobelt and
               Nishant Subramani and
               Wei Xu and
               Diyi Yang and
               Akhila Yerukola and
               Jiawei Zhou},
  title     = {The {GEM} Benchmark: Natural Language Generation, its Evaluation and
               Metrics},
  journal   = {CoRR},
  volume    = {abs/2102.01672},
  year      = {2021},
  url       = {https://arxiv.org/abs/2102.01672},
  archivePrefix = {arXiv},
  eprint    = {2102.01672}
}

Note that each GEM dataset has its own citation. Please see the source to see
the correct citation for each contained dataset."

gem/web_nlg_en

Opis konfiguracji: WebNLG to dwujęzyczny zestaw danych (angielski, rosyjski) składający się z równoległych potrójnych zestawów DBpedia i krótkich tekstów, które obejmują około 450 różnych właściwości DBpedia. Dane WebNLG zostały pierwotnie stworzone w celu promowania rozwoju werbalizatorów RDF zdolnych do generowania krótkich tekstów i obsługi mikroplanowania.
Rozmiar pliku do pobrania : 12.57 MiB
Rozmiar zestawu danych : 19.91 MiB
Automatyczne buforowanie ( dokumentacja ): Tak
Podziały :

Rozdzielać	Przykłady
`'challenge_test_numbers'`	500
`'challenge_test_scramble'`	500
`'challenge_train_sample'`	502
`'challenge_validation_sample'`	499
`'test'`	1779
`'train'`	35426
`'validation'`	1667

Struktura funkcji :

FeaturesDict({
    'category': string,
    'gem_id': string,
    'gem_parent_id': string,
    'input': Sequence(string),
    'references': Sequence(string),
    'target': string,
    'webnlg_id': string,
})

Dokumentacja funkcji :

Funkcja	Klasa	Kształt	Typ D
	FunkcjeDict
Kategoria	Napinacz		strunowy
klejnot_id	Napinacz		strunowy
identyfikator_nadrzędnego klejnotu	Napinacz		strunowy
Wejście	Sekwencja (Tensor)	(Nic,)	strunowy
Bibliografia	Sekwencja (Tensor)	(Nic,)	strunowy
cel	Napinacz		strunowy
webnlg_id	Napinacz		strunowy

Przykłady ( tfds.as_dataframe ):

Cytat :

@inproceedings{gardent2017creating,
  author = "Gardent, Claire
    and Shimorina, Anastasia
    and Narayan, Shashi
    and Perez-Beltrachini, Laura",
  title = "Creating Training Corpora for NLG Micro-Planners",
  booktitle = "Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)",
  year = "2017",
  publisher = "Association for Computational Linguistics",
  pages = "179--188",
  location = "Vancouver, Canada",
  doi = "10.18653/v1/P17-1017",
  url = "http://www.aclweb.org/anthology/P17-1017"
}
@article{gehrmann2021gem,
  author    = {Sebastian Gehrmann and
               Tosin P. Adewumi and
               Karmanya Aggarwal and
               Pawan Sasanka Ammanamanchi and
               Aremu Anuoluwapo and
               Antoine Bosselut and
               Khyathi Raghavi Chandu and
               Miruna{-}Adriana Clinciu and
               Dipanjan Das and
               Kaustubh D. Dhole and
               Wanyu Du and
               Esin Durmus and
               Ondrej Dusek and
               Chris Emezue and
               Varun Gangal and
               Cristina Garbacea and
               Tatsunori Hashimoto and
               Yufang Hou and
               Yacine Jernite and
               Harsh Jhamtani and
               Yangfeng Ji and
               Shailza Jolly and
               Dhruv Kumar and
               Faisal Ladhak and
               Aman Madaan and
               Mounica Maddela and
               Khyati Mahajan and
               Saad Mahamood and
               Bodhisattwa Prasad Majumder and
               Pedro Henrique Martins and
               Angelina McMillan{-}Major and
               Simon Mille and
               Emiel van Miltenburg and
               Moin Nadeem and
               Shashi Narayan and
               Vitaly Nikolaev and
               Rubungo Andre Niyongabo and
               Salomey Osei and
               Ankur P. Parikh and
               Laura Perez{-}Beltrachini and
               Niranjan Ramesh Rao and
               Vikas Raunak and
               Juan Diego Rodriguez and
               Sashank Santhanam and
               Jo{\~{a} }o Sedoc and
               Thibault Sellam and
               Samira Shaikh and
               Anastasia Shimorina and
               Marco Antonio Sobrevilla Cabezudo and
               Hendrik Strobelt and
               Nishant Subramani and
               Wei Xu and
               Diyi Yang and
               Akhila Yerukola and
               Jiawei Zhou},
  title     = {The {GEM} Benchmark: Natural Language Generation, its Evaluation and
               Metrics},
  journal   = {CoRR},
  volume    = {abs/2102.01672},
  year      = {2021},
  url       = {https://arxiv.org/abs/2102.01672},
  archivePrefix = {arXiv},
  eprint    = {2102.01672}
}

Note that each GEM dataset has its own citation. Please see the source to see
the correct citation for each contained dataset."

gem/web_nlg_ru

Opis konfiguracji: WebNLG to dwujęzyczny zestaw danych (angielski, rosyjski) składający się z równoległych potrójnych zestawów DBpedia i krótkich tekstów, które obejmują około 450 różnych właściwości DBpedia. Dane WebNLG zostały pierwotnie stworzone w celu promowania rozwoju werbalizatorów RDF zdolnych do generowania krótkich tekstów i obsługi mikroplanowania.
Rozmiar pliku do pobrania : 7.49 MiB
Rozmiar zestawu danych : 11.30 MiB
Automatyczne buforowanie ( dokumentacja ): Tak
Podziały :

Rozdzielać	Przykłady
`'challenge_test_scramble'`	500
`'challenge_train_sample'`	501
`'challenge_validation_sample'`	500
`'test'`	1102
`'train'`	14630
`'validation'`	790

Struktura funkcji :

FeaturesDict({
    'category': string,
    'gem_id': string,
    'gem_parent_id': string,
    'input': Sequence(string),
    'references': Sequence(string),
    'target': string,
    'webnlg_id': string,
})

Dokumentacja funkcji :

Funkcja	Klasa	Kształt	Typ D
	FunkcjeDict
Kategoria	Napinacz		strunowy
klejnot_id	Napinacz		strunowy
identyfikator_nadrzędnego klejnotu	Napinacz		strunowy
Wejście	Sekwencja (Tensor)	(Nic,)	strunowy
Bibliografia	Sekwencja (Tensor)	(Nic,)	strunowy
cel	Napinacz		strunowy
webnlg_id	Napinacz		strunowy

Przykłady ( tfds.as_dataframe ):

Cytat :

@inproceedings{gardent2017creating,
  author = "Gardent, Claire
    and Shimorina, Anastasia
    and Narayan, Shashi
    and Perez-Beltrachini, Laura",
  title = "Creating Training Corpora for NLG Micro-Planners",
  booktitle = "Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)",
  year = "2017",
  publisher = "Association for Computational Linguistics",
  pages = "179--188",
  location = "Vancouver, Canada",
  doi = "10.18653/v1/P17-1017",
  url = "http://www.aclweb.org/anthology/P17-1017"
}
@article{gehrmann2021gem,
  author    = {Sebastian Gehrmann and
               Tosin P. Adewumi and
               Karmanya Aggarwal and
               Pawan Sasanka Ammanamanchi and
               Aremu Anuoluwapo and
               Antoine Bosselut and
               Khyathi Raghavi Chandu and
               Miruna{-}Adriana Clinciu and
               Dipanjan Das and
               Kaustubh D. Dhole and
               Wanyu Du and
               Esin Durmus and
               Ondrej Dusek and
               Chris Emezue and
               Varun Gangal and
               Cristina Garbacea and
               Tatsunori Hashimoto and
               Yufang Hou and
               Yacine Jernite and
               Harsh Jhamtani and
               Yangfeng Ji and
               Shailza Jolly and
               Dhruv Kumar and
               Faisal Ladhak and
               Aman Madaan and
               Mounica Maddela and
               Khyati Mahajan and
               Saad Mahamood and
               Bodhisattwa Prasad Majumder and
               Pedro Henrique Martins and
               Angelina McMillan{-}Major and
               Simon Mille and
               Emiel van Miltenburg and
               Moin Nadeem and
               Shashi Narayan and
               Vitaly Nikolaev and
               Rubungo Andre Niyongabo and
               Salomey Osei and
               Ankur P. Parikh and
               Laura Perez{-}Beltrachini and
               Niranjan Ramesh Rao and
               Vikas Raunak and
               Juan Diego Rodriguez and
               Sashank Santhanam and
               Jo{\~{a} }o Sedoc and
               Thibault Sellam and
               Samira Shaikh and
               Anastasia Shimorina and
               Marco Antonio Sobrevilla Cabezudo and
               Hendrik Strobelt and
               Nishant Subramani and
               Wei Xu and
               Diyi Yang and
               Akhila Yerukola and
               Jiawei Zhou},
  title     = {The {GEM} Benchmark: Natural Language Generation, its Evaluation and
               Metrics},
  journal   = {CoRR},
  volume    = {abs/2102.01672},
  year      = {2021},
  url       = {https://arxiv.org/abs/2102.01672},
  archivePrefix = {arXiv},
  eprint    = {2102.01672}
}

Note that each GEM dataset has its own citation. Please see the source to see
the correct citation for each contained dataset."

gem/wiki_auto_asset_turk

Opis konfiguracji: WikiAuto zapewnia zestaw wyrównanych zdań z angielskiej Wikipedii i prostej angielskiej Wikipedii jako źródło do szkolenia systemów upraszczania zdań. ASSET i TURK to wysokiej jakości zestawy danych upraszczających używane do testowania.
Rozmiar pliku do pobrania : 121.01 MiB
Rozmiar zestawu danych : 202.40 MiB
Auto-cached ( documentation ): Yes (challenge_test_asset_backtranslation, challenge_test_asset_bfp02, challenge_test_asset_bfp05, challenge_test_asset_nopunc, challenge_test_turk_backtranslation, challenge_test_turk_bfp02, challenge_test_turk_bfp05, challenge_test_turk_nopunc, challenge_train_sample, challenge_validation_sample, test_asset, test_turk, validation), Only when shuffle_files=False (train)
Podziały :

Rozdzielać	Przykłady
`'challenge_test_asset_backtranslation'`	359
`'challenge_test_asset_bfp02'`	359
`'challenge_test_asset_bfp05'`	359
`'challenge_test_asset_nopunc'`	359
`'challenge_test_turk_backtranslation'`	359
`'challenge_test_turk_bfp02'`	359
`'challenge_test_turk_bfp05'`	359
`'challenge_test_turk_nopunc'`	359
`'challenge_train_sample'`	500
`'challenge_validation_sample'`	500
`'test_asset'`	359
`'test_turk'`	359
`'train'`	483801
`'validation'`	20 000

Struktura funkcji :

FeaturesDict({
    'gem_id': string,
    'gem_parent_id': string,
    'references': Sequence(string),
    'source': string,
    'target': string,
})

Dokumentacja funkcji :

Funkcja	Klasa	Kształt	Typ D
	FunkcjeDict
klejnot_id	Napinacz		strunowy
identyfikator_nadrzędnego klejnotu	Napinacz		strunowy
Bibliografia	Sekwencja (Tensor)	(Nic,)	strunowy
źródło	Napinacz		strunowy
cel	Napinacz		strunowy

Przykłady ( tfds.as_dataframe ):

Cytat :

@inproceedings{jiang-etal-2020-neural,
    title = "Neural {CRF} Model for Sentence Alignment in Text Simplification",
    author = "Jiang, Chao  and
      Maddela, Mounica  and
      Lan, Wuwei  and
      Zhong, Yang  and
      Xu, Wei",
    booktitle = "Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics",
    month = jul,
    year = "2020",
    address = "Online",
    publisher = "Association for Computational Linguistics",
    url = "https://www.aclweb.org/anthology/2020.acl-main.709",
    doi = "10.18653/v1/2020.acl-main.709",
    pages = "7943--7960",
}
@article{gehrmann2021gem,
  author    = {Sebastian Gehrmann and
               Tosin P. Adewumi and
               Karmanya Aggarwal and
               Pawan Sasanka Ammanamanchi and
               Aremu Anuoluwapo and
               Antoine Bosselut and
               Khyathi Raghavi Chandu and
               Miruna{-}Adriana Clinciu and
               Dipanjan Das and
               Kaustubh D. Dhole and
               Wanyu Du and
               Esin Durmus and
               Ondrej Dusek and
               Chris Emezue and
               Varun Gangal and
               Cristina Garbacea and
               Tatsunori Hashimoto and
               Yufang Hou and
               Yacine Jernite and
               Harsh Jhamtani and
               Yangfeng Ji and
               Shailza Jolly and
               Dhruv Kumar and
               Faisal Ladhak and
               Aman Madaan and
               Mounica Maddela and
               Khyati Mahajan and
               Saad Mahamood and
               Bodhisattwa Prasad Majumder and
               Pedro Henrique Martins and
               Angelina McMillan{-}Major and
               Simon Mille and
               Emiel van Miltenburg and
               Moin Nadeem and
               Shashi Narayan and
               Vitaly Nikolaev and
               Rubungo Andre Niyongabo and
               Salomey Osei and
               Ankur P. Parikh and
               Laura Perez{-}Beltrachini and
               Niranjan Ramesh Rao and
               Vikas Raunak and
               Juan Diego Rodriguez and
               Sashank Santhanam and
               Jo{\~{a} }o Sedoc and
               Thibault Sellam and
               Samira Shaikh and
               Anastasia Shimorina and
               Marco Antonio Sobrevilla Cabezudo and
               Hendrik Strobelt and
               Nishant Subramani and
               Wei Xu and
               Diyi Yang and
               Akhila Yerukola and
               Jiawei Zhou},
  title     = {The {GEM} Benchmark: Natural Language Generation, its Evaluation and
               Metrics},
  journal   = {CoRR},
  volume    = {abs/2102.01672},
  year      = {2021},
  url       = {https://arxiv.org/abs/2102.01672},
  archivePrefix = {arXiv},
  eprint    = {2102.01672}
}

Note that each GEM dataset has its own citation. Please see the source to see
the correct citation for each contained dataset."

klejnot/xsum

Opis konfiguracji : Zestaw danych służy do abstrakcyjnego podsumowania w jego ekstremalnej formie, polegającego na podsumowaniu dokumentu w jednym zdaniu.
Rozmiar pliku do pobrania : 246.31 MiB
Rozmiar zestawu danych : 78.89 MiB
Automatyczne buforowanie ( dokumentacja ): Tak
Podziały :

Rozdzielać	Przykłady
`'challenge_test_backtranslation'`	500
`'challenge_test_bfp_02'`	500
`'challenge_test_bfp_05'`	500
`'challenge_test_covid'`	401
`'challenge_test_nopunc'`	500
`'challenge_train_sample'`	500
`'challenge_validation_sample'`	500
`'test'`	1166
`'train'`	23206
`'validation'`	1117

Struktura funkcji :

FeaturesDict({
    'document': string,
    'gem_id': string,
    'gem_parent_id': string,
    'references': Sequence(string),
    'target': string,
    'xsum_id': string,
})

Dokumentacja funkcji :

Funkcja	Klasa	Kształt	Typ D
	FunkcjeDict
dokument	Napinacz		strunowy
klejnot_id	Napinacz		strunowy
identyfikator_nadrzędnego klejnotu	Napinacz		strunowy
Bibliografia	Sekwencja (Tensor)	(Nic,)	strunowy
cel	Napinacz		strunowy
xsum_id	Napinacz		strunowy

Przykłady ( tfds.as_dataframe ):

Cytat :

@inproceedings{Narayan2018dont,
  author = "Shashi Narayan and Shay B. Cohen and Mirella Lapata",
  title = "Don't Give Me the Details, Just the Summary! {T}opic-Aware Convolutional Neural Networks for Extreme Summarization",
  booktitle = "Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing ",
  year = "2018",
  address = "Brussels, Belgium",
}
@article{gehrmann2021gem,
  author    = {Sebastian Gehrmann and
               Tosin P. Adewumi and
               Karmanya Aggarwal and
               Pawan Sasanka Ammanamanchi and
               Aremu Anuoluwapo and
               Antoine Bosselut and
               Khyathi Raghavi Chandu and
               Miruna{-}Adriana Clinciu and
               Dipanjan Das and
               Kaustubh D. Dhole and
               Wanyu Du and
               Esin Durmus and
               Ondrej Dusek and
               Chris Emezue and
               Varun Gangal and
               Cristina Garbacea and
               Tatsunori Hashimoto and
               Yufang Hou and
               Yacine Jernite and
               Harsh Jhamtani and
               Yangfeng Ji and
               Shailza Jolly and
               Dhruv Kumar and
               Faisal Ladhak and
               Aman Madaan and
               Mounica Maddela and
               Khyati Mahajan and
               Saad Mahamood and
               Bodhisattwa Prasad Majumder and
               Pedro Henrique Martins and
               Angelina McMillan{-}Major and
               Simon Mille and
               Emiel van Miltenburg and
               Moin Nadeem and
               Shashi Narayan and
               Vitaly Nikolaev and
               Rubungo Andre Niyongabo and
               Salomey Osei and
               Ankur P. Parikh and
               Laura Perez{-}Beltrachini and
               Niranjan Ramesh Rao and
               Vikas Raunak and
               Juan Diego Rodriguez and
               Sashank Santhanam and
               Jo{\~{a} }o Sedoc and
               Thibault Sellam and
               Samira Shaikh and
               Anastasia Shimorina and
               Marco Antonio Sobrevilla Cabezudo and
               Hendrik Strobelt and
               Nishant Subramani and
               Wei Xu and
               Diyi Yang and
               Akhila Yerukola and
               Jiawei Zhou},
  title     = {The {GEM} Benchmark: Natural Language Generation, its Evaluation and
               Metrics},
  journal   = {CoRR},
  volume    = {abs/2102.01672},
  year      = {2021},
  url       = {https://arxiv.org/abs/2102.01672},
  archivePrefix = {arXiv},
  eprint    = {2102.01672}
}

Note that each GEM dataset has its own citation. Please see the source to see
the correct citation for each contained dataset."

klejnot/wiki_lingua_arabski_ar

Opis konfiguracji: Wikilingua to wielkoskalowy, wielojęzyczny zbiór danych do oceny wielojęzycznych abstrakcyjnych systemów podsumowujących.
Rozmiar pliku do pobrania : 56.25 MiB
Rozmiar zestawu danych : 291.42 MiB
Automatyczne buforowanie ( dokumentacja ): Nie
Podziały :

Rozdzielać	Przykłady
`'test'`	5841
`'train'`	20441
`'validation'`	2919

Struktura funkcji :

FeaturesDict({
    'gem_id': string,
    'gem_parent_id': string,
    'references': Sequence(string),
    'source': string,
    'source_aligned': Translation({
        'ar': Text(shape=(), dtype=string),
        'en': Text(shape=(), dtype=string),
    }),
    'target': string,
    'target_aligned': Translation({
        'ar': Text(shape=(), dtype=string),
        'en': Text(shape=(), dtype=string),
    }),
})

Dokumentacja funkcji :

Funkcja	Klasa	Kształt	Typ D
	FunkcjeDict
klejnot_id	Napinacz		strunowy
identyfikator_nadrzędnego klejnotu	Napinacz		strunowy
Bibliografia	Sekwencja (Tensor)	(Nic,)	strunowy
źródło	Napinacz		strunowy
źródło_wyrównane	Tłumaczenie
źródło_wyrównane/ar	Tekst		strunowy
source_aligned/en	Tekst		strunowy
cel	Napinacz		strunowy
cel_wyrównany	Tłumaczenie
wyrównany_docelowo/ar	Tekst		strunowy
target_aligned/en	Tekst		strunowy

Przykłady ( tfds.as_dataframe ):

Cytat :

@inproceedings{ladhak-wiki-2020,
title=WikiLingua: A New Benchmark Dataset for Multilingual Abstractive Summarization,
author={Faisal Ladhak, Esin Durmus, Claire Cardie and Kathleen McKeown},
booktitle={Findings of EMNLP, 2020},
year={2020}
}
@article{gehrmann2021gem,
  author    = {Sebastian Gehrmann and
               Tosin P. Adewumi and
               Karmanya Aggarwal and
               Pawan Sasanka Ammanamanchi and
               Aremu Anuoluwapo and
               Antoine Bosselut and
               Khyathi Raghavi Chandu and
               Miruna{-}Adriana Clinciu and
               Dipanjan Das and
               Kaustubh D. Dhole and
               Wanyu Du and
               Esin Durmus and
               Ondrej Dusek and
               Chris Emezue and
               Varun Gangal and
               Cristina Garbacea and
               Tatsunori Hashimoto and
               Yufang Hou and
               Yacine Jernite and
               Harsh Jhamtani and
               Yangfeng Ji and
               Shailza Jolly and
               Dhruv Kumar and
               Faisal Ladhak and
               Aman Madaan and
               Mounica Maddela and
               Khyati Mahajan and
               Saad Mahamood and
               Bodhisattwa Prasad Majumder and
               Pedro Henrique Martins and
               Angelina McMillan{-}Major and
               Simon Mille and
               Emiel van Miltenburg and
               Moin Nadeem and
               Shashi Narayan and
               Vitaly Nikolaev and
               Rubungo Andre Niyongabo and
               Salomey Osei and
               Ankur P. Parikh and
               Laura Perez{-}Beltrachini and
               Niranjan Ramesh Rao and
               Vikas Raunak and
               Juan Diego Rodriguez and
               Sashank Santhanam and
               Jo{\~{a} }o Sedoc and
               Thibault Sellam and
               Samira Shaikh and
               Anastasia Shimorina and
               Marco Antonio Sobrevilla Cabezudo and
               Hendrik Strobelt and
               Nishant Subramani and
               Wei Xu and
               Diyi Yang and
               Akhila Yerukola and
               Jiawei Zhou},
  title     = {The {GEM} Benchmark: Natural Language Generation, its Evaluation and
               Metrics},
  journal   = {CoRR},
  volume    = {abs/2102.01672},
  year      = {2021},
  url       = {https://arxiv.org/abs/2102.01672},
  archivePrefix = {arXiv},
  eprint    = {2102.01672}
}

Note that each GEM dataset has its own citation. Please see the source to see
the correct citation for each contained dataset."

gem/wiki_lingua_chinese_zh

Opis konfiguracji: Wikilingua to wielkoskalowy, wielojęzyczny zbiór danych do oceny wielojęzycznych abstrakcyjnych systemów podsumowujących.
Rozmiar pliku do pobrania : 31.38 MiB
Rozmiar zestawu danych : 122.06 MiB
Automatyczne buforowanie ( dokumentacja ): Tak
Podziały :

Rozdzielać	Przykłady
`'test'`	3775
`'train'`	13211
`'validation'`	1886

Struktura funkcji :

FeaturesDict({
    'gem_id': string,
    'gem_parent_id': string,
    'references': Sequence(string),
    'source': string,
    'source_aligned': Translation({
        'en': Text(shape=(), dtype=string),
        'zh': Text(shape=(), dtype=string),
    }),
    'target': string,
    'target_aligned': Translation({
        'en': Text(shape=(), dtype=string),
        'zh': Text(shape=(), dtype=string),
    }),
})

Dokumentacja funkcji :

Funkcja	Klasa	Kształt	Typ D
	FunkcjeDict
klejnot_id	Napinacz		strunowy
identyfikator_nadrzędnego klejnotu	Napinacz		strunowy
Bibliografia	Sekwencja (Tensor)	(Nic,)	strunowy
źródło	Napinacz		strunowy
źródło_wyrównane	Tłumaczenie
source_aligned/en	Tekst		strunowy
źródło_wyrównane/zh	Tekst		strunowy
cel	Napinacz		strunowy
cel_wyrównany	Tłumaczenie
target_aligned/en	Tekst		strunowy
wyrównany_docelowo/zh	Tekst		strunowy

Przykłady ( tfds.as_dataframe ):

Cytat :

@inproceedings{ladhak-wiki-2020,
title=WikiLingua: A New Benchmark Dataset for Multilingual Abstractive Summarization,
author={Faisal Ladhak, Esin Durmus, Claire Cardie and Kathleen McKeown},
booktitle={Findings of EMNLP, 2020},
year={2020}
}
@article{gehrmann2021gem,
  author    = {Sebastian Gehrmann and
               Tosin P. Adewumi and
               Karmanya Aggarwal and
               Pawan Sasanka Ammanamanchi and
               Aremu Anuoluwapo and
               Antoine Bosselut and
               Khyathi Raghavi Chandu and
               Miruna{-}Adriana Clinciu and
               Dipanjan Das and
               Kaustubh D. Dhole and
               Wanyu Du and
               Esin Durmus and
               Ondrej Dusek and
               Chris Emezue and
               Varun Gangal and
               Cristina Garbacea and
               Tatsunori Hashimoto and
               Yufang Hou and
               Yacine Jernite and
               Harsh Jhamtani and
               Yangfeng Ji and
               Shailza Jolly and
               Dhruv Kumar and
               Faisal Ladhak and
               Aman Madaan and
               Mounica Maddela and
               Khyati Mahajan and
               Saad Mahamood and
               Bodhisattwa Prasad Majumder and
               Pedro Henrique Martins and
               Angelina McMillan{-}Major and
               Simon Mille and
               Emiel van Miltenburg and
               Moin Nadeem and
               Shashi Narayan and
               Vitaly Nikolaev and
               Rubungo Andre Niyongabo and
               Salomey Osei and
               Ankur P. Parikh and
               Laura Perez{-}Beltrachini and
               Niranjan Ramesh Rao and
               Vikas Raunak and
               Juan Diego Rodriguez and
               Sashank Santhanam and
               Jo{\~{a} }o Sedoc and
               Thibault Sellam and
               Samira Shaikh and
               Anastasia Shimorina and
               Marco Antonio Sobrevilla Cabezudo and
               Hendrik Strobelt and
               Nishant Subramani and
               Wei Xu and
               Diyi Yang and
               Akhila Yerukola and
               Jiawei Zhou},
  title     = {The {GEM} Benchmark: Natural Language Generation, its Evaluation and
               Metrics},
  journal   = {CoRR},
  volume    = {abs/2102.01672},
  year      = {2021},
  url       = {https://arxiv.org/abs/2102.01672},
  archivePrefix = {arXiv},
  eprint    = {2102.01672}
}

Note that each GEM dataset has its own citation. Please see the source to see
the correct citation for each contained dataset."

gem/wiki_lingua_czech_cs

Opis konfiguracji: Wikilingua to wielkoskalowy, wielojęzyczny zbiór danych do oceny wielojęzycznych abstrakcyjnych systemów podsumowujących.
Rozmiar pliku do pobrania : 13.84 MiB
Rozmiar zestawu danych : 58.05 MiB
Automatyczne buforowanie ( dokumentacja ): Tak
Podziały :

Rozdzielać	Przykłady
`'test'`	1438
`'train'`	5033
`'validation'`	718

Struktura funkcji :

FeaturesDict({
    'gem_id': string,
    'gem_parent_id': string,
    'references': Sequence(string),
    'source': string,
    'source_aligned': Translation({
        'cs': Text(shape=(), dtype=string),
        'en': Text(shape=(), dtype=string),
    }),
    'target': string,
    'target_aligned': Translation({
        'cs': Text(shape=(), dtype=string),
        'en': Text(shape=(), dtype=string),
    }),
})

Dokumentacja funkcji :

Funkcja	Klasa	Kształt	Typ D
	FunkcjeDict
klejnot_id	Napinacz		strunowy
identyfikator_nadrzędnego klejnotu	Napinacz		strunowy
Bibliografia	Sekwencja (Tensor)	(Nic,)	strunowy
źródło	Napinacz		strunowy
źródło_wyrównane	Tłumaczenie
źródło_wyrównane/cs	Tekst		strunowy
source_aligned/en	Tekst		strunowy
cel	Napinacz		strunowy
cel_wyrównany	Tłumaczenie
target_aligned/cs	Tekst		strunowy
target_aligned/en	Tekst		strunowy

Przykłady ( tfds.as_dataframe ):

Cytat :

@inproceedings{ladhak-wiki-2020,
title=WikiLingua: A New Benchmark Dataset for Multilingual Abstractive Summarization,
author={Faisal Ladhak, Esin Durmus, Claire Cardie and Kathleen McKeown},
booktitle={Findings of EMNLP, 2020},
year={2020}
}
@article{gehrmann2021gem,
  author    = {Sebastian Gehrmann and
               Tosin P. Adewumi and
               Karmanya Aggarwal and
               Pawan Sasanka Ammanamanchi and
               Aremu Anuoluwapo and
               Antoine Bosselut and
               Khyathi Raghavi Chandu and
               Miruna{-}Adriana Clinciu and
               Dipanjan Das and
               Kaustubh D. Dhole and
               Wanyu Du and
               Esin Durmus and
               Ondrej Dusek and
               Chris Emezue and
               Varun Gangal and
               Cristina Garbacea and
               Tatsunori Hashimoto and
               Yufang Hou and
               Yacine Jernite and
               Harsh Jhamtani and
               Yangfeng Ji and
               Shailza Jolly and
               Dhruv Kumar and
               Faisal Ladhak and
               Aman Madaan and
               Mounica Maddela and
               Khyati Mahajan and
               Saad Mahamood and
               Bodhisattwa Prasad Majumder and
               Pedro Henrique Martins and
               Angelina McMillan{-}Major and
               Simon Mille and
               Emiel van Miltenburg and
               Moin Nadeem and
               Shashi Narayan and
               Vitaly Nikolaev and
               Rubungo Andre Niyongabo and
               Salomey Osei and
               Ankur P. Parikh and
               Laura Perez{-}Beltrachini and
               Niranjan Ramesh Rao and
               Vikas Raunak and
               Juan Diego Rodriguez and
               Sashank Santhanam and
               Jo{\~{a} }o Sedoc and
               Thibault Sellam and
               Samira Shaikh and
               Anastasia Shimorina and
               Marco Antonio Sobrevilla Cabezudo and
               Hendrik Strobelt and
               Nishant Subramani and
               Wei Xu and
               Diyi Yang and
               Akhila Yerukola and
               Jiawei Zhou},
  title     = {The {GEM} Benchmark: Natural Language Generation, its Evaluation and
               Metrics},
  journal   = {CoRR},
  volume    = {abs/2102.01672},
  year      = {2021},
  url       = {https://arxiv.org/abs/2102.01672},
  archivePrefix = {arXiv},
  eprint    = {2102.01672}
}

Note that each GEM dataset has its own citation. Please see the source to see
the correct citation for each contained dataset."

gem/wiki_lingua_dutch_nl

Opis konfiguracji: Wikilingua to wielkoskalowy, wielojęzyczny zbiór danych do oceny wielojęzycznych abstrakcyjnych systemów podsumowujących.
Rozmiar pliku do pobrania : 53.88 MiB
Rozmiar zestawu danych : 237.97 MiB
Automatyczne buforowanie ( dokumentacja ): Tak (test, walidacja), Tylko wtedy, gdy shuffle_files=False (pociąg)
Podziały :

Rozdzielać	Przykłady
`'test'`	6248
`'train'`	21866
`'validation'`	3123

Struktura funkcji :

FeaturesDict({
    'gem_id': string,
    'gem_parent_id': string,
    'references': Sequence(string),
    'source': string,
    'source_aligned': Translation({
        'en': Text(shape=(), dtype=string),
        'nl': Text(shape=(), dtype=string),
    }),
    'target': string,
    'target_aligned': Translation({
        'en': Text(shape=(), dtype=string),
        'nl': Text(shape=(), dtype=string),
    }),
})

Dokumentacja funkcji :

Funkcja	Klasa	Kształt	Typ D
	FunkcjeDict
klejnot_id	Napinacz		strunowy
identyfikator_nadrzędnego klejnotu	Napinacz		strunowy
Bibliografia	Sekwencja (Tensor)	(Nic,)	strunowy
źródło	Napinacz		strunowy
źródło_wyrównane	Tłumaczenie
source_aligned/en	Tekst		strunowy
źródło_wyrównane/nl	Tekst		strunowy
cel	Napinacz		strunowy
cel_wyrównany	Tłumaczenie
target_aligned/en	Tekst		strunowy
target_aligned/nl	Tekst		strunowy

Przykłady ( tfds.as_dataframe ):

Cytat :

@inproceedings{ladhak-wiki-2020,
title=WikiLingua: A New Benchmark Dataset for Multilingual Abstractive Summarization,
author={Faisal Ladhak, Esin Durmus, Claire Cardie and Kathleen McKeown},
booktitle={Findings of EMNLP, 2020},
year={2020}
}
@article{gehrmann2021gem,
  author    = {Sebastian Gehrmann and
               Tosin P. Adewumi and
               Karmanya Aggarwal and
               Pawan Sasanka Ammanamanchi and
               Aremu Anuoluwapo and
               Antoine Bosselut and
               Khyathi Raghavi Chandu and
               Miruna{-}Adriana Clinciu and
               Dipanjan Das and
               Kaustubh D. Dhole and
               Wanyu Du and
               Esin Durmus and
               Ondrej Dusek and
               Chris Emezue and
               Varun Gangal and
               Cristina Garbacea and
               Tatsunori Hashimoto and
               Yufang Hou and
               Yacine Jernite and
               Harsh Jhamtani and
               Yangfeng Ji and
               Shailza Jolly and
               Dhruv Kumar and
               Faisal Ladhak and
               Aman Madaan and
               Mounica Maddela and
               Khyati Mahajan and
               Saad Mahamood and
               Bodhisattwa Prasad Majumder and
               Pedro Henrique Martins and
               Angelina McMillan{-}Major and
               Simon Mille and
               Emiel van Miltenburg and
               Moin Nadeem and
               Shashi Narayan and
               Vitaly Nikolaev and
               Rubungo Andre Niyongabo and
               Salomey Osei and
               Ankur P. Parikh and
               Laura Perez{-}Beltrachini and
               Niranjan Ramesh Rao and
               Vikas Raunak and
               Juan Diego Rodriguez and
               Sashank Santhanam and
               Jo{\~{a} }o Sedoc and
               Thibault Sellam and
               Samira Shaikh and
               Anastasia Shimorina and
               Marco Antonio Sobrevilla Cabezudo and
               Hendrik Strobelt and
               Nishant Subramani and
               Wei Xu and
               Diyi Yang and
               Akhila Yerukola and
               Jiawei Zhou},
  title     = {The {GEM} Benchmark: Natural Language Generation, its Evaluation and
               Metrics},
  journal   = {CoRR},
  volume    = {abs/2102.01672},
  year      = {2021},
  url       = {https://arxiv.org/abs/2102.01672},
  archivePrefix = {arXiv},
  eprint    = {2102.01672}
}

Note that each GEM dataset has its own citation. Please see the source to see
the correct citation for each contained dataset."

gem/wiki_lingua_english_en

Opis konfiguracji: Wikilingua to wielkoskalowy, wielojęzyczny zbiór danych do oceny wielojęzycznych abstrakcyjnych systemów podsumowujących.
Rozmiar pliku do pobrania : 112.56 MiB
Rozmiar zbioru danych : 657.51 MiB
Automatyczne buforowanie ( dokumentacja ): Nie
Podziały :

Rozdzielać	Przykłady
`'test'`	28614
`'train'`	99 020
`'validation'`	13823

Struktura funkcji :

FeaturesDict({
    'gem_id': string,
    'gem_parent_id': string,
    'references': Sequence(string),
    'source': string,
    'source_aligned': Translation({
        'en': Text(shape=(), dtype=string),
    }),
    'target': string,
    'target_aligned': Translation({
        'en': Text(shape=(), dtype=string),
    }),
})

Dokumentacja funkcji :

Funkcja	Klasa	Kształt	Typ D
	FunkcjeDict
klejnot_id	Napinacz		strunowy
identyfikator_nadrzędnego klejnotu	Napinacz		strunowy
Bibliografia	Sekwencja (Tensor)	(Nic,)	strunowy
źródło	Napinacz		strunowy
źródło_wyrównane	Tłumaczenie
source_aligned/en	Tekst		strunowy
cel	Napinacz		strunowy
cel_wyrównany	Tłumaczenie
target_aligned/en	Tekst		strunowy

Przykłady ( tfds.as_dataframe ):

Cytat :

@inproceedings{ladhak-wiki-2020,
title=WikiLingua: A New Benchmark Dataset for Multilingual Abstractive Summarization,
author={Faisal Ladhak, Esin Durmus, Claire Cardie and Kathleen McKeown},
booktitle={Findings of EMNLP, 2020},
year={2020}
}
@article{gehrmann2021gem,
  author    = {Sebastian Gehrmann and
               Tosin P. Adewumi and
               Karmanya Aggarwal and
               Pawan Sasanka Ammanamanchi and
               Aremu Anuoluwapo and
               Antoine Bosselut and
               Khyathi Raghavi Chandu and
               Miruna{-}Adriana Clinciu and
               Dipanjan Das and
               Kaustubh D. Dhole and
               Wanyu Du and
               Esin Durmus and
               Ondrej Dusek and
               Chris Emezue and
               Varun Gangal and
               Cristina Garbacea and
               Tatsunori Hashimoto and
               Yufang Hou and
               Yacine Jernite and
               Harsh Jhamtani and
               Yangfeng Ji and
               Shailza Jolly and
               Dhruv Kumar and
               Faisal Ladhak and
               Aman Madaan and
               Mounica Maddela and
               Khyati Mahajan and
               Saad Mahamood and
               Bodhisattwa Prasad Majumder and
               Pedro Henrique Martins and
               Angelina McMillan{-}Major and
               Simon Mille and
               Emiel van Miltenburg and
               Moin Nadeem and
               Shashi Narayan and
               Vitaly Nikolaev and
               Rubungo Andre Niyongabo and
               Salomey Osei and
               Ankur P. Parikh and
               Laura Perez{-}Beltrachini and
               Niranjan Ramesh Rao and
               Vikas Raunak and
               Juan Diego Rodriguez and
               Sashank Santhanam and
               Jo{\~{a} }o Sedoc and
               Thibault Sellam and
               Samira Shaikh and
               Anastasia Shimorina and
               Marco Antonio Sobrevilla Cabezudo and
               Hendrik Strobelt and
               Nishant Subramani and
               Wei Xu and
               Diyi Yang and
               Akhila Yerukola and
               Jiawei Zhou},
  title     = {The {GEM} Benchmark: Natural Language Generation, its Evaluation and
               Metrics},
  journal   = {CoRR},
  volume    = {abs/2102.01672},
  year      = {2021},
  url       = {https://arxiv.org/abs/2102.01672},
  archivePrefix = {arXiv},
  eprint    = {2102.01672}
}

Note that each GEM dataset has its own citation. Please see the source to see
the correct citation for each contained dataset."

klejnot/wiki_lingua_francuski_fr

Opis konfiguracji: Wikilingua to wielkoskalowy, wielojęzyczny zbiór danych do oceny wielojęzycznych abstrakcyjnych systemów podsumowujących.
Rozmiar pliku do pobrania : 113.26 MiB
Rozmiar zestawu danych : 522.28 MiB
Automatyczne buforowanie ( dokumentacja ): Nie
Podziały :

Rozdzielać	Przykłady
`'test'`	12731
`'train'`	44556
`'validation'`	6364

Struktura funkcji :

FeaturesDict({
    'gem_id': string,
    'gem_parent_id': string,
    'references': Sequence(string),
    'source': string,
    'source_aligned': Translation({
        'en': Text(shape=(), dtype=string),
        'fr': Text(shape=(), dtype=string),
    }),
    'target': string,
    'target_aligned': Translation({
        'en': Text(shape=(), dtype=string),
        'fr': Text(shape=(), dtype=string),
    }),
})

Dokumentacja funkcji :

Funkcja	Klasa	Kształt	Typ D
	FunkcjeDict
klejnot_id	Napinacz		strunowy
identyfikator_nadrzędnego klejnotu	Napinacz		strunowy
Bibliografia	Sekwencja (Tensor)	(Nic,)	strunowy
źródło	Napinacz		strunowy
źródło_wyrównane	Tłumaczenie
source_aligned/en	Tekst		strunowy
wyrównane_źródło/fr	Tekst		strunowy
cel	Napinacz		strunowy
cel_wyrównany	Tłumaczenie
target_aligned/en	Tekst		strunowy
target_aligned/fr	Tekst		strunowy

Przykłady ( tfds.as_dataframe ):

Cytat :

@inproceedings{ladhak-wiki-2020,
title=WikiLingua: A New Benchmark Dataset for Multilingual Abstractive Summarization,
author={Faisal Ladhak, Esin Durmus, Claire Cardie and Kathleen McKeown},
booktitle={Findings of EMNLP, 2020},
year={2020}
}
@article{gehrmann2021gem,
  author    = {Sebastian Gehrmann and
               Tosin P. Adewumi and
               Karmanya Aggarwal and
               Pawan Sasanka Ammanamanchi and
               Aremu Anuoluwapo and
               Antoine Bosselut and
               Khyathi Raghavi Chandu and
               Miruna{-}Adriana Clinciu and
               Dipanjan Das and
               Kaustubh D. Dhole and
               Wanyu Du and
               Esin Durmus and
               Ondrej Dusek and
               Chris Emezue and
               Varun Gangal and
               Cristina Garbacea and
               Tatsunori Hashimoto and
               Yufang Hou and
               Yacine Jernite and
               Harsh Jhamtani and
               Yangfeng Ji and
               Shailza Jolly and
               Dhruv Kumar and
               Faisal Ladhak and
               Aman Madaan and
               Mounica Maddela and
               Khyati Mahajan and
               Saad Mahamood and
               Bodhisattwa Prasad Majumder and
               Pedro Henrique Martins and
               Angelina McMillan{-}Major and
               Simon Mille and
               Emiel van Miltenburg and
               Moin Nadeem and
               Shashi Narayan and
               Vitaly Nikolaev and
               Rubungo Andre Niyongabo and
               Salomey Osei and
               Ankur P. Parikh and
               Laura Perez{-}Beltrachini and
               Niranjan Ramesh Rao and
               Vikas Raunak and
               Juan Diego Rodriguez and
               Sashank Santhanam and
               Jo{\~{a} }o Sedoc and
               Thibault Sellam and
               Samira Shaikh and
               Anastasia Shimorina and
               Marco Antonio Sobrevilla Cabezudo and
               Hendrik Strobelt and
               Nishant Subramani and
               Wei Xu and
               Diyi Yang and
               Akhila Yerukola and
               Jiawei Zhou},
  title     = {The {GEM} Benchmark: Natural Language Generation, its Evaluation and
               Metrics},
  journal   = {CoRR},
  volume    = {abs/2102.01672},
  year      = {2021},
  url       = {https://arxiv.org/abs/2102.01672},
  archivePrefix = {arXiv},
  eprint    = {2102.01672}
}

Note that each GEM dataset has its own citation. Please see the source to see
the correct citation for each contained dataset."

gem/wiki_lingua_german_de

Opis konfiguracji: Wikilingua to wielkoskalowy, wielojęzyczny zbiór danych do oceny wielojęzycznych abstrakcyjnych systemów podsumowujących.
Rozmiar pliku do pobrania : 102.65 MiB
Rozmiar zestawu danych : 452.46 MiB
Automatyczne buforowanie ( dokumentacja ): Nie
Podziały :

Rozdzielać	Przykłady
`'test'`	11669
`'train'`	40839
`'validation'`	5833

Struktura funkcji :

FeaturesDict({
    'gem_id': string,
    'gem_parent_id': string,
    'references': Sequence(string),
    'source': string,
    'source_aligned': Translation({
        'de': Text(shape=(), dtype=string),
        'en': Text(shape=(), dtype=string),
    }),
    'target': string,
    'target_aligned': Translation({
        'de': Text(shape=(), dtype=string),
        'en': Text(shape=(), dtype=string),
    }),
})

Dokumentacja funkcji :

Funkcja	Klasa	Kształt	Typ D
	FunkcjeDict
klejnot_id	Napinacz		strunowy
identyfikator_nadrzędnego klejnotu	Napinacz		strunowy
Bibliografia	Sekwencja (Tensor)	(Nic,)	strunowy
źródło	Napinacz		strunowy
źródło_wyrównane	Tłumaczenie
source_aligned/de	Tekst		strunowy
source_aligned/en	Tekst		strunowy
cel	Napinacz		strunowy
cel_wyrównany	Tłumaczenie
target_aligned/de	Tekst		strunowy
target_aligned/en	Tekst		strunowy

Przykłady ( tfds.as_dataframe ):

Cytat :

@inproceedings{ladhak-wiki-2020,
title=WikiLingua: A New Benchmark Dataset for Multilingual Abstractive Summarization,
author={Faisal Ladhak, Esin Durmus, Claire Cardie and Kathleen McKeown},
booktitle={Findings of EMNLP, 2020},
year={2020}
}
@article{gehrmann2021gem,
  author    = {Sebastian Gehrmann and
               Tosin P. Adewumi and
               Karmanya Aggarwal and
               Pawan Sasanka Ammanamanchi and
               Aremu Anuoluwapo and
               Antoine Bosselut and
               Khyathi Raghavi Chandu and
               Miruna{-}Adriana Clinciu and
               Dipanjan Das and
               Kaustubh D. Dhole and
               Wanyu Du and
               Esin Durmus and
               Ondrej Dusek and
               Chris Emezue and
               Varun Gangal and
               Cristina Garbacea and
               Tatsunori Hashimoto and
               Yufang Hou and
               Yacine Jernite and
               Harsh Jhamtani and
               Yangfeng Ji and
               Shailza Jolly and
               Dhruv Kumar and
               Faisal Ladhak and
               Aman Madaan and
               Mounica Maddela and
               Khyati Mahajan and
               Saad Mahamood and
               Bodhisattwa Prasad Majumder and
               Pedro Henrique Martins and
               Angelina McMillan{-}Major and
               Simon Mille and
               Emiel van Miltenburg and
               Moin Nadeem and
               Shashi Narayan and
               Vitaly Nikolaev and
               Rubungo Andre Niyongabo and
               Salomey Osei and
               Ankur P. Parikh and
               Laura Perez{-}Beltrachini and
               Niranjan Ramesh Rao and
               Vikas Raunak and
               Juan Diego Rodriguez and
               Sashank Santhanam and
               Jo{\~{a} }o Sedoc and
               Thibault Sellam and
               Samira Shaikh and
               Anastasia Shimorina and
               Marco Antonio Sobrevilla Cabezudo and
               Hendrik Strobelt and
               Nishant Subramani and
               Wei Xu and
               Diyi Yang and
               Akhila Yerukola and
               Jiawei Zhou},
  title     = {The {GEM} Benchmark: Natural Language Generation, its Evaluation and
               Metrics},
  journal   = {CoRR},
  volume    = {abs/2102.01672},
  year      = {2021},
  url       = {https://arxiv.org/abs/2102.01672},
  archivePrefix = {arXiv},
  eprint    = {2102.01672}
}

Note that each GEM dataset has its own citation. Please see the source to see
the correct citation for each contained dataset."

gem/wiki_lingua_hindi_hi

Opis konfiguracji: Wikilingua to wielkoskalowy, wielojęzyczny zbiór danych do oceny wielojęzycznych abstrakcyjnych systemów podsumowujących.
Rozmiar pliku do pobrania : 20.07 MiB
Rozmiar zestawu danych : 138.06 MiB
Automatyczne buforowanie ( dokumentacja ): Tak
Podziały :

Rozdzielać	Przykłady
`'test'`	1984
`'train'`	6942
`'validation'`	991

Struktura funkcji :

FeaturesDict({
    'gem_id': string,
    'gem_parent_id': string,
    'references': Sequence(string),
    'source': string,
    'source_aligned': Translation({
        'en': Text(shape=(), dtype=string),
        'hi': Text(shape=(), dtype=string),
    }),
    'target': string,
    'target_aligned': Translation({
        'en': Text(shape=(), dtype=string),
        'hi': Text(shape=(), dtype=string),
    }),
})

Dokumentacja funkcji :

Funkcja	Klasa	Kształt	Typ D
	FunkcjeDict
klejnot_id	Napinacz		strunowy
identyfikator_nadrzędnego klejnotu	Napinacz		strunowy
Bibliografia	Sekwencja (Tensor)	(Nic,)	strunowy
źródło	Napinacz		strunowy
źródło_wyrównane	Tłumaczenie
source_aligned/en	Tekst		strunowy
source_aligned/cześć	Tekst		strunowy
cel	Napinacz		strunowy
cel_wyrównany	Tłumaczenie
target_aligned/en	Tekst		strunowy
target_aligned/cześć	Tekst		strunowy

Przykłady ( tfds.as_dataframe ):

Cytat :

@inproceedings{ladhak-wiki-2020,
title=WikiLingua: A New Benchmark Dataset for Multilingual Abstractive Summarization,
author={Faisal Ladhak, Esin Durmus, Claire Cardie and Kathleen McKeown},
booktitle={Findings of EMNLP, 2020},
year={2020}
}
@article{gehrmann2021gem,
  author    = {Sebastian Gehrmann and
               Tosin P. Adewumi and
               Karmanya Aggarwal and
               Pawan Sasanka Ammanamanchi and
               Aremu Anuoluwapo and
               Antoine Bosselut and
               Khyathi Raghavi Chandu and
               Miruna{-}Adriana Clinciu and
               Dipanjan Das and
               Kaustubh D. Dhole and
               Wanyu Du and
               Esin Durmus and
               Ondrej Dusek and
               Chris Emezue and
               Varun Gangal and
               Cristina Garbacea and
               Tatsunori Hashimoto and
               Yufang Hou and
               Yacine Jernite and
               Harsh Jhamtani and
               Yangfeng Ji and
               Shailza Jolly and
               Dhruv Kumar and
               Faisal Ladhak and
               Aman Madaan and
               Mounica Maddela and
               Khyati Mahajan and
               Saad Mahamood and
               Bodhisattwa Prasad Majumder and
               Pedro Henrique Martins and
               Angelina McMillan{-}Major and
               Simon Mille and
               Emiel van Miltenburg and
               Moin Nadeem and
               Shashi Narayan and
               Vitaly Nikolaev and
               Rubungo Andre Niyongabo and
               Salomey Osei and
               Ankur P. Parikh and
               Laura Perez{-}Beltrachini and
               Niranjan Ramesh Rao and
               Vikas Raunak and
               Juan Diego Rodriguez and
               Sashank Santhanam and
               Jo{\~{a} }o Sedoc and
               Thibault Sellam and
               Samira Shaikh and
               Anastasia Shimorina and
               Marco Antonio Sobrevilla Cabezudo and
               Hendrik Strobelt and
               Nishant Subramani and
               Wei Xu and
               Diyi Yang and
               Akhila Yerukola and
               Jiawei Zhou},
  title     = {The {GEM} Benchmark: Natural Language Generation, its Evaluation and
               Metrics},
  journal   = {CoRR},
  volume    = {abs/2102.01672},
  year      = {2021},
  url       = {https://arxiv.org/abs/2102.01672},
  archivePrefix = {arXiv},
  eprint    = {2102.01672}
}

Note that each GEM dataset has its own citation. Please see the source to see
the correct citation for each contained dataset."

gem/wiki_lingua_indonezyjski_id

Opis konfiguracji: Wikilingua to wielkoskalowy, wielojęzyczny zbiór danych do oceny wielojęzycznych abstrakcyjnych systemów podsumowujących.
Rozmiar pliku do pobrania : 80.08 MiB
Rozmiar zestawu danych : 370.63 MiB
Automatyczne buforowanie ( dokumentacja ): Nie
Podziały :

Rozdzielać	Przykłady
`'test'`	9497
`'train'`	33237
`'validation'`	4747

Struktura funkcji :

FeaturesDict({
    'gem_id': string,
    'gem_parent_id': string,
    'references': Sequence(string),
    'source': string,
    'source_aligned': Translation({
        'en': Text(shape=(), dtype=string),
        'id': Text(shape=(), dtype=string),
    }),
    'target': string,
    'target_aligned': Translation({
        'en': Text(shape=(), dtype=string),
        'id': Text(shape=(), dtype=string),
    }),
})

Dokumentacja funkcji :

Funkcja	Klasa	Kształt	Typ D
	FunkcjeDict
klejnot_id	Napinacz		strunowy
identyfikator_nadrzędnego klejnotu	Napinacz		strunowy
Bibliografia	Sekwencja (Tensor)	(Nic,)	strunowy
źródło	Napinacz		strunowy
źródło_wyrównane	Tłumaczenie
source_aligned/en	Tekst		strunowy
źródło_wyrównane/identyfikator	Tekst		strunowy
cel	Napinacz		strunowy
cel_wyrównany	Tłumaczenie
target_aligned/en	Tekst		strunowy
wyrównany_docelowo/identyfikator	Tekst		strunowy

Przykłady ( tfds.as_dataframe ):

Cytat :

@inproceedings{ladhak-wiki-2020,
title=WikiLingua: A New Benchmark Dataset for Multilingual Abstractive Summarization,
author={Faisal Ladhak, Esin Durmus, Claire Cardie and Kathleen McKeown},
booktitle={Findings of EMNLP, 2020},
year={2020}
}
@article{gehrmann2021gem,
  author    = {Sebastian Gehrmann and
               Tosin P. Adewumi and
               Karmanya Aggarwal and
               Pawan Sasanka Ammanamanchi and
               Aremu Anuoluwapo and
               Antoine Bosselut and
               Khyathi Raghavi Chandu and
               Miruna{-}Adriana Clinciu and
               Dipanjan Das and
               Kaustubh D. Dhole and
               Wanyu Du and
               Esin Durmus and
               Ondrej Dusek and
               Chris Emezue and
               Varun Gangal and
               Cristina Garbacea and
               Tatsunori Hashimoto and
               Yufang Hou and
               Yacine Jernite and
               Harsh Jhamtani and
               Yangfeng Ji and
               Shailza Jolly and
               Dhruv Kumar and
               Faisal Ladhak and
               Aman Madaan and
               Mounica Maddela and
               Khyati Mahajan and
               Saad Mahamood and
               Bodhisattwa Prasad Majumder and
               Pedro Henrique Martins and
               Angelina McMillan{-}Major and
               Simon Mille and
               Emiel van Miltenburg and
               Moin Nadeem and
               Shashi Narayan and
               Vitaly Nikolaev and
               Rubungo Andre Niyongabo and
               Salomey Osei and
               Ankur P. Parikh and
               Laura Perez{-}Beltrachini and
               Niranjan Ramesh Rao and
               Vikas Raunak and
               Juan Diego Rodriguez and
               Sashank Santhanam and
               Jo{\~{a} }o Sedoc and
               Thibault Sellam and
               Samira Shaikh and
               Anastasia Shimorina and
               Marco Antonio Sobrevilla Cabezudo and
               Hendrik Strobelt and
               Nishant Subramani and
               Wei Xu and
               Diyi Yang and
               Akhila Yerukola and
               Jiawei Zhou},
  title     = {The {GEM} Benchmark: Natural Language Generation, its Evaluation and
               Metrics},
  journal   = {CoRR},
  volume    = {abs/2102.01672},
  year      = {2021},
  url       = {https://arxiv.org/abs/2102.01672},
  archivePrefix = {arXiv},
  eprint    = {2102.01672}
}

Note that each GEM dataset has its own citation. Please see the source to see
the correct citation for each contained dataset."

gem/wiki_lingua_italian_it

Opis konfiguracji: Wikilingua to wielkoskalowy, wielojęzyczny zbiór danych do oceny wielojęzycznych abstrakcyjnych systemów podsumowujących.
Rozmiar pliku do pobrania : 84.80 MiB
Rozmiar zestawu danych : 374.40 MiB
Automatyczne buforowanie ( dokumentacja ): Nie
Podziały :

Rozdzielać	Przykłady
`'test'`	10189
`'train'`	35661
`'validation'`	5093

Struktura funkcji :

FeaturesDict({
    'gem_id': string,
    'gem_parent_id': string,
    'references': Sequence(string),
    'source': string,
    'source_aligned': Translation({
        'en': Text(shape=(), dtype=string),
        'it': Text(shape=(), dtype=string),
    }),
    'target': string,
    'target_aligned': Translation({
        'en': Text(shape=(), dtype=string),
        'it': Text(shape=(), dtype=string),
    }),
})

Dokumentacja funkcji :

Funkcja	Klasa	Kształt	Typ D
	FunkcjeDict
klejnot_id	Napinacz		strunowy
identyfikator_nadrzędnego klejnotu	Napinacz		strunowy
Bibliografia	Sekwencja (Tensor)	(Nic,)	strunowy
źródło	Napinacz		strunowy
źródło_wyrównane	Tłumaczenie
source_aligned/en	Tekst		strunowy
źródło_wyrównane/it	Tekst		strunowy
cel	Napinacz		strunowy
cel_wyrównany	Tłumaczenie
target_aligned/en	Tekst		strunowy
cel_wyrównany/it	Tekst		strunowy

Przykłady ( tfds.as_dataframe ):

Cytat :

@inproceedings{ladhak-wiki-2020,
title=WikiLingua: A New Benchmark Dataset for Multilingual Abstractive Summarization,
author={Faisal Ladhak, Esin Durmus, Claire Cardie and Kathleen McKeown},
booktitle={Findings of EMNLP, 2020},
year={2020}
}
@article{gehrmann2021gem,
  author    = {Sebastian Gehrmann and
               Tosin P. Adewumi and
               Karmanya Aggarwal and
               Pawan Sasanka Ammanamanchi and
               Aremu Anuoluwapo and
               Antoine Bosselut and
               Khyathi Raghavi Chandu and
               Miruna{-}Adriana Clinciu and
               Dipanjan Das and
               Kaustubh D. Dhole and
               Wanyu Du and
               Esin Durmus and
               Ondrej Dusek and
               Chris Emezue and
               Varun Gangal and
               Cristina Garbacea and
               Tatsunori Hashimoto and
               Yufang Hou and
               Yacine Jernite and
               Harsh Jhamtani and
               Yangfeng Ji and
               Shailza Jolly and
               Dhruv Kumar and
               Faisal Ladhak and
               Aman Madaan and
               Mounica Maddela and
               Khyati Mahajan and
               Saad Mahamood and
               Bodhisattwa Prasad Majumder and
               Pedro Henrique Martins and
               Angelina McMillan{-}Major and
               Simon Mille and
               Emiel van Miltenburg and
               Moin Nadeem and
               Shashi Narayan and
               Vitaly Nikolaev and
               Rubungo Andre Niyongabo and
               Salomey Osei and
               Ankur P. Parikh and
               Laura Perez{-}Beltrachini and
               Niranjan Ramesh Rao and
               Vikas Raunak and
               Juan Diego Rodriguez and
               Sashank Santhanam and
               Jo{\~{a} }o Sedoc and
               Thibault Sellam and
               Samira Shaikh and
               Anastasia Shimorina and
               Marco Antonio Sobrevilla Cabezudo and
               Hendrik Strobelt and
               Nishant Subramani and
               Wei Xu and
               Diyi Yang and
               Akhila Yerukola and
               Jiawei Zhou},
  title     = {The {GEM} Benchmark: Natural Language Generation, its Evaluation and
               Metrics},
  journal   = {CoRR},
  volume    = {abs/2102.01672},
  year      = {2021},
  url       = {https://arxiv.org/abs/2102.01672},
  archivePrefix = {arXiv},
  eprint    = {2102.01672}
}

Note that each GEM dataset has its own citation. Please see the source to see
the correct citation for each contained dataset."

gem/wiki_lingua_japanese_ja

Opis konfiguracji: Wikilingua to wielkoskalowy, wielojęzyczny zbiór danych do oceny wielojęzycznych abstrakcyjnych systemów podsumowujących.
Rozmiar pliku do pobrania : 21.75 MiB
Rozmiar zestawu danych : 103.19 MiB
Automatyczne buforowanie ( dokumentacja ): Tak
Podziały :

Rozdzielać	Przykłady
`'test'`	2530
`'train'`	8853
`'validation'`	1264

Struktura funkcji :

FeaturesDict({
    'gem_id': string,
    'gem_parent_id': string,
    'references': Sequence(string),
    'source': string,
    'source_aligned': Translation({
        'en': Text(shape=(), dtype=string),
        'ja': Text(shape=(), dtype=string),
    }),
    'target': string,
    'target_aligned': Translation({
        'en': Text(shape=(), dtype=string),
        'ja': Text(shape=(), dtype=string),
    }),
})

Dokumentacja funkcji :

Funkcja	Klasa	Kształt	Typ D
	FunkcjeDict
klejnot_id	Napinacz		strunowy
identyfikator_nadrzędnego klejnotu	Napinacz		strunowy
Bibliografia	Sekwencja (Tensor)	(Nic,)	strunowy
źródło	Napinacz		strunowy
źródło_wyrównane	Tłumaczenie
source_aligned/en	Tekst		strunowy
źródło_wyrównane/ja	Tekst		strunowy
cel	Napinacz		strunowy
cel_wyrównany	Tłumaczenie
target_aligned/en	Tekst		strunowy
wyrównany_docelowo/ja	Tekst		strunowy

Przykłady ( tfds.as_dataframe ):

Cytat :

@inproceedings{ladhak-wiki-2020,
title=WikiLingua: A New Benchmark Dataset for Multilingual Abstractive Summarization,
author={Faisal Ladhak, Esin Durmus, Claire Cardie and Kathleen McKeown},
booktitle={Findings of EMNLP, 2020},
year={2020}
}
@article{gehrmann2021gem,
  author    = {Sebastian Gehrmann and
               Tosin P. Adewumi and
               Karmanya Aggarwal and
               Pawan Sasanka Ammanamanchi and
               Aremu Anuoluwapo and
               Antoine Bosselut and
               Khyathi Raghavi Chandu and
               Miruna{-}Adriana Clinciu and
               Dipanjan Das and
               Kaustubh D. Dhole and
               Wanyu Du and
               Esin Durmus and
               Ondrej Dusek and
               Chris Emezue and
               Varun Gangal and
               Cristina Garbacea and
               Tatsunori Hashimoto and
               Yufang Hou and
               Yacine Jernite and
               Harsh Jhamtani and
               Yangfeng Ji and
               Shailza Jolly and
               Dhruv Kumar and
               Faisal Ladhak and
               Aman Madaan and
               Mounica Maddela and
               Khyati Mahajan and
               Saad Mahamood and
               Bodhisattwa Prasad Majumder and
               Pedro Henrique Martins and
               Angelina McMillan{-}Major and
               Simon Mille and
               Emiel van Miltenburg and
               Moin Nadeem and
               Shashi Narayan and
               Vitaly Nikolaev and
               Rubungo Andre Niyongabo and
               Salomey Osei and
               Ankur P. Parikh and
               Laura Perez{-}Beltrachini and
               Niranjan Ramesh Rao and
               Vikas Raunak and
               Juan Diego Rodriguez and
               Sashank Santhanam and
               Jo{\~{a} }o Sedoc and
               Thibault Sellam and
               Samira Shaikh and
               Anastasia Shimorina and
               Marco Antonio Sobrevilla Cabezudo and
               Hendrik Strobelt and
               Nishant Subramani and
               Wei Xu and
               Diyi Yang and
               Akhila Yerukola and
               Jiawei Zhou},
  title     = {The {GEM} Benchmark: Natural Language Generation, its Evaluation and
               Metrics},
  journal   = {CoRR},
  volume    = {abs/2102.01672},
  year      = {2021},
  url       = {https://arxiv.org/abs/2102.01672},
  archivePrefix = {arXiv},
  eprint    = {2102.01672}
}

Note that each GEM dataset has its own citation. Please see the source to see
the correct citation for each contained dataset."

gem/wiki_lingua_korean_ko

Opis konfiguracji: Wikilingua to wielkoskalowy, wielojęzyczny zbiór danych do oceny wielojęzycznych abstrakcyjnych systemów podsumowujących.
Rozmiar pliku do pobrania : 22.26 MiB
Rozmiar zestawu danych : 102.35 MiB
Automatyczne buforowanie ( dokumentacja ): Tak
Podziały :

Rozdzielać	Przykłady
`'test'`	2436
`'train'`	8524
`'validation'`	1216

Struktura funkcji :

FeaturesDict({
    'gem_id': string,
    'gem_parent_id': string,
    'references': Sequence(string),
    'source': string,
    'source_aligned': Translation({
        'en': Text(shape=(), dtype=string),
        'ko': Text(shape=(), dtype=string),
    }),
    'target': string,
    'target_aligned': Translation({
        'en': Text(shape=(), dtype=string),
        'ko': Text(shape=(), dtype=string),
    }),
})

Dokumentacja funkcji :

Funkcja	Klasa	Kształt	Typ D
	FunkcjeDict
klejnot_id	Napinacz		strunowy
identyfikator_nadrzędnego klejnotu	Napinacz		strunowy
Bibliografia	Sekwencja (Tensor)	(Nic,)	strunowy
źródło	Napinacz		strunowy
źródło_wyrównane	Tłumaczenie
source_aligned/en	Tekst		strunowy
źródło_wyrównane/ko	Tekst		strunowy
cel	Napinacz		strunowy
cel_wyrównany	Tłumaczenie
target_aligned/en	Tekst		strunowy
target_aligned/ko	Tekst		strunowy

Przykłady ( tfds.as_dataframe ):

Cytat :

@inproceedings{ladhak-wiki-2020,
title=WikiLingua: A New Benchmark Dataset for Multilingual Abstractive Summarization,
author={Faisal Ladhak, Esin Durmus, Claire Cardie and Kathleen McKeown},
booktitle={Findings of EMNLP, 2020},
year={2020}
}
@article{gehrmann2021gem,
  author    = {Sebastian Gehrmann and
               Tosin P. Adewumi and
               Karmanya Aggarwal and
               Pawan Sasanka Ammanamanchi and
               Aremu Anuoluwapo and
               Antoine Bosselut and
               Khyathi Raghavi Chandu and
               Miruna{-}Adriana Clinciu and
               Dipanjan Das and
               Kaustubh D. Dhole and
               Wanyu Du and
               Esin Durmus and
               Ondrej Dusek and
               Chris Emezue and
               Varun Gangal and
               Cristina Garbacea and
               Tatsunori Hashimoto and
               Yufang Hou and
               Yacine Jernite and
               Harsh Jhamtani and
               Yangfeng Ji and
               Shailza Jolly and
               Dhruv Kumar and
               Faisal Ladhak and
               Aman Madaan and
               Mounica Maddela and
               Khyati Mahajan and
               Saad Mahamood and
               Bodhisattwa Prasad Majumder and
               Pedro Henrique Martins and
               Angelina McMillan{-}Major and
               Simon Mille and
               Emiel van Miltenburg and
               Moin Nadeem and
               Shashi Narayan and
               Vitaly Nikolaev and
               Rubungo Andre Niyongabo and
               Salomey Osei and
               Ankur P. Parikh and
               Laura Perez{-}Beltrachini and
               Niranjan Ramesh Rao and
               Vikas Raunak and
               Juan Diego Rodriguez and
               Sashank Santhanam and
               Jo{\~{a} }o Sedoc and
               Thibault Sellam and
               Samira Shaikh and
               Anastasia Shimorina and
               Marco Antonio Sobrevilla Cabezudo and
               Hendrik Strobelt and
               Nishant Subramani and
               Wei Xu and
               Diyi Yang and
               Akhila Yerukola and
               Jiawei Zhou},
  title     = {The {GEM} Benchmark: Natural Language Generation, its Evaluation and
               Metrics},
  journal   = {CoRR},
  volume    = {abs/2102.01672},
  year      = {2021},
  url       = {https://arxiv.org/abs/2102.01672},
  archivePrefix = {arXiv},
  eprint    = {2102.01672}
}

Note that each GEM dataset has its own citation. Please see the source to see
the correct citation for each contained dataset."

gem/wiki_lingua_portuguese_pt

Opis konfiguracji: Wikilingua to wielkoskalowy, wielojęzyczny zbiór danych do oceny wielojęzycznych abstrakcyjnych systemów podsumowujących.
Rozmiar pliku do pobrania : 131.17 MiB
Rozmiar zestawu danych : 570.46 MiB
Automatyczne buforowanie ( dokumentacja ): Nie
Podziały :

Rozdzielać	Przykłady
`'test'`	16331
`'train'`	57159
`'validation'`	8165

Struktura funkcji :

FeaturesDict({
    'gem_id': string,
    'gem_parent_id': string,
    'references': Sequence(string),
    'source': string,
    'source_aligned': Translation({
        'en': Text(shape=(), dtype=string),
        'pt': Text(shape=(), dtype=string),
    }),
    'target': string,
    'target_aligned': Translation({
        'en': Text(shape=(), dtype=string),
        'pt': Text(shape=(), dtype=string),
    }),
})

Dokumentacja funkcji :

Funkcja	Klasa	Kształt	Typ D
	FunkcjeDict
klejnot_id	Napinacz		strunowy
identyfikator_nadrzędnego klejnotu	Napinacz		strunowy
Bibliografia	Sekwencja (Tensor)	(Nic,)	strunowy
źródło	Napinacz		strunowy
źródło_wyrównane	Tłumaczenie
source_aligned/en	Tekst		strunowy
źródło_wyrównane/pt	Tekst		strunowy
cel	Napinacz		strunowy
cel_wyrównany	Tłumaczenie
target_aligned/en	Tekst		strunowy
wyrównany_docelowo/punkt	Tekst		strunowy

Przykłady ( tfds.as_dataframe ):

Cytat :

@inproceedings{ladhak-wiki-2020,
title=WikiLingua: A New Benchmark Dataset for Multilingual Abstractive Summarization,
author={Faisal Ladhak, Esin Durmus, Claire Cardie and Kathleen McKeown},
booktitle={Findings of EMNLP, 2020},
year={2020}
}
@article{gehrmann2021gem,
  author    = {Sebastian Gehrmann and
               Tosin P. Adewumi and
               Karmanya Aggarwal and
               Pawan Sasanka Ammanamanchi and
               Aremu Anuoluwapo and
               Antoine Bosselut and
               Khyathi Raghavi Chandu and
               Miruna{-}Adriana Clinciu and
               Dipanjan Das and
               Kaustubh D. Dhole and
               Wanyu Du and
               Esin Durmus and
               Ondrej Dusek and
               Chris Emezue and
               Varun Gangal and
               Cristina Garbacea and
               Tatsunori Hashimoto and
               Yufang Hou and
               Yacine Jernite and
               Harsh Jhamtani and
               Yangfeng Ji and
               Shailza Jolly and
               Dhruv Kumar and
               Faisal Ladhak and
               Aman Madaan and
               Mounica Maddela and
               Khyati Mahajan and
               Saad Mahamood and
               Bodhisattwa Prasad Majumder and
               Pedro Henrique Martins and
               Angelina McMillan{-}Major and
               Simon Mille and
               Emiel van Miltenburg and
               Moin Nadeem and
               Shashi Narayan and
               Vitaly Nikolaev and
               Rubungo Andre Niyongabo and
               Salomey Osei and
               Ankur P. Parikh and
               Laura Perez{-}Beltrachini and
               Niranjan Ramesh Rao and
               Vikas Raunak and
               Juan Diego Rodriguez and
               Sashank Santhanam and
               Jo{\~{a} }o Sedoc and
               Thibault Sellam and
               Samira Shaikh and
               Anastasia Shimorina and
               Marco Antonio Sobrevilla Cabezudo and
               Hendrik Strobelt and
               Nishant Subramani and
               Wei Xu and
               Diyi Yang and
               Akhila Yerukola and
               Jiawei Zhou},
  title     = {The {GEM} Benchmark: Natural Language Generation, its Evaluation and
               Metrics},
  journal   = {CoRR},
  volume    = {abs/2102.01672},
  year      = {2021},
  url       = {https://arxiv.org/abs/2102.01672},
  archivePrefix = {arXiv},
  eprint    = {2102.01672}
}

Note that each GEM dataset has its own citation. Please see the source to see
the correct citation for each contained dataset."

gem/wiki_lingua_russian_ru

Opis konfiguracji: Wikilingua to wielkoskalowy, wielojęzyczny zbiór danych do oceny wielojęzycznych abstrakcyjnych systemów podsumowujących.
Rozmiar pliku do pobrania : 101.36 MiB
Rozmiar zestawu danych : 564.69 MiB
Automatyczne buforowanie ( dokumentacja ): Nie
Podziały :

Rozdzielać	Przykłady
`'test'`	10580
`'train'`	37028
`'validation'`	5288

Struktura funkcji :

FeaturesDict({
    'gem_id': string,
    'gem_parent_id': string,
    'references': Sequence(string),
    'source': string,
    'source_aligned': Translation({
        'en': Text(shape=(), dtype=string),
        'ru': Text(shape=(), dtype=string),
    }),
    'target': string,
    'target_aligned': Translation({
        'en': Text(shape=(), dtype=string),
        'ru': Text(shape=(), dtype=string),
    }),
})

Dokumentacja funkcji :

Funkcja	Klasa	Kształt	Typ D
	FunkcjeDict
klejnot_id	Napinacz		strunowy
identyfikator_nadrzędnego klejnotu	Napinacz		strunowy
Bibliografia	Sekwencja (Tensor)	(Nic,)	strunowy
źródło	Napinacz		strunowy
źródło_wyrównane	Tłumaczenie
source_aligned/en	Tekst		strunowy
źródło_wyrównane/ru	Tekst		strunowy
cel	Napinacz		strunowy
cel_wyrównany	Tłumaczenie
target_aligned/en	Tekst		strunowy
target_aligned/ru	Tekst		strunowy

Przykłady ( tfds.as_dataframe ):

Cytat :

@inproceedings{ladhak-wiki-2020,
title=WikiLingua: A New Benchmark Dataset for Multilingual Abstractive Summarization,
author={Faisal Ladhak, Esin Durmus, Claire Cardie and Kathleen McKeown},
booktitle={Findings of EMNLP, 2020},
year={2020}
}
@article{gehrmann2021gem,
  author    = {Sebastian Gehrmann and
               Tosin P. Adewumi and
               Karmanya Aggarwal and
               Pawan Sasanka Ammanamanchi and
               Aremu Anuoluwapo and
               Antoine Bosselut and
               Khyathi Raghavi Chandu and
               Miruna{-}Adriana Clinciu and
               Dipanjan Das and
               Kaustubh D. Dhole and
               Wanyu Du and
               Esin Durmus and
               Ondrej Dusek and
               Chris Emezue and
               Varun Gangal and
               Cristina Garbacea and
               Tatsunori Hashimoto and
               Yufang Hou and
               Yacine Jernite and
               Harsh Jhamtani and
               Yangfeng Ji and
               Shailza Jolly and
               Dhruv Kumar and
               Faisal Ladhak and
               Aman Madaan and
               Mounica Maddela and
               Khyati Mahajan and
               Saad Mahamood and
               Bodhisattwa Prasad Majumder and
               Pedro Henrique Martins and
               Angelina McMillan{-}Major and
               Simon Mille and
               Emiel van Miltenburg and
               Moin Nadeem and
               Shashi Narayan and
               Vitaly Nikolaev and
               Rubungo Andre Niyongabo and
               Salomey Osei and
               Ankur P. Parikh and
               Laura Perez{-}Beltrachini and
               Niranjan Ramesh Rao and
               Vikas Raunak and
               Juan Diego Rodriguez and
               Sashank Santhanam and
               Jo{\~{a} }o Sedoc and
               Thibault Sellam and
               Samira Shaikh and
               Anastasia Shimorina and
               Marco Antonio Sobrevilla Cabezudo and
               Hendrik Strobelt and
               Nishant Subramani and
               Wei Xu and
               Diyi Yang and
               Akhila Yerukola and
               Jiawei Zhou},
  title     = {The {GEM} Benchmark: Natural Language Generation, its Evaluation and
               Metrics},
  journal   = {CoRR},
  volume    = {abs/2102.01672},
  year      = {2021},
  url       = {https://arxiv.org/abs/2102.01672},
  archivePrefix = {arXiv},
  eprint    = {2102.01672}
}

Note that each GEM dataset has its own citation. Please see the source to see
the correct citation for each contained dataset."

klejnot/wiki_lingua_hiszpański_es

Opis konfiguracji: Wikilingua to wielkoskalowy, wielojęzyczny zbiór danych do oceny wielojęzycznych abstrakcyjnych systemów podsumowujących.
Rozmiar pliku do pobrania : 189.06 MiB
Rozmiar zestawu danych : 849.75 MiB
Automatyczne buforowanie ( dokumentacja ): Nie
Podziały :

Rozdzielać	Przykłady
`'test'`	22632
`'train'`	79212
`'validation'`	11316

Struktura funkcji :

FeaturesDict({
    'gem_id': string,
    'gem_parent_id': string,
    'references': Sequence(string),
    'source': string,
    'source_aligned': Translation({
        'en': Text(shape=(), dtype=string),
        'es': Text(shape=(), dtype=string),
    }),
    'target': string,
    'target_aligned': Translation({
        'en': Text(shape=(), dtype=string),
        'es': Text(shape=(), dtype=string),
    }),
})

Dokumentacja funkcji :

Funkcja	Klasa	Kształt	Typ D
	FunkcjeDict
klejnot_id	Napinacz		strunowy
identyfikator_nadrzędnego klejnotu	Napinacz		strunowy
Bibliografia	Sekwencja (Tensor)	(Nic,)	strunowy
źródło	Napinacz		strunowy
źródło_wyrównane	Tłumaczenie
source_aligned/en	Tekst		strunowy
source_aligned/es	Tekst		strunowy
cel	Napinacz		strunowy
cel_wyrównany	Tłumaczenie
target_aligned/en	Tekst		strunowy
target_aligned/es	Tekst		strunowy

Przykłady ( tfds.as_dataframe ):

Cytat :

@inproceedings{ladhak-wiki-2020,
title=WikiLingua: A New Benchmark Dataset for Multilingual Abstractive Summarization,
author={Faisal Ladhak, Esin Durmus, Claire Cardie and Kathleen McKeown},
booktitle={Findings of EMNLP, 2020},
year={2020}
}
@article{gehrmann2021gem,
  author    = {Sebastian Gehrmann and
               Tosin P. Adewumi and
               Karmanya Aggarwal and
               Pawan Sasanka Ammanamanchi and
               Aremu Anuoluwapo and
               Antoine Bosselut and
               Khyathi Raghavi Chandu and
               Miruna{-}Adriana Clinciu and
               Dipanjan Das and
               Kaustubh D. Dhole and
               Wanyu Du and
               Esin Durmus and
               Ondrej Dusek and
               Chris Emezue and
               Varun Gangal and
               Cristina Garbacea and
               Tatsunori Hashimoto and
               Yufang Hou and
               Yacine Jernite and
               Harsh Jhamtani and
               Yangfeng Ji and
               Shailza Jolly and
               Dhruv Kumar and
               Faisal Ladhak and
               Aman Madaan and
               Mounica Maddela and
               Khyati Mahajan and
               Saad Mahamood and
               Bodhisattwa Prasad Majumder and
               Pedro Henrique Martins and
               Angelina McMillan{-}Major and
               Simon Mille and
               Emiel van Miltenburg and
               Moin Nadeem and
               Shashi Narayan and
               Vitaly Nikolaev and
               Rubungo Andre Niyongabo and
               Salomey Osei and
               Ankur P. Parikh and
               Laura Perez{-}Beltrachini and
               Niranjan Ramesh Rao and
               Vikas Raunak and
               Juan Diego Rodriguez and
               Sashank Santhanam and
               Jo{\~{a} }o Sedoc and
               Thibault Sellam and
               Samira Shaikh and
               Anastasia Shimorina and
               Marco Antonio Sobrevilla Cabezudo and
               Hendrik Strobelt and
               Nishant Subramani and
               Wei Xu and
               Diyi Yang and
               Akhila Yerukola and
               Jiawei Zhou},
  title     = {The {GEM} Benchmark: Natural Language Generation, its Evaluation and
               Metrics},
  journal   = {CoRR},
  volume    = {abs/2102.01672},
  year      = {2021},
  url       = {https://arxiv.org/abs/2102.01672},
  archivePrefix = {arXiv},
  eprint    = {2102.01672}
}

Note that each GEM dataset has its own citation. Please see the source to see
the correct citation for each contained dataset."

gem/wiki_lingua_thai_th

Opis konfiguracji: Wikilingua to wielkoskalowy, wielojęzyczny zbiór danych do oceny wielojęzycznych abstrakcyjnych systemów podsumowujących.
Rozmiar pliku do pobrania : 28.60 MiB
Rozmiar zestawu danych : 193.77 MiB
Automatyczne buforowanie ( dokumentacja ): Tak (test, walidacja), Tylko wtedy, gdy shuffle_files=False (pociąg)
Podziały :

Rozdzielać	Przykłady
`'test'`	2950
`'train'`	10325
`'validation'`	1475

Struktura funkcji :

FeaturesDict({
    'gem_id': string,
    'gem_parent_id': string,
    'references': Sequence(string),
    'source': string,
    'source_aligned': Translation({
        'en': Text(shape=(), dtype=string),
        'th': Text(shape=(), dtype=string),
    }),
    'target': string,
    'target_aligned': Translation({
        'en': Text(shape=(), dtype=string),
        'th': Text(shape=(), dtype=string),
    }),
})

Dokumentacja funkcji :

Funkcja	Klasa	Kształt	Typ D
	FunkcjeDict
klejnot_id	Napinacz		strunowy
identyfikator_nadrzędnego klejnotu	Napinacz		strunowy
Bibliografia	Sekwencja (Tensor)	(Nic,)	strunowy
źródło	Napinacz		strunowy
źródło_wyrównane	Tłumaczenie
source_aligned/en	Tekst		strunowy
źródło_wyrównane/th	Tekst		strunowy
cel	Napinacz		strunowy
cel_wyrównany	Tłumaczenie
target_aligned/en	Tekst		strunowy
target_aligned/th	Tekst		strunowy

Przykłady ( tfds.as_dataframe ):

Cytat :

@inproceedings{ladhak-wiki-2020,
title=WikiLingua: A New Benchmark Dataset for Multilingual Abstractive Summarization,
author={Faisal Ladhak, Esin Durmus, Claire Cardie and Kathleen McKeown},
booktitle={Findings of EMNLP, 2020},
year={2020}
}
@article{gehrmann2021gem,
  author    = {Sebastian Gehrmann and
               Tosin P. Adewumi and
               Karmanya Aggarwal and
               Pawan Sasanka Ammanamanchi and
               Aremu Anuoluwapo and
               Antoine Bosselut and
               Khyathi Raghavi Chandu and
               Miruna{-}Adriana Clinciu and
               Dipanjan Das and
               Kaustubh D. Dhole and
               Wanyu Du and
               Esin Durmus and
               Ondrej Dusek and
               Chris Emezue and
               Varun Gangal and
               Cristina Garbacea and
               Tatsunori Hashimoto and
               Yufang Hou and
               Yacine Jernite and
               Harsh Jhamtani and
               Yangfeng Ji and
               Shailza Jolly and
               Dhruv Kumar and
               Faisal Ladhak and
               Aman Madaan and
               Mounica Maddela and
               Khyati Mahajan and
               Saad Mahamood and
               Bodhisattwa Prasad Majumder and
               Pedro Henrique Martins and
               Angelina McMillan{-}Major and
               Simon Mille and
               Emiel van Miltenburg and
               Moin Nadeem and
               Shashi Narayan and
               Vitaly Nikolaev and
               Rubungo Andre Niyongabo and
               Salomey Osei and
               Ankur P. Parikh and
               Laura Perez{-}Beltrachini and
               Niranjan Ramesh Rao and
               Vikas Raunak and
               Juan Diego Rodriguez and
               Sashank Santhanam and
               Jo{\~{a} }o Sedoc and
               Thibault Sellam and
               Samira Shaikh and
               Anastasia Shimorina and
               Marco Antonio Sobrevilla Cabezudo and
               Hendrik Strobelt and
               Nishant Subramani and
               Wei Xu and
               Diyi Yang and
               Akhila Yerukola and
               Jiawei Zhou},
  title     = {The {GEM} Benchmark: Natural Language Generation, its Evaluation and
               Metrics},
  journal   = {CoRR},
  volume    = {abs/2102.01672},
  year      = {2021},
  url       = {https://arxiv.org/abs/2102.01672},
  archivePrefix = {arXiv},
  eprint    = {2102.01672}
}

Note that each GEM dataset has its own citation. Please see the source to see
the correct citation for each contained dataset."

gem/wiki_lingua_turkish_tr

Opis konfiguracji: Wikilingua to wielkoskalowy, wielojęzyczny zbiór danych do oceny wielojęzycznych abstrakcyjnych systemów podsumowujących.
Rozmiar pliku do pobrania : 6.73 MiB
Rozmiar zestawu danych : 30.75 MiB
Automatyczne buforowanie ( dokumentacja ): Tak
Podziały :

Rozdzielać	Przykłady
`'test'`	900
`'train'`	3148
`'validation'`	449

Struktura funkcji :

FeaturesDict({
    'gem_id': string,
    'gem_parent_id': string,
    'references': Sequence(string),
    'source': string,
    'source_aligned': Translation({
        'en': Text(shape=(), dtype=string),
        'tr': Text(shape=(), dtype=string),
    }),
    'target': string,
    'target_aligned': Translation({
        'en': Text(shape=(), dtype=string),
        'tr': Text(shape=(), dtype=string),
    }),
})

Dokumentacja funkcji :

Funkcja	Klasa	Kształt	Typ D
	FunkcjeDict
klejnot_id	Napinacz		strunowy
identyfikator_nadrzędnego klejnotu	Napinacz		strunowy
Bibliografia	Sekwencja (Tensor)	(Nic,)	strunowy
źródło	Napinacz		strunowy
źródło_wyrównane	Tłumaczenie
source_aligned/en	Tekst		strunowy
źródło_wyrównane/tr	Tekst		strunowy
cel	Napinacz		strunowy
cel_wyrównany	Tłumaczenie
target_aligned/en	Tekst		strunowy
wyrównany_docelowo/tr	Tekst		strunowy

Przykłady ( tfds.as_dataframe ):

Cytat :

@inproceedings{ladhak-wiki-2020,
title=WikiLingua: A New Benchmark Dataset for Multilingual Abstractive Summarization,
author={Faisal Ladhak, Esin Durmus, Claire Cardie and Kathleen McKeown},
booktitle={Findings of EMNLP, 2020},
year={2020}
}
@article{gehrmann2021gem,
  author    = {Sebastian Gehrmann and
               Tosin P. Adewumi and
               Karmanya Aggarwal and
               Pawan Sasanka Ammanamanchi and
               Aremu Anuoluwapo and
               Antoine Bosselut and
               Khyathi Raghavi Chandu and
               Miruna{-}Adriana Clinciu and
               Dipanjan Das and
               Kaustubh D. Dhole and
               Wanyu Du and
               Esin Durmus and
               Ondrej Dusek and
               Chris Emezue and
               Varun Gangal and
               Cristina Garbacea and
               Tatsunori Hashimoto and
               Yufang Hou and
               Yacine Jernite and
               Harsh Jhamtani and
               Yangfeng Ji and
               Shailza Jolly and
               Dhruv Kumar and
               Faisal Ladhak and
               Aman Madaan and
               Mounica Maddela and
               Khyati Mahajan and
               Saad Mahamood and
               Bodhisattwa Prasad Majumder and
               Pedro Henrique Martins and
               Angelina McMillan{-}Major and
               Simon Mille and
               Emiel van Miltenburg and
               Moin Nadeem and
               Shashi Narayan and
               Vitaly Nikolaev and
               Rubungo Andre Niyongabo and
               Salomey Osei and
               Ankur P. Parikh and
               Laura Perez{-}Beltrachini and
               Niranjan Ramesh Rao and
               Vikas Raunak and
               Juan Diego Rodriguez and
               Sashank Santhanam and
               Jo{\~{a} }o Sedoc and
               Thibault Sellam and
               Samira Shaikh and
               Anastasia Shimorina and
               Marco Antonio Sobrevilla Cabezudo and
               Hendrik Strobelt and
               Nishant Subramani and
               Wei Xu and
               Diyi Yang and
               Akhila Yerukola and
               Jiawei Zhou},
  title     = {The {GEM} Benchmark: Natural Language Generation, its Evaluation and
               Metrics},
  journal   = {CoRR},
  volume    = {abs/2102.01672},
  year      = {2021},
  url       = {https://arxiv.org/abs/2102.01672},
  archivePrefix = {arXiv},
  eprint    = {2102.01672}
}

Note that each GEM dataset has its own citation. Please see the source to see
the correct citation for each contained dataset."

gem/wiki_lingua_vietnamese_vi

Opis konfiguracji: Wikilingua to wielkoskalowy, wielojęzyczny zbiór danych do oceny wielojęzycznych abstrakcyjnych systemów podsumowujących.
Rozmiar pliku do pobrania : 36.27 MiB
Rozmiar zestawu danych : 179.77 MiB
Automatyczne buforowanie ( dokumentacja ): Tak
Podziały :

Rozdzielać	Przykłady
`'test'`	3917
`'train'`	13707
`'validation'`	1957

Struktura funkcji :

FeaturesDict({
    'gem_id': string,
    'gem_parent_id': string,
    'references': Sequence(string),
    'source': string,
    'source_aligned': Translation({
        'en': Text(shape=(), dtype=string),
        'vi': Text(shape=(), dtype=string),
    }),
    'target': string,
    'target_aligned': Translation({
        'en': Text(shape=(), dtype=string),
        'vi': Text(shape=(), dtype=string),
    }),
})

Dokumentacja funkcji :

Funkcja	Klasa	Kształt	Typ D
	FunkcjeDict
klejnot_id	Napinacz		strunowy
identyfikator_nadrzędnego klejnotu	Napinacz		strunowy
Bibliografia	Sekwencja (Tensor)	(Nic,)	strunowy
źródło	Napinacz		strunowy
źródło_wyrównane	Tłumaczenie
source_aligned/en	Tekst		strunowy
źródło_wyrównane/vi	Tekst		strunowy
cel	Napinacz		strunowy
cel_wyrównany	Tłumaczenie
target_aligned/en	Tekst		strunowy
wyrównany_docelowo/vi	Tekst		strunowy

Przykłady ( tfds.as_dataframe ):

Cytat :

@inproceedings{ladhak-wiki-2020,
title=WikiLingua: A New Benchmark Dataset for Multilingual Abstractive Summarization,
author={Faisal Ladhak, Esin Durmus, Claire Cardie and Kathleen McKeown},
booktitle={Findings of EMNLP, 2020},
year={2020}
}
@article{gehrmann2021gem,
  author    = {Sebastian Gehrmann and
               Tosin P. Adewumi and
               Karmanya Aggarwal and
               Pawan Sasanka Ammanamanchi and
               Aremu Anuoluwapo and
               Antoine Bosselut and
               Khyathi Raghavi Chandu and
               Miruna{-}Adriana Clinciu and
               Dipanjan Das and
               Kaustubh D. Dhole and
               Wanyu Du and
               Esin Durmus and
               Ondrej Dusek and
               Chris Emezue and
               Varun Gangal and
               Cristina Garbacea and
               Tatsunori Hashimoto and
               Yufang Hou and
               Yacine Jernite and
               Harsh Jhamtani and
               Yangfeng Ji and
               Shailza Jolly and
               Dhruv Kumar and
               Faisal Ladhak and
               Aman Madaan and
               Mounica Maddela and
               Khyati Mahajan and
               Saad Mahamood and
               Bodhisattwa Prasad Majumder and
               Pedro Henrique Martins and
               Angelina McMillan{-}Major and
               Simon Mille and
               Emiel van Miltenburg and
               Moin Nadeem and
               Shashi Narayan and
               Vitaly Nikolaev and
               Rubungo Andre Niyongabo and
               Salomey Osei and
               Ankur P. Parikh and
               Laura Perez{-}Beltrachini and
               Niranjan Ramesh Rao and
               Vikas Raunak and
               Juan Diego Rodriguez and
               Sashank Santhanam and
               Jo{\~{a} }o Sedoc and
               Thibault Sellam and
               Samira Shaikh and
               Anastasia Shimorina and
               Marco Antonio Sobrevilla Cabezudo and
               Hendrik Strobelt and
               Nishant Subramani and
               Wei Xu and
               Diyi Yang and
               Akhila Yerukola and
               Jiawei Zhou},
  title     = {The {GEM} Benchmark: Natural Language Generation, its Evaluation and
               Metrics},
  journal   = {CoRR},
  volume    = {abs/2102.01672},
  year      = {2021},
  url       = {https://arxiv.org/abs/2102.01672},
  archivePrefix = {arXiv},
  eprint    = {2102.01672}
}

Note that each GEM dataset has its own citation. Please see the source to see
the correct citation for each contained dataset."