xtreme_xnli

  • Description:

This dataset contains machine translations of MNLI into each of the XNLI languages. The translations are provided by XTREME. Note that they differ from the machine-translated data released with the original XNLI paper.

  • Splits:
Split      Examples
'train'    392,570
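Only a `'train'` split is provided. If you need a held-out set, one option is to assign rows deterministically by hashing the English premise. This is an assumption of this sketch, not something the dataset or TFDS prescribes. MNLI premises are generally reused across several hypotheses, so hashing the premise keeps every row that shares it on the same side of the split. The same holds for every translation of those rows after you flatten them to one language per row. The function name `is_heldout` and the 5% default are hypothetical.

```python
import hashlib


def is_heldout(english_premise: str, heldout_percent: int = 5) -> bool:
    """Hypothetical helper: deterministically route a row to a held-out bucket.

    The English premise is hashed with SHA-256 and mapped to one of 100
    buckets. Rows whose bucket is below `heldout_percent` are held out, so
    identical premises always land in the same bucket.
    """
    if not 0 <= heldout_percent <= 100:
        raise ValueError("heldout_percent must be between 0 and 100")
    digest = hashlib.sha256(english_premise.encode("utf-8")).digest()
    # Use the first 8 bytes of the digest as an unsigned integer, then bucket it.
    bucket = int.from_bytes(digest[:8], "big") % 100
    return bucket < heldout_percent
```

For an example read through `tfds.as_numpy`, the key would be `example["premise"]["en"].decode("utf-8")`. Because the bucket depends only on the text, the split is reproducible across runs and machines.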
  • Feature structure:
FeaturesDict({
    'hypothesis': TranslationVariableLanguages({
        'language': Text(shape=(), dtype=tf.string),
        'translation': Text(shape=(), dtype=tf.string),
    }),
    'label': ClassLabel(shape=(), dtype=tf.int64, num_classes=3),
    'premise': Translation({
        'ar': Text(shape=(), dtype=tf.string),
        'bg': Text(shape=(), dtype=tf.string),
        'de': Text(shape=(), dtype=tf.string),
        'el': Text(shape=(), dtype=tf.string),
        'en': Text(shape=(), dtype=tf.string),
        'es': Text(shape=(), dtype=tf.string),
        'fr': Text(shape=(), dtype=tf.string),
        'hi': Text(shape=(), dtype=tf.string),
        'ru': Text(shape=(), dtype=tf.string),
        'sw': Text(shape=(), dtype=tf.string),
        'th': Text(shape=(), dtype=tf.string),
        'tr': Text(shape=(), dtype=tf.string),
        'ur': Text(shape=(), dtype=tf.string),
        'vi': Text(shape=(), dtype=tf.string),
        'zh': Text(shape=(), dtype=tf.string),
    }),
})
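The two text features are encoded differently. `premise` is a `Translation`: a dict keyed by language code. `hypothesis` is a `TranslationVariableLanguages`: two parallel sequences, `language` and `translation`. The sketch below expands one example into one `(language, premise, hypothesis, label)` row per language. It assumes that examples come from `tfds.as_numpy(tfds.load("xtreme_xnli", split="train"))`, where `tf.string` values arrive as `bytes`. It also assumes that plain `str` values should be accepted as well. The helper names are hypothetical. As a precaution, it skips any hypothesis language that has no matching premise.

```python
from typing import Any, Dict, List, Tuple, Union

TextValue = Union[bytes, str]


def _to_str(value: TextValue) -> str:
    # tfds.as_numpy yields tf.string features as bytes; decode them to str.
    return value.decode("utf-8") if isinstance(value, bytes) else value


def flatten_example(example: Dict[str, Any]) -> List[Tuple[str, str, str, int]]:
    """Hypothetical helper: expand one example into per-language NLI rows.

    'premise' maps language code -> text (Translation feature).
    'hypothesis' holds parallel 'language' and 'translation' sequences
    (TranslationVariableLanguages feature).
    """
    premises = {_to_str(lang): _to_str(text) for lang, text in example["premise"].items()}
    hypothesis = example["hypothesis"]
    label = int(example["label"])
    rows = []
    for lang, hyp_text in zip(hypothesis["language"], hypothesis["translation"]):
        lang = _to_str(lang)
        # Keep only languages for which a premise translation exists.
        if lang in premises:
            rows.append((lang, premises[lang], _to_str(hyp_text), label))
    return rows
```

To train or evaluate on a single language, filter the rows on their first element. To turn the integer label back into a class name, call `ClassLabel.int2str` on the dataset info, e.g. `ds_info.features["label"].int2str(label)`. This avoids hard-coding the label order.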
  • Feature documentation:
Feature                 Class                         Shape  Dtype      Description
                        FeaturesDict
hypothesis              TranslationVariableLanguages
hypothesis/language     Text                                 tf.string
hypothesis/translation  Text                                 tf.string
label                   ClassLabel                           tf.int64
premise                 Translation
premise/ar              Text                                 tf.string
premise/bg              Text                                 tf.string
premise/de              Text                                 tf.string
premise/el              Text                                 tf.string
premise/en              Text                                 tf.string
premise/es              Text                                 tf.string
premise/fr              Text                                 tf.string
premise/hi              Text                                 tf.string
premise/ru              Text                                 tf.string
premise/sw              Text                                 tf.string
premise/th              Text                                 tf.string
premise/tr              Text                                 tf.string
premise/ur              Text                                 tf.string
premise/vi              Text                                 tf.string
premise/zh              Text                                 tf.string
  • Citation:
@article{hu2020xtreme,
      author    = {Junjie Hu and Sebastian Ruder and Aditya Siddhant and Graham Neubig and Orhan Firat and Melvin Johnson},
      title     = {XTREME: A Massively Multilingual Multi-task Benchmark for Evaluating Cross-lingual Generalization},
      journal   = {CoRR},
      volume    = {abs/2003.11080},
      year      = {2020},
      archivePrefix = {arXiv},
      eprint    = {2003.11080}
}