• Description:

Translate dataset based on the data from statmt.org.

Versions exist for different years, each using a combination of multiple data sources. The base wmt_translate builder lets you choose your own data and language pair by creating a custom tfds.translate.wmt.WmtConfig:

import tensorflow_datasets as tfds
config = tfds.translate.wmt.WmtConfig(
    language_pair=("fr", "de"),
    subsets={
        tfds.Split.TRAIN: ["commoncrawl_frde"],
        tfds.Split.VALIDATION: ["euelections_dev2019"],
    },
)
builder = tfds.builder("wmt_translate", config=config)
Split         Examples
'test'        3,003
'train'       4,592,289
'validation'  3,000
• Feature structure:
    'de': Text(shape=(), dtype=string),
    'en': Text(shape=(), dtype=string),
• Feature documentation:
Feature  Class  Shape  Dtype   Description
de       Text          string
en       Text          string
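
The feature structure above means each example is a mapping with one string under 'de' and one under 'en'. The following is a minimal sketch of turning such records into (source, target) text pairs. It assumes the records arrive as plain dicts whose values are UTF-8 byte strings or already-decoded strings; the helper name to_pairs and the sample sentences are hypothetical and exist only for illustration.

```python
from typing import Dict, Iterable, Iterator, Tuple, Union

TextValue = Union[bytes, str]


def _as_text(value: TextValue) -> str:
    """Decode UTF-8 bytes to str; pass str values through unchanged."""
    return value.decode("utf-8") if isinstance(value, bytes) else value


def to_pairs(
    records: Iterable[Dict[str, TextValue]],
    source: str = "de",
    target: str = "en",
) -> Iterator[Tuple[str, str]]:
    """Yield (source_text, target_text) tuples from 'de'/'en' records.

    to_pairs is a hypothetical helper, not part of any library.
    """
    for record in records:
        yield _as_text(record[source]), _as_text(record[target])


# Hypothetical sample records shaped like the feature structure above.
sample = [
    {"de": b"Guten Morgen.", "en": b"Good morning."},
    {"de": "Danke sch\u00f6n.".encode("utf-8"), "en": b"Thank you."},
]

pairs = list(to_pairs(sample))
for src, tgt in pairs:
    print(f"{src} -> {tgt}")
```

Records with this shape would typically come from something like tfds.as_numpy(tfds.load("wmt_t2t_translate/de-en", split="train")), which downloads and prepares the data on first use. Swapping the source and target arguments gives the reverse translation direction.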
wmt_t2t_translate/de-en (default config)