ted_talks_iwslt

eu_ca_2014

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:ted_talks_iwslt/eu_ca_2014')

Description:

The core of WIT3 is the TED Talks corpus, which redistributes the original content published by the TED Conference website (http://www.ted.com). Since 2007, the TED Conference, based in California, has been posting all video recordings of its talks together with subtitles in English and their translations in more than 80 languages. Aside from its cultural and social relevance, this content, which is published under the Creative Commons BY-NC-ND license, is also a valuable language resource for the machine translation research community, thanks to its size, variety of topics, and range of covered languages. This effort repurposes the original content in a form that is more convenient for machine translation researchers.

License: CC-BY-NC-4.0
Version: 1.1.0
Splits:

Split	Examples
`'train'`	44

Features:

{
    "translation": {
        "languages": [
            "eu",
            "ca"
        ],
        "id": null,
        "_type": "Translation"
    }
}
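
The feature spec above describes a single `translation` field holding one text per language code. The sketch below shows one way to pull a (source, target) pair out of such a record. It assumes that a record is a nested mapping keyed by the language codes listed in `languages`, and that string values may arrive as `bytes` (as is typical when iterating with `tfds.as_numpy`). The helper `to_pair` and the sample record are hypothetical and are not taken from the dataset.

```python
from typing import Mapping, Tuple, Union

Text = Union[str, bytes]


def to_pair(
    example: Mapping[str, Mapping[str, Text]],
    src: str = "eu",
    tgt: str = "ca",
) -> Tuple[str, str]:
    """Returns (source, target) strings from one translation record.

    Assumes the record looks like {"translation": {"eu": ..., "ca": ...}}.
    """
    translation = example["translation"]

    def as_str(value: Text) -> str:
        # Decode bytes to str; pass str values through unchanged.
        return value.decode("utf-8") if isinstance(value, bytes) else value

    return as_str(translation[src]), as_str(translation[tgt])


# Hypothetical record, made up for illustration only.
sample = {"translation": {"eu": b"Kaixo", "ca": "Hola"}}
pair = to_pair(sample)
print(pair)
```

For the other configs, pass the matching language codes, for example `to_pair(record, src="nl", tgt="en")`.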

eu_ca_2015

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:ted_talks_iwslt/eu_ca_2015')

Description:

The core of WIT3 is the TED Talks corpus, which redistributes the original content published by the TED Conference website (http://www.ted.com). Since 2007, the TED Conference, based in California, has been posting all video recordings of its talks together with subtitles in English and their translations in more than 80 languages. Aside from its cultural and social relevance, this content, which is published under the Creative Commons BY-NC-ND license, is also a valuable language resource for the machine translation research community, thanks to its size, variety of topics, and range of covered languages. This effort repurposes the original content in a form that is more convenient for machine translation researchers.

License: CC-BY-NC-4.0
Version: 1.1.0
Splits:

Split	Examples
`'train'`	52

Features:

{
    "translation": {
        "languages": [
            "eu",
            "ca"
        ],
        "id": null,
        "_type": "Translation"
    }
}

eu_ca_2016

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:ted_talks_iwslt/eu_ca_2016')

Description:

The core of WIT3 is the TED Talks corpus, which redistributes the original content published by the TED Conference website (http://www.ted.com). Since 2007, the TED Conference, based in California, has been posting all video recordings of its talks together with subtitles in English and their translations in more than 80 languages. Aside from its cultural and social relevance, this content, which is published under the Creative Commons BY-NC-ND license, is also a valuable language resource for the machine translation research community, thanks to its size, variety of topics, and range of covered languages. This effort repurposes the original content in a form that is more convenient for machine translation researchers.

License: CC-BY-NC-4.0
Version: 1.1.0
Splits:

Split	Examples
`'train'`	54

Features:

{
    "translation": {
        "languages": [
            "eu",
            "ca"
        ],
        "id": null,
        "_type": "Translation"
    }
}

nl_en_2014

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:ted_talks_iwslt/nl_en_2014')

Description:

The core of WIT3 is the TED Talks corpus, which redistributes the original content published by the TED Conference website (http://www.ted.com). Since 2007, the TED Conference, based in California, has been posting all video recordings of its talks together with subtitles in English and their translations in more than 80 languages. Aside from its cultural and social relevance, this content, which is published under the Creative Commons BY-NC-ND license, is also a valuable language resource for the machine translation research community, thanks to its size, variety of topics, and range of covered languages. This effort repurposes the original content in a form that is more convenient for machine translation researchers.

License: CC-BY-NC-4.0
Version: 1.1.0
Splits:

Split	Examples
`'train'`	2966

Features:

{
    "translation": {
        "languages": [
            "nl",
            "en"
        ],
        "id": null,
        "_type": "Translation"
    }
}

nl_en_2015

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:ted_talks_iwslt/nl_en_2015')

Description:

The core of WIT3 is the TED Talks corpus, which redistributes the original content published by the TED Conference website (http://www.ted.com). Since 2007, the TED Conference, based in California, has been posting all video recordings of its talks together with subtitles in English and their translations in more than 80 languages. Aside from its cultural and social relevance, this content, which is published under the Creative Commons BY-NC-ND license, is also a valuable language resource for the machine translation research community, thanks to its size, variety of topics, and range of covered languages. This effort repurposes the original content in a form that is more convenient for machine translation researchers.

License: CC-BY-NC-4.0
Version: 1.1.0
Splits:

Split	Examples
`'train'`	3550

Features:

{
    "translation": {
        "languages": [
            "nl",
            "en"
        ],
        "id": null,
        "_type": "Translation"
    }
}

nl_en_2016

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:ted_talks_iwslt/nl_en_2016')

Description:

The core of WIT3 is the TED Talks corpus, which redistributes the original content published by the TED Conference website (http://www.ted.com). Since 2007, the TED Conference, based in California, has been posting all video recordings of its talks together with subtitles in English and their translations in more than 80 languages. Aside from its cultural and social relevance, this content, which is published under the Creative Commons BY-NC-ND license, is also a valuable language resource for the machine translation research community, thanks to its size, variety of topics, and range of covered languages. This effort repurposes the original content in a form that is more convenient for machine translation researchers.

License: CC-BY-NC-4.0
Version: 1.1.0
Splits:

Split	Examples
`'train'`	3852

Features:

{
    "translation": {
        "languages": [
            "nl",
            "en"
        ],
        "id": null,
        "_type": "Translation"
    }
}

nl_hi_2014

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:ted_talks_iwslt/nl_hi_2014')

Description:

The core of WIT3 is the TED Talks corpus, which redistributes the original content published by the TED Conference website (http://www.ted.com). Since 2007, the TED Conference, based in California, has been posting all video recordings of its talks together with subtitles in English and their translations in more than 80 languages. Aside from its cultural and social relevance, this content, which is published under the Creative Commons BY-NC-ND license, is also a valuable language resource for the machine translation research community, thanks to its size, variety of topics, and range of covered languages. This effort repurposes the original content in a form that is more convenient for machine translation researchers.

License: CC-BY-NC-4.0
Version: 1.1.0
Splits:

Split	Examples
`'train'`	367

Features:

{
    "translation": {
        "languages": [
            "nl",
            "hi"
        ],
        "id": null,
        "_type": "Translation"
    }
}

nl_hi_2015

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:ted_talks_iwslt/nl_hi_2015')

Description:

The core of WIT3 is the TED Talks corpus, which redistributes the original content published by the TED Conference website (http://www.ted.com). Since 2007, the TED Conference, based in California, has been posting all video recordings of its talks together with subtitles in English and their translations in more than 80 languages. Aside from its cultural and social relevance, this content, which is published under the Creative Commons BY-NC-ND license, is also a valuable language resource for the machine translation research community, thanks to its size, variety of topics, and range of covered languages. This effort repurposes the original content in a form that is more convenient for machine translation researchers.

License: CC-BY-NC-4.0
Version: 1.1.0
Splits:

Split	Examples
`'train'`	421

Features:

{
    "translation": {
        "languages": [
            "nl",
            "hi"
        ],
        "id": null,
        "_type": "Translation"
    }
}

nl_hi_2016

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:ted_talks_iwslt/nl_hi_2016')

Description:

The core of WIT3 is the TED Talks corpus, which redistributes the original content published by the TED Conference website (http://www.ted.com). Since 2007, the TED Conference, based in California, has been posting all video recordings of its talks together with subtitles in English and their translations in more than 80 languages. Aside from its cultural and social relevance, this content, which is published under the Creative Commons BY-NC-ND license, is also a valuable language resource for the machine translation research community, thanks to its size, variety of topics, and range of covered languages. This effort repurposes the original content in a form that is more convenient for machine translation researchers.

License: CC-BY-NC-4.0
Version: 1.1.0
Splits:

Split	Examples
`'train'`	496

Features:

{
    "translation": {
        "languages": [
            "nl",
            "hi"
        ],
        "id": null,
        "_type": "Translation"
    }
}

de_ja_2014

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:ted_talks_iwslt/de_ja_2014')

Description:

The core of WIT3 is the TED Talks corpus, which redistributes the original content published by the TED Conference website (http://www.ted.com). Since 2007, the TED Conference, based in California, has been posting all video recordings of its talks together with subtitles in English and their translations in more than 80 languages. Aside from its cultural and social relevance, this content, which is published under the Creative Commons BY-NC-ND license, is also a valuable language resource for the machine translation research community, thanks to its size, variety of topics, and range of covered languages. This effort repurposes the original content in a form that is more convenient for machine translation researchers.

License: CC-BY-NC-4.0
Version: 1.1.0
Splits:

Split	Examples
`'train'`	2536

Features:

{
    "translation": {
        "languages": [
            "de",
            "ja"
        ],
        "id": null,
        "_type": "Translation"
    }
}

de_ja_2015

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:ted_talks_iwslt/de_ja_2015')

Description:

The core of WIT3 is the TED Talks corpus, which redistributes the original content published by the TED Conference website (http://www.ted.com). Since 2007, the TED Conference, based in California, has been posting all video recordings of its talks together with subtitles in English and their translations in more than 80 languages. Aside from its cultural and social relevance, this content, which is published under the Creative Commons BY-NC-ND license, is also a valuable language resource for the machine translation research community, thanks to its size, variety of topics, and range of covered languages. This effort repurposes the original content in a form that is more convenient for machine translation researchers.

License: CC-BY-NC-4.0
Version: 1.1.0
Splits:

Split	Examples
`'train'`	3247

Features:

{
    "translation": {
        "languages": [
            "de",
            "ja"
        ],
        "id": null,
        "_type": "Translation"
    }
}

de_ja_2016

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:ted_talks_iwslt/de_ja_2016')

Description:

The core of WIT3 is the TED Talks corpus, which redistributes the original content published by the TED Conference website (http://www.ted.com). Since 2007, the TED Conference, based in California, has been posting all video recordings of its talks together with subtitles in English and their translations in more than 80 languages. Aside from its cultural and social relevance, this content, which is published under the Creative Commons BY-NC-ND license, is also a valuable language resource for the machine translation research community, thanks to its size, variety of topics, and range of covered languages. This effort repurposes the original content in a form that is more convenient for machine translation researchers.

License: CC-BY-NC-4.0
Version: 1.1.0
Splits:

Split	Examples
`'train'`	3590

Features:

{
    "translation": {
        "languages": [
            "de",
            "ja"
        ],
        "id": null,
        "_type": "Translation"
    }
}

fr-ca_hi_2014

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:ted_talks_iwslt/fr-ca_hi_2014')

Description:

The core of WIT3 is the TED Talks corpus, which redistributes the original content published by the TED Conference website (http://www.ted.com). Since 2007, the TED Conference, based in California, has been posting all video recordings of its talks together with subtitles in English and their translations in more than 80 languages. Aside from its cultural and social relevance, this content, which is published under the Creative Commons BY-NC-ND license, is also a valuable language resource for the machine translation research community, thanks to its size, variety of topics, and range of covered languages. This effort repurposes the original content in a form that is more convenient for machine translation researchers.

License: CC-BY-NC-4.0
Version: 1.1.0
Splits:

Split	Examples
`'train'`	127

Features:

{
    "translation": {
        "languages": [
            "fr-ca",
            "hi"
        ],
        "id": null,
        "_type": "Translation"
    }
}

fr-ca_hi_2015

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:ted_talks_iwslt/fr-ca_hi_2015')

Description:

The core of WIT3 is the TED Talks corpus, which redistributes the original content published by the TED Conference website (http://www.ted.com). Since 2007, the TED Conference, based in California, has been posting all video recordings of its talks together with subtitles in English and their translations in more than 80 languages. Aside from its cultural and social relevance, this content, which is published under the Creative Commons BY-NC-ND license, is also a valuable language resource for the machine translation research community, thanks to its size, variety of topics, and range of covered languages. This effort repurposes the original content in a form that is more convenient for machine translation researchers.

License: CC-BY-NC-4.0
Version: 1.1.0
Splits:

Split	Examples
`'train'`	141

Features:

{
    "translation": {
        "languages": [
            "fr-ca",
            "hi"
        ],
        "id": null,
        "_type": "Translation"
    }
}

fr-ca_hi_2016

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:ted_talks_iwslt/fr-ca_hi_2016')

Description:

The core of WIT3 is the TED Talks corpus, which redistributes the original content published by the TED Conference website (http://www.ted.com). Since 2007, the TED Conference, based in California, has been posting all video recordings of its talks together with subtitles in English and their translations in more than 80 languages. Aside from its cultural and social relevance, this content, which is published under the Creative Commons BY-NC-ND license, is also a valuable language resource for the machine translation research community, thanks to its size, variety of topics, and range of covered languages. This effort repurposes the original content in a form that is more convenient for machine translation researchers.

License: CC-BY-NC-4.0
Version: 1.1.0
Splits:

Split	Examples
`'train'`	156

Features:

{
    "translation": {
        "languages": [
            "fr-ca",
            "hi"
        ],
        "id": null,
        "_type": "Translation"
    }
}
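
The config names listed above all follow the pattern `<source>_<target>_<year>`. As a minimal sketch, the snippet below rebuilds the `tfds.load` identifiers for the five language pairs and three years shown in this section. The helper names (`config_name`, `tfds_name`, `load_config`) are hypothetical, and `load_config` assumes `tensorflow_datasets` is installed and that the dataset can be downloaded; it is imported lazily so the name-building part runs without it.

```python
from itertools import product

DATASET = "huggingface:ted_talks_iwslt"

# Language pairs and years exactly as they appear in the config names above.
PAIRS = [("eu", "ca"), ("nl", "en"), ("nl", "hi"), ("de", "ja"), ("fr-ca", "hi")]
YEARS = [2014, 2015, 2016]


def config_name(src: str, tgt: str, year: int) -> str:
    """Builds a config name such as 'eu_ca_2014'."""
    return f"{src}_{tgt}_{year}"


def tfds_name(src: str, tgt: str, year: int) -> str:
    """Builds the full name passed to tfds.load."""
    return f"{DATASET}/{config_name(src, tgt, year)}"


# All 15 identifiers, ordered by pair first and year second.
ALL_NAMES = [tfds_name(src, tgt, year) for (src, tgt), year in product(PAIRS, YEARS)]


def load_config(src: str, tgt: str, year: int, split: str = "train"):
    """Loads one config; needs tensorflow_datasets and network access."""
    import tensorflow_datasets as tfds  # lazy import, only needed here

    return tfds.load(tfds_name(src, tgt, year), split=split)


if __name__ == "__main__":
    for name in ALL_NAMES:
        print(name)
```

Each config on this page lists only a `'train'` split, which is why `split` defaults to `"train"` in the sketch.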