
un_pc

ar-en

Use the following command to load this dataset in TFDS:

import tensorflow_datasets as tfds

ds = tfds.load('huggingface:un_pc/ar-en')

Description:

This parallel corpus consists of manually translated UN documents spanning 25 years (1990 to 2014) in the six official UN languages: Arabic, Chinese, English, French, Russian, and Spanish.

License: No known license
Version: 1.0.0
Splits:

Split	Examples
`'train'`	20044478
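
Each configuration ships a single `'train'` split, so any held-out set has to be carved out by the user. The following is a minimal sketch, assuming the standard TFDS percent-slicing syntax (`train[:99%]`) applies to this community dataset. The helper names `holdout_splits` and `load_with_holdout` are hypothetical and not part of TFDS.

```python
def holdout_splits(holdout_percent=1):
    """Return two split strings that carve a held-out slice off 'train'."""
    if not 0 < holdout_percent < 100:
        raise ValueError("holdout_percent must be between 1 and 99")
    cut = 100 - holdout_percent
    return [f"train[:{cut}%]", f"train[{cut}%:]"]


def load_with_holdout(config="ar-en", holdout_percent=1):
    """Load one un_pc configuration as a [train, holdout] list of datasets.

    Assumption: the dataset accepts the usual TFDS slicing syntax.
    """
    # Imported lazily: calling this function triggers a large download.
    import tensorflow_datasets as tfds

    return tfds.load(
        f"huggingface:un_pc/{config}",
        split=holdout_splits(holdout_percent),
    )
```

With the default of 1%, `holdout_splits()` yields `['train[:99%]', 'train[99%:]']`, which holds out roughly 200,000 of the 20,044,478 `ar-en` examples.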

Features:

{
    "translation": {
        "languages": [
            "ar",
            "en"
        ],
        "id": null,
        "_type": "Translation"
    }
}
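
The feature spec above describes one `translation` entry per example, keyed by language code. As a rough sketch of how such an example might be read, assuming that iterating with `tfds.as_numpy` yields the two sentences as UTF-8 `bytes` (the usual behaviour for TFDS string features, but not confirmed here for this dataset), the hypothetical helper `decode_pair` below turns one example into a pair of Python strings.

```python
def decode_pair(example, src="ar", tgt="en"):
    """Return (source_text, target_text) from one un_pc example dict."""
    translation = example["translation"]

    def as_text(value):
        # Assumption: strings arrive as UTF-8 bytes; plain str is passed through.
        return value.decode("utf-8") if isinstance(value, bytes) else str(value)

    return as_text(translation[src]), as_text(translation[tgt])


def preview(n=3):
    """Print the first n sentence pairs of the ar-en configuration."""
    # Imported lazily: calling this function triggers a large download.
    import tensorflow_datasets as tfds

    ds = tfds.load("huggingface:un_pc/ar-en", split="train")
    for example in tfds.as_numpy(ds.take(n)):
        print(decode_pair(example))
```

For the other configurations, pass the matching language codes, for example `decode_pair(example, src="en", tgt="fr")` for `en-fr`.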

ar-es

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:un_pc/ar-es')

Description:

This parallel corpus consists of manually translated UN documents spanning 25 years (1990 to 2014) in the six official UN languages: Arabic, Chinese, English, French, Russian, and Spanish.

License: No known license
Version: 1.0.0
Splits:

Split	Examples
`'train'`	20532014

Features:

{
    "translation": {
        "languages": [
            "ar",
            "es"
        ],
        "id": null,
        "_type": "Translation"
    }
}

ar-fr

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:un_pc/ar-fr')

Description:

This parallel corpus consists of manually translated UN documents spanning 25 years (1990 to 2014) in the six official UN languages: Arabic, Chinese, English, French, Russian, and Spanish.

License: No known license
Version: 1.0.0
Splits:

Split	Examples
`'train'`	20281645

Features:

{
    "translation": {
        "languages": [
            "ar",
            "fr"
        ],
        "id": null,
        "_type": "Translation"
    }
}

ar-ru

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:un_pc/ar-ru')

Description:

This parallel corpus consists of manually translated UN documents spanning 25 years (1990 to 2014) in the six official UN languages: Arabic, Chinese, English, French, Russian, and Spanish.

License: No known license
Version: 1.0.0
Splits:

Split	Examples
`'train'`	20571334

Features:

{
    "translation": {
        "languages": [
            "ar",
            "ru"
        ],
        "id": null,
        "_type": "Translation"
    }
}

ar-zh

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:un_pc/ar-zh')

Description:

This parallel corpus consists of manually translated UN documents spanning 25 years (1990 to 2014) in the six official UN languages: Arabic, Chinese, English, French, Russian, and Spanish.

License: No known license
Version: 1.0.0
Splits:

Split	Examples
`'train'`	17306056

Features:

{
    "translation": {
        "languages": [
            "ar",
            "zh"
        ],
        "id": null,
        "_type": "Translation"
    }
}

en-es

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:un_pc/en-es')

Description:

This parallel corpus consists of manually translated UN documents spanning 25 years (1990 to 2014) in the six official UN languages: Arabic, Chinese, English, French, Russian, and Spanish.

License: No known license
Version: 1.0.0
Splits:

Split	Examples
`'train'`	25227004

Features:

{
    "translation": {
        "languages": [
            "en",
            "es"
        ],
        "id": null,
        "_type": "Translation"
    }
}

en-fr

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:un_pc/en-fr')

Description:

This parallel corpus consists of manually translated UN documents spanning 25 years (1990 to 2014) in the six official UN languages: Arabic, Chinese, English, French, Russian, and Spanish.

License: No known license
Version: 1.0.0
Splits:

Split	Examples
`'train'`	30340652

Features:

{
    "translation": {
        "languages": [
            "en",
            "fr"
        ],
        "id": null,
        "_type": "Translation"
    }
}

en-ru

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:un_pc/en-ru')

Description:

This parallel corpus consists of manually translated UN documents spanning 25 years (1990 to 2014) in the six official UN languages: Arabic, Chinese, English, French, Russian, and Spanish.

License: No known license
Version: 1.0.0
Splits:

Split	Examples
`'train'`	25173398

Features:

{
    "translation": {
        "languages": [
            "en",
            "ru"
        ],
        "id": null,
        "_type": "Translation"
    }
}

en-zh

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:un_pc/en-zh')

Description:

This parallel corpus consists of manually translated UN documents spanning 25 years (1990 to 2014) in the six official UN languages: Arabic, Chinese, English, French, Russian, and Spanish.

License: No known license
Version: 1.0.0
Splits:

Split	Examples
`'train'`	17451549

Features:

{
    "translation": {
        "languages": [
            "en",
            "zh"
        ],
        "id": null,
        "_type": "Translation"
    }
}

es-fr

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:un_pc/es-fr')

Description:

This parallel corpus consists of manually translated UN documents spanning 25 years (1990 to 2014) in the six official UN languages: Arabic, Chinese, English, French, Russian, and Spanish.

License: No known license
Version: 1.0.0
Splits:

Split	Examples
`'train'`	25887160

Features:

{
    "translation": {
        "languages": [
            "es",
            "fr"
        ],
        "id": null,
        "_type": "Translation"
    }
}

es-ru

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:un_pc/es-ru')

Description:

This parallel corpus consists of manually translated UN documents spanning 25 years (1990 to 2014) in the six official UN languages: Arabic, Chinese, English, French, Russian, and Spanish.

License: No known license
Version: 1.0.0
Splits:

Split	Examples
`'train'`	22294106

Features:

{
    "translation": {
        "languages": [
            "es",
            "ru"
        ],
        "id": null,
        "_type": "Translation"
    }
}

es-zh

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:un_pc/es-zh')

Description:

This parallel corpus consists of manually translated UN documents spanning 25 years (1990 to 2014) in the six official UN languages: Arabic, Chinese, English, French, Russian, and Spanish.

License: No known license
Version: 1.0.0
Splits:

Split	Examples
`'train'`	17599223

Features:

{
    "translation": {
        "languages": [
            "es",
            "zh"
        ],
        "id": null,
        "_type": "Translation"
    }
}

fr-ru

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:un_pc/fr-ru')

Description:

This parallel corpus consists of manually translated UN documents spanning 25 years (1990 to 2014) in the six official UN languages: Arabic, Chinese, English, French, Russian, and Spanish.

License: No known license
Version: 1.0.0
Splits:

Split	Examples
`'train'`	25219973

Features:

{
    "translation": {
        "languages": [
            "fr",
            "ru"
        ],
        "id": null,
        "_type": "Translation"
    }
}

fr-zh

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:un_pc/fr-zh')

Description:

This parallel corpus consists of manually translated UN documents spanning 25 years (1990 to 2014) in the six official UN languages: Arabic, Chinese, English, French, Russian, and Spanish.

License: No known license
Version: 1.0.0
Splits:

Split	Examples
`'train'`	17521170

Features:

{
    "translation": {
        "languages": [
            "fr",
            "zh"
        ],
        "id": null,
        "_type": "Translation"
    }
}

ru-zh

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:un_pc/ru-zh')

Description:

This parallel corpus consists of manually translated UN documents spanning 25 years (1990 to 2014) in the six official UN languages: Arabic, Chinese, English, French, Russian, and Spanish.

License: No known license
Version: 1.0.0
Splits:

Split	Examples
`'train'`	17920922

Features:

{
    "translation": {
        "languages": [
            "ru",
            "zh"
        ],
        "id": null,
        "_type": "Translation"
    }
}
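
The fifteen configurations listed on this page are exactly the unordered pairs of the six language codes, with the two codes in alphabetical order (`ar-en`, never `en-ar`). The sketch below uses that pattern to build the dataset names programmatically. `LANGUAGES`, `config_name`, and `all_dataset_names` are hypothetical helpers introduced for illustration and are not part of TFDS.

```python
from itertools import combinations

# The six language codes that appear in the configurations on this page.
LANGUAGES = ("ar", "en", "es", "fr", "ru", "zh")


def config_name(lang_a, lang_b):
    """Return the configuration name for two languages, e.g. ('zh', 'en') -> 'en-zh'."""
    for lang in (lang_a, lang_b):
        if lang not in LANGUAGES:
            raise ValueError(f"unsupported language code: {lang!r}")
    if lang_a == lang_b:
        raise ValueError("a configuration needs two different languages")
    first, second = sorted((lang_a, lang_b))
    return f"{first}-{second}"


def all_dataset_names():
    """Return the TFDS names of all fifteen un_pc configurations."""
    return [
        f"huggingface:un_pc/{a}-{b}"
        for a, b in combinations(sorted(LANGUAGES), 2)
    ]
```

Normalizing the order in `config_name` means a caller can ask for either translation direction and still get a name that matches one of the listed configurations.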