opus_gnome

ar-bal

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:opus_gnome/ar-bal')
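Every configuration on this page follows the same naming pattern, `huggingface:opus_gnome/<source>-<target>`. The sketch below builds that name from a language pair and passes it to `tfds.load`. The helper names `opus_gnome_name` and `load_pair` are illustrative, not part of TFDS, and `load_pair` assumes `tensorflow_datasets` is installed and the dataset can be downloaded.

```python
def opus_gnome_name(src: str, tgt: str) -> str:
    """Build the TFDS name for one language-pair config (hypothetical helper)."""
    return f"huggingface:opus_gnome/{src}-{tgt}"


def load_pair(src: str, tgt: str, split: str = "train"):
    """Load one language pair; requires tensorflow_datasets and network access."""
    # Imported lazily so opus_gnome_name() stays usable without TFDS installed.
    import tensorflow_datasets as tfds

    return tfds.load(opus_gnome_name(src, tgt), split=split)


print(opus_gnome_name("ar", "bal"))  # prints huggingface:opus_gnome/ar-bal
```

Calling `load_pair("ar", "bal")` would then be equivalent to the command above with `split='train'`, the only split listed for these configurations.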

Description:

A parallel corpus of GNOME localization files. Source: https://l10n.gnome.org

187 languages, 12,822 bitexts
total number of files: 113,344
total number of tokens: 267.27M
total number of sentence fragments: 58.12M

License: No known license
Version: 1.0.0
Splits:

Split	Examples
`'train'`	60

Features:

{
    "id": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "translation": {
        "languages": [
            "ar",
            "bal"
        ],
        "id": null,
        "_type": "Translation"
    }
}
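
The schema above pairs a string `id` with a `translation` field covering two languages. As a minimal sketch, assuming a record follows the Hugging Face `Translation` layout (a dict keyed by language code), the two sides of a pair could be pulled out as shown below. The helper `split_pair` and the sample record are hypothetical placeholders, not data from the corpus. Records loaded through TFDS may arrive as tensors or byte strings and need decoding first.

```python
import json

# Feature spec copied from the listing above (ar-bal configuration).
FEATURES = json.loads("""
{
    "id": {"dtype": "string", "id": null, "_type": "Value"},
    "translation": {"languages": ["ar", "bal"], "id": null, "_type": "Translation"}
}
""")


def split_pair(example: dict, features: dict = FEATURES) -> tuple:
    """Return (source_text, target_text) in the order the schema lists the languages."""
    src, tgt = features["translation"]["languages"]
    return example["translation"][src], example["translation"][tgt]


# Hypothetical record with placeholder strings, assumed layout only.
sample = {"id": "0", "translation": {"ar": "<ar text>", "bal": "<bal text>"}}
print(split_pair(sample))
```

The same helper works for the other configurations if their `languages` list is substituted into the feature spec.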

bg-csb

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:opus_gnome/bg-csb')

Description:

A parallel corpus of GNOME localization files. Source: https://l10n.gnome.org

187 languages, 12,822 bitexts
total number of files: 113,344
total number of tokens: 267.27M
total number of sentence fragments: 58.12M

License: No known license
Version: 1.0.0
Splits:

Split	Examples
`'train'`	1768

Features:

{
    "id": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "translation": {
        "languages": [
            "bg",
            "csb"
        ],
        "id": null,
        "_type": "Translation"
    }
}

ca-en_GB

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:opus_gnome/ca-en_GB')

Description:

A parallel corpus of GNOME localization files. Source: https://l10n.gnome.org

187 languages, 12,822 bitexts
total number of files: 113,344
total number of tokens: 267.27M
total number of sentence fragments: 58.12M

License: No known license
Version: 1.0.0
Splits:

Split	Examples
`'train'`	7982

Features:

{
    "id": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "translation": {
        "languages": [
            "ca",
            "en_GB"
        ],
        "id": null,
        "_type": "Translation"
    }
}

cs-eo

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:opus_gnome/cs-eo')

Description:

A parallel corpus of GNOME localization files. Source: https://l10n.gnome.org

187 languages, 12,822 bitexts
total number of files: 113,344
total number of tokens: 267.27M
total number of sentence fragments: 58.12M

License: No known license
Version: 1.0.0
Splits:

Split	Examples
`'train'`	73

Features:

{
    "id": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "translation": {
        "languages": [
            "cs",
            "eo"
        ],
        "id": null,
        "_type": "Translation"
    }
}

de-ha

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:opus_gnome/de-ha')

Description:

A parallel corpus of GNOME localization files. Source: https://l10n.gnome.org

187 languages, 12,822 bitexts
total number of files: 113,344
total number of tokens: 267.27M
total number of sentence fragments: 58.12M

License: No known license
Version: 1.0.0
Splits:

Split	Examples
`'train'`	216

Features:

{
    "id": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "translation": {
        "languages": [
            "de",
            "ha"
        ],
        "id": null,
        "_type": "Translation"
    }
}

cs-tk

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:opus_gnome/cs-tk')

Description:

A parallel corpus of GNOME localization files. Source: https://l10n.gnome.org

187 languages, 12,822 bitexts
total number of files: 113,344
total number of tokens: 267.27M
total number of sentence fragments: 58.12M

License: No known license
Version: 1.0.0
Splits:

Split	Examples
`'train'`	18686

Features:

{
    "id": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "translation": {
        "languages": [
            "cs",
            "tk"
        ],
        "id": null,
        "_type": "Translation"
    }
}

da-vi

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:opus_gnome/da-vi')

Description:

A parallel corpus of GNOME localization files. Source: https://l10n.gnome.org

187 languages, 12,822 bitexts
total number of files: 113,344
total number of tokens: 267.27M
total number of sentence fragments: 58.12M

License: No known license
Version: 1.0.0
Splits:

Split	Examples
`'train'`	149

Features:

{
    "id": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "translation": {
        "languages": [
            "da",
            "vi"
        ],
        "id": null,
        "_type": "Translation"
    }
}

en_GB-my

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:opus_gnome/en_GB-my')

Description:

A parallel corpus of GNOME localization files. Source: https://l10n.gnome.org

187 languages, 12,822 bitexts
total number of files: 113,344
total number of tokens: 267.27M
total number of sentence fragments: 58.12M

License: No known license
Version: 1.0.0
Splits:

Split	Examples
`'train'`	28232

Features:

{
    "id": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "translation": {
        "languages": [
            "en_GB",
            "my"
        ],
        "id": null,
        "_type": "Translation"
    }
}

el-sk

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:opus_gnome/el-sk')

Description:

A parallel corpus of GNOME localization files. Source: https://l10n.gnome.org

187 languages, 12,822 bitexts
total number of files: 113,344
total number of tokens: 267.27M
total number of sentence fragments: 58.12M

License: No known license
Version: 1.0.0
Splits:

Split	Examples
`'train'`	150

Features:

{
    "id": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "translation": {
        "languages": [
            "el",
            "sk"
        ],
        "id": null,
        "_type": "Translation"
    }
}

de-tt

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:opus_gnome/de-tt')

Description:

A parallel corpus of GNOME localization files. Source: https://l10n.gnome.org

187 languages, 12,822 bitexts
total number of files: 113,344
total number of tokens: 267.27M
total number of sentence fragments: 58.12M

License: No known license
Version: 1.0.0
Splits:

Split	Examples
`'train'`	2169

Features:

{
    "id": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "translation": {
        "languages": [
            "de",
            "tt"
        ],
        "id": null,
        "_type": "Translation"
    }
}