para_pat

References:

el-en

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:para_pat/el-en')

Description:

ParaPat: The Multi-Million Sentences Parallel Corpus of Patents Abstracts

This dataset contains the developed parallel corpus from the open access Google
Patents dataset in 74 language pairs, comprising more than 68 million sentences
and 800 million tokens. Sentences were automatically aligned using the Hunalign algorithm
for the largest 22 language pairs, while the others were abstract (i.e. paragraph) aligned.

We demonstrate the capabilities of our corpus by training Neural Machine Translation
(NMT) models for the main 9 language pairs, with a total of 18 models.

License: CC BY 4.0
Version: 1.1.0
Splits:

Split	Examples
`'train'`	10855

Features:

{
    "index": {
        "dtype": "int32",
        "id": null,
        "_type": "Value"
    },
    "family_id": {
        "dtype": "int32",
        "id": null,
        "_type": "Value"
    },
    "translation": {
        "languages": [
            "el",
            "en"
        ],
        "id": null,
        "_type": "Translation"
    }
}
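As a rough illustration of the feature schema above, the sketch below models one record as a plain Python dict and pulls out the sentence pair. The sample values and the `split_pair` helper are hypothetical, and the sketch assumes the `Translation` feature maps each language code to a string, as it does in Hugging Face `datasets`; the exact structure returned through TFDS may differ.

```python
from typing import Any, Dict, Tuple

# Hypothetical record shaped like the el-en schema. The values are
# placeholders, not real corpus data.
record: Dict[str, Any] = {
    "index": 0,
    "family_id": 12345,
    "translation": {"el": "<Greek abstract>", "en": "<English abstract>"},
}


def split_pair(example: Dict[str, Any], src: str, tgt: str) -> Tuple[str, str]:
    """Return (source_text, target_text) from a record's translation field."""
    translation = example["translation"]
    return translation[src], translation[tgt]


source, target = split_pair(record, "el", "en")
print(source, "->", target)
```

The language codes passed to `split_pair` should match the two entries listed under `languages` for the config being read.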

cs-en

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:para_pat/cs-en')

Description:

ParaPat: The Multi-Million Sentences Parallel Corpus of Patents Abstracts

This dataset contains the developed parallel corpus from the open access Google
Patents dataset in 74 language pairs, comprising more than 68 million sentences
and 800 million tokens. Sentences were automatically aligned using the Hunalign algorithm
for the largest 22 language pairs, while the others were abstract (i.e. paragraph) aligned.

We demonstrate the capabilities of our corpus by training Neural Machine Translation
(NMT) models for the main 9 language pairs, with a total of 18 models.

License: CC BY 4.0
Version: 1.1.0
Splits:

Split	Examples
`'train'`	78977

Features:

{
    "index": {
        "dtype": "int32",
        "id": null,
        "_type": "Value"
    },
    "family_id": {
        "dtype": "int32",
        "id": null,
        "_type": "Value"
    },
    "translation": {
        "languages": [
            "cs",
            "en"
        ],
        "id": null,
        "_type": "Translation"
    }
}

en-hu

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:para_pat/en-hu')

Description:

ParaPat: The Multi-Million Sentences Parallel Corpus of Patents Abstracts

This dataset contains the developed parallel corpus from the open access Google
Patents dataset in 74 language pairs, comprising more than 68 million sentences
and 800 million tokens. Sentences were automatically aligned using the Hunalign algorithm
for the largest 22 language pairs, while the others were abstract (i.e. paragraph) aligned.

We demonstrate the capabilities of our corpus by training Neural Machine Translation
(NMT) models for the main 9 language pairs, with a total of 18 models.

License: CC BY 4.0
Version: 1.1.0
Splits:

Split	Examples
`'train'`	42629

Features:

{
    "index": {
        "dtype": "int32",
        "id": null,
        "_type": "Value"
    },
    "family_id": {
        "dtype": "int32",
        "id": null,
        "_type": "Value"
    },
    "translation": {
        "languages": [
            "en",
            "hu"
        ],
        "id": null,
        "_type": "Translation"
    }
}

en-ro

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:para_pat/en-ro')

Description:

ParaPat: The Multi-Million Sentences Parallel Corpus of Patents Abstracts

This dataset contains the developed parallel corpus from the open access Google
Patents dataset in 74 language pairs, comprising more than 68 million sentences
and 800 million tokens. Sentences were automatically aligned using the Hunalign algorithm
for the largest 22 language pairs, while the others were abstract (i.e. paragraph) aligned.

We demonstrate the capabilities of our corpus by training Neural Machine Translation
(NMT) models for the main 9 language pairs, with a total of 18 models.

License: CC BY 4.0
Version: 1.1.0
Splits:

Split	Examples
`'train'`	48789

Features:

{
    "index": {
        "dtype": "int32",
        "id": null,
        "_type": "Value"
    },
    "family_id": {
        "dtype": "int32",
        "id": null,
        "_type": "Value"
    },
    "translation": {
        "languages": [
            "en",
            "ro"
        ],
        "id": null,
        "_type": "Translation"
    }
}

en-sk

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:para_pat/en-sk')

Description:

ParaPat: The Multi-Million Sentences Parallel Corpus of Patents Abstracts

This dataset contains the developed parallel corpus from the open access Google
Patents dataset in 74 language pairs, comprising more than 68 million sentences
and 800 million tokens. Sentences were automatically aligned using the Hunalign algorithm
for the largest 22 language pairs, while the others were abstract (i.e. paragraph) aligned.

We demonstrate the capabilities of our corpus by training Neural Machine Translation
(NMT) models for the main 9 language pairs, with a total of 18 models.

License: CC BY 4.0
Version: 1.1.0
Splits:

Split	Examples
`'train'`	23410

Features:

{
    "index": {
        "dtype": "int32",
        "id": null,
        "_type": "Value"
    },
    "family_id": {
        "dtype": "int32",
        "id": null,
        "_type": "Value"
    },
    "translation": {
        "languages": [
            "en",
            "sk"
        ],
        "id": null,
        "_type": "Translation"
    }
}

en-uk

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:para_pat/en-uk')

Description:

ParaPat: The Multi-Million Sentences Parallel Corpus of Patents Abstracts

This dataset contains the developed parallel corpus from the open access Google
Patents dataset in 74 language pairs, comprising more than 68 million sentences
and 800 million tokens. Sentences were automatically aligned using the Hunalign algorithm
for the largest 22 language pairs, while the others were abstract (i.e. paragraph) aligned.

We demonstrate the capabilities of our corpus by training Neural Machine Translation
(NMT) models for the main 9 language pairs, with a total of 18 models.

License: CC BY 4.0
Version: 1.1.0
Splits:

Split	Examples
`'train'`	89226

Features:

{
    "index": {
        "dtype": "int32",
        "id": null,
        "_type": "Value"
    },
    "family_id": {
        "dtype": "int32",
        "id": null,
        "_type": "Value"
    },
    "translation": {
        "languages": [
            "en",
            "uk"
        ],
        "id": null,
        "_type": "Translation"
    }
}

es-fr

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:para_pat/es-fr')

Description:

ParaPat: The Multi-Million Sentences Parallel Corpus of Patents Abstracts

This dataset contains the developed parallel corpus from the open access Google
Patents dataset in 74 language pairs, comprising more than 68 million sentences
and 800 million tokens. Sentences were automatically aligned using the Hunalign algorithm
for the largest 22 language pairs, while the others were abstract (i.e. paragraph) aligned.

We demonstrate the capabilities of our corpus by training Neural Machine Translation
(NMT) models for the main 9 language pairs, with a total of 18 models.

License: CC BY 4.0
Version: 1.1.0
Splits:

Split	Examples
`'train'`	32553

Features:

{
    "index": {
        "dtype": "int32",
        "id": null,
        "_type": "Value"
    },
    "family_id": {
        "dtype": "int32",
        "id": null,
        "_type": "Value"
    },
    "translation": {
        "languages": [
            "es",
            "fr"
        ],
        "id": null,
        "_type": "Translation"
    }
}

fr-ru

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:para_pat/fr-ru')

Description:

ParaPat: The Multi-Million Sentences Parallel Corpus of Patents Abstracts

This dataset contains the developed parallel corpus from the open access Google
Patents dataset in 74 language pairs, comprising more than 68 million sentences
and 800 million tokens. Sentences were automatically aligned using the Hunalign algorithm
for the largest 22 language pairs, while the others were abstract (i.e. paragraph) aligned.

We demonstrate the capabilities of our corpus by training Neural Machine Translation
(NMT) models for the main 9 language pairs, with a total of 18 models.

License: CC BY 4.0
Version: 1.1.0
Splits:

Split	Examples
`'train'`	10889

Features:

{
    "index": {
        "dtype": "int32",
        "id": null,
        "_type": "Value"
    },
    "family_id": {
        "dtype": "int32",
        "id": null,
        "_type": "Value"
    },
    "translation": {
        "languages": [
            "fr",
            "ru"
        ],
        "id": null,
        "_type": "Translation"
    }
}

de-fr

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:para_pat/de-fr')

Description:

ParaPat: The Multi-Million Sentences Parallel Corpus of Patents Abstracts

This dataset contains the developed parallel corpus from the open access Google
Patents dataset in 74 language pairs, comprising more than 68 million sentences
and 800 million tokens. Sentences were automatically aligned using the Hunalign algorithm
for the largest 22 language pairs, while the others were abstract (i.e. paragraph) aligned.

We demonstrate the capabilities of our corpus by training Neural Machine Translation
(NMT) models for the main 9 language pairs, with a total of 18 models.

License: CC BY 4.0
Version: 1.1.0
Splits:

Split	Examples
`'train'`	1167988

Features:

{
    "translation": {
        "languages": [
            "de",
            "fr"
        ],
        "id": null,
        "_type": "Translation"
    }
}
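Some configs on this page list `index` and `family_id` alongside `translation`, while others, such as de-fr above, list only `translation`. A minimal sketch of reading both variants uniformly follows; the `metadata_of` helper and the sample records are hypothetical and only mirror the schemas shown here.

```python
from typing import Any, Dict, Optional


def metadata_of(example: Dict[str, Any]) -> Dict[str, Optional[int]]:
    """Return index and family_id, or None for configs that omit them."""
    return {
        "index": example.get("index"),
        "family_id": example.get("family_id"),
    }


# Hypothetical records; the text values are placeholders, not corpus data.
with_ids = {"index": 7, "family_id": 42, "translation": {"fr": "...", "ko": "..."}}
without_ids = {"translation": {"de": "...", "fr": "..."}}

print(metadata_of(with_ids))
print(metadata_of(without_ids))
```

Using `dict.get` keeps downstream code from raising `KeyError` when a config does not carry the two identifier fields.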

en-ja

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:para_pat/en-ja')

Description:

ParaPat: The Multi-Million Sentences Parallel Corpus of Patents Abstracts

This dataset contains the developed parallel corpus from the open access Google
Patents dataset in 74 language pairs, comprising more than 68 million sentences
and 800 million tokens. Sentences were automatically aligned using the Hunalign algorithm
for the largest 22 language pairs, while the others were abstract (i.e. paragraph) aligned.

We demonstrate the capabilities of our corpus by training Neural Machine Translation
(NMT) models for the main 9 language pairs, with a total of 18 models.

License: CC BY 4.0
Version: 1.1.0
Splits:

Split	Examples
`'train'`	6170339

Features:

{
    "translation": {
        "languages": [
            "en",
            "ja"
        ],
        "id": null,
        "_type": "Translation"
    }
}

en-es

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:para_pat/en-es')

Description:

ParaPat: The Multi-Million Sentences Parallel Corpus of Patents Abstracts

This dataset contains the developed parallel corpus from the open access Google
Patents dataset in 74 language pairs, comprising more than 68 million sentences
and 800 million tokens. Sentences were automatically aligned using the Hunalign algorithm
for the largest 22 language pairs, while the others were abstract (i.e. paragraph) aligned.

We demonstrate the capabilities of our corpus by training Neural Machine Translation
(NMT) models for the main 9 language pairs, with a total of 18 models.

License: CC BY 4.0
Version: 1.1.0
Splits:

Split	Examples
`'train'`	649396

Features:

{
    "translation": {
        "languages": [
            "en",
            "es"
        ],
        "id": null,
        "_type": "Translation"
    }
}

en-fr

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:para_pat/en-fr')

Description:

ParaPat: The Multi-Million Sentences Parallel Corpus of Patents Abstracts

This dataset contains the developed parallel corpus from the open access Google
Patents dataset in 74 language pairs, comprising more than 68 million sentences
and 800 million tokens. Sentences were automatically aligned using the Hunalign algorithm
for the largest 22 language pairs, while the others were abstract (i.e. paragraph) aligned.

We demonstrate the capabilities of our corpus by training Neural Machine Translation
(NMT) models for the main 9 language pairs, with a total of 18 models.

License: CC BY 4.0
Version: 1.1.0
Splits:

Split	Examples
`'train'`	12223525

Features:

{
    "translation": {
        "languages": [
            "en",
            "fr"
        ],
        "id": null,
        "_type": "Translation"
    }
}

de-en

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:para_pat/de-en')

Description:

ParaPat: The Multi-Million Sentences Parallel Corpus of Patents Abstracts

This dataset contains the developed parallel corpus from the open access Google
Patents dataset in 74 language pairs, comprising more than 68 million sentences
and 800 million tokens. Sentences were automatically aligned using the Hunalign algorithm
for the largest 22 language pairs, while the others were abstract (i.e. paragraph) aligned.

We demonstrate the capabilities of our corpus by training Neural Machine Translation
(NMT) models for the main 9 language pairs, with a total of 18 models.

License: CC BY 4.0
Version: 1.1.0
Splits:

Split	Examples
`'train'`	2165054

Features:

{
    "translation": {
        "languages": [
            "de",
            "en"
        ],
        "id": null,
        "_type": "Translation"
    }
}

en-ko

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:para_pat/en-ko')

Description:

ParaPat: The Multi-Million Sentences Parallel Corpus of Patents Abstracts

This dataset contains the developed parallel corpus from the open access Google
Patents dataset in 74 language pairs, comprising more than 68 million sentences
and 800 million tokens. Sentences were automatically aligned using the Hunalign algorithm
for the largest 22 language pairs, while the others were abstract (i.e. paragraph) aligned.

We demonstrate the capabilities of our corpus by training Neural Machine Translation
(NMT) models for the main 9 language pairs, with a total of 18 models.

License: CC BY 4.0
Version: 1.1.0
Splits:

Split	Examples
`'train'`	2324357

Features:

{
    "translation": {
        "languages": [
            "en",
            "ko"
        ],
        "id": null,
        "_type": "Translation"
    }
}

fr-ja

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:para_pat/fr-ja')

Description:

ParaPat: The Multi-Million Sentences Parallel Corpus of Patents Abstracts

This dataset contains the developed parallel corpus from the open access Google
Patents dataset in 74 language pairs, comprising more than 68 million sentences
and 800 million tokens. Sentences were automatically aligned using the Hunalign algorithm
for the largest 22 language pairs, while the others were abstract (i.e. paragraph) aligned.

We demonstrate the capabilities of our corpus by training Neural Machine Translation
(NMT) models for the main 9 language pairs, with a total of 18 models.

License: CC BY 4.0
Version: 1.1.0
Splits:

Split	Examples
`'train'`	313422

Features:

{
    "translation": {
        "languages": [
            "fr",
            "ja"
        ],
        "id": null,
        "_type": "Translation"
    }
}

en-zh

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:para_pat/en-zh')

Description:

ParaPat: The Multi-Million Sentences Parallel Corpus of Patents Abstracts

This dataset contains the developed parallel corpus from the open access Google
Patents dataset in 74 language pairs, comprising more than 68 million sentences
and 800 million tokens. Sentences were automatically aligned using the Hunalign algorithm
for the largest 22 language pairs, while the others were abstract (i.e. paragraph) aligned.

We demonstrate the capabilities of our corpus by training Neural Machine Translation
(NMT) models for the main 9 language pairs, with a total of 18 models.

License: CC BY 4.0
Version: 1.1.0
Splits:

Split	Examples
`'train'`	4897841

Features:

{
    "translation": {
        "languages": [
            "en",
            "zh"
        ],
        "id": null,
        "_type": "Translation"
    }
}

en-ru

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:para_pat/en-ru')

Description:

ParaPat: The Multi-Million Sentences Parallel Corpus of Patents Abstracts

This dataset contains the developed parallel corpus from the open access Google
Patents dataset in 74 language pairs, comprising more than 68 million sentences
and 800 million tokens. Sentences were automatically aligned using the Hunalign algorithm
for the largest 22 language pairs, while the others were abstract (i.e. paragraph) aligned.

We demonstrate the capabilities of our corpus by training Neural Machine Translation
(NMT) models for the main 9 language pairs, with a total of 18 models.

License: CC BY 4.0
Version: 1.1.0
Splits:

Split	Examples
`'train'`	4296399

Features:

{
    "translation": {
        "languages": [
            "en",
            "ru"
        ],
        "id": null,
        "_type": "Translation"
    }
}

fr-ko

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:para_pat/fr-ko')

Description:

ParaPat: The Multi-Million Sentences Parallel Corpus of Patents Abstracts

This dataset contains the developed parallel corpus from the open access Google
Patents dataset in 74 language pairs, comprising more than 68 million sentences
and 800 million tokens. Sentences were automatically aligned using the Hunalign algorithm
for the largest 22 language pairs, while the others were abstract (i.e. paragraph) aligned.

We demonstrate the capabilities of our corpus by training Neural Machine Translation
(NMT) models for the main 9 language pairs, with a total of 18 models.

License: CC BY 4.0
Version: 1.1.0
Splits:

Split	Examples
`'train'`	120607

Features:

{
    "index": {
        "dtype": "int32",
        "id": null,
        "_type": "Value"
    },
    "family_id": {
        "dtype": "int32",
        "id": null,
        "_type": "Value"
    },
    "translation": {
        "languages": [
            "fr",
            "ko"
        ],
        "id": null,
        "_type": "Translation"
    }
}

ru-uk

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:para_pat/ru-uk')

Description:

ParaPat: The Multi-Million Sentences Parallel Corpus of Patents Abstracts

This dataset contains the developed parallel corpus from the open access Google
Patents dataset in 74 language pairs, comprising more than 68 million sentences
and 800 million tokens. Sentences were automatically aligned using the Hunalign algorithm
for the largest 22 language pairs, while the others were abstract (i.e. paragraph) aligned.

We demonstrate the capabilities of our corpus by training Neural Machine Translation
(NMT) models for the main 9 language pairs, with a total of 18 models.

License: CC BY 4.0
Version: 1.1.0
Splits:

Split	Examples
`'train'`	85963

Features:

{
    "index": {
        "dtype": "int32",
        "id": null,
        "_type": "Value"
    },
    "family_id": {
        "dtype": "int32",
        "id": null,
        "_type": "Value"
    },
    "translation": {
        "languages": [
            "ru",
            "uk"
        ],
        "id": null,
        "_type": "Translation"
    }
}

en-pt

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:para_pat/en-pt')

Description:

ParaPat: The Multi-Million Sentences Parallel Corpus of Patents Abstracts

This dataset contains the developed parallel corpus from the open access Google
Patents dataset in 74 language pairs, comprising more than 68 million sentences
and 800 million tokens. Sentences were automatically aligned using the Hunalign algorithm
for the largest 22 language pairs, while the others were abstract (i.e. paragraph) aligned.

We demonstrate the capabilities of our corpus by training Neural Machine Translation
(NMT) models for the main 9 language pairs, with a total of 18 models.

License: CC BY 4.0
Version: 1.1.0
Splits:

Split	Examples
`'train'`	23121

Features:

{
    "index": {
        "dtype": "int32",
        "id": null,
        "_type": "Value"
    },
    "family_id": {
        "dtype": "int32",
        "id": null,
        "_type": "Value"
    },
    "translation": {
        "languages": [
            "en",
            "pt"
        ],
        "id": null,
        "_type": "Translation"
    }
}
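To choose a config programmatically, the `'train'` sizes listed on this page can be collected into a lookup table. The sketch below copies those counts and builds the dataset name in the `huggingface:para_pat/<config>` form used by the load commands above; `TRAIN_SIZES` and `tfds_name` are hypothetical names introduced only for illustration.

```python
# Train split sizes as listed on this page, keyed by config name.
TRAIN_SIZES = {
    "el-en": 10855, "cs-en": 78977, "en-hu": 42629, "en-ro": 48789,
    "en-sk": 23410, "en-uk": 89226, "es-fr": 32553, "fr-ru": 10889,
    "de-fr": 1167988, "en-ja": 6170339, "en-es": 649396, "en-fr": 12223525,
    "de-en": 2165054, "en-ko": 2324357, "fr-ja": 313422, "en-zh": 4897841,
    "en-ru": 4296399, "fr-ko": 120607, "ru-uk": 85963, "en-pt": 23121,
}


def tfds_name(config: str) -> str:
    """Build the name string passed to tfds.load for a para_pat config."""
    if config not in TRAIN_SIZES:
        raise KeyError(f"unknown para_pat config: {config}")
    return f"huggingface:para_pat/{config}"


# Pick the smallest config, e.g. for a quick smoke test of a pipeline.
smallest = min(TRAIN_SIZES, key=TRAIN_SIZES.get)
print(tfds_name(smallest), TRAIN_SIZES[smallest])
```

The resulting string can be passed to `tfds.load` once `tensorflow_datasets` has been imported as `tfds`.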