يدعم TFDS الآن تنسيق الكرواسون 🥐 ! اقرأ الوثائق لمعرفة المزيد.

تمت ترجمة هذه الصفحة بواسطة Cloud Translation API‏.

para_pat

مراجع:

إل إن

استخدم الأمر التالي لتحميل مجموعة البيانات هذه في TFDS:

ds = tfds.load('huggingface:para_pat/el-en')

وصف :

ParaPat: The Multi-Million Sentences Parallel Corpus of Patents Abstracts

This dataset contains the developed parallel corpus from the open access Google
Patents dataset in 74 language pairs, comprising more than 68 million sentences
and 800 million tokens. Sentences were automatically aligned using the Hunalign algorithm
for the largest 22 language pairs, while the others were abstract (i.e. paragraph) aligned.

We demonstrate the capabilities of our corpus by training Neural Machine Translation
(NMT) models for the main 9 language pairs, with a total of 18 models.

الترخيص : CC BY 4.0
الإصدار : 1.1.0
الإنشقاقات :

ينقسم	أمثلة
`'train'`	10855

سمات :

{
    "index": {
        "dtype": "int32",
        "id": null,
        "_type": "Value"
    },
    "family_id": {
        "dtype": "int32",
        "id": null,
        "_type": "Value"
    },
    "translation": {
        "languages": [
            "el",
            "en"
        ],
        "id": null,
        "_type": "Translation"
    }
}

CS-EN

استخدم الأمر التالي لتحميل مجموعة البيانات هذه في TFDS:

ds = tfds.load('huggingface:para_pat/cs-en')

وصف :

ParaPat: The Multi-Million Sentences Parallel Corpus of Patents Abstracts

This dataset contains the developed parallel corpus from the open access Google
Patents dataset in 74 language pairs, comprising more than 68 million sentences
and 800 million tokens. Sentences were automatically aligned using the Hunalign algorithm
for the largest 22 language pairs, while the others were abstract (i.e. paragraph) aligned.

We demonstrate the capabilities of our corpus by training Neural Machine Translation
(NMT) models for the main 9 language pairs, with a total of 18 models.

الترخيص : CC BY 4.0
الإصدار : 1.1.0
الإنشقاقات :

ينقسم	أمثلة
`'train'`	78977

سمات :

{
    "index": {
        "dtype": "int32",
        "id": null,
        "_type": "Value"
    },
    "family_id": {
        "dtype": "int32",
        "id": null,
        "_type": "Value"
    },
    "translation": {
        "languages": [
            "cs",
            "en"
        ],
        "id": null,
        "_type": "Translation"
    }
}

أون هو

استخدم الأمر التالي لتحميل مجموعة البيانات هذه في TFDS:

ds = tfds.load('huggingface:para_pat/en-hu')

وصف :

ParaPat: The Multi-Million Sentences Parallel Corpus of Patents Abstracts

This dataset contains the developed parallel corpus from the open access Google
Patents dataset in 74 language pairs, comprising more than 68 million sentences
and 800 million tokens. Sentences were automatically aligned using the Hunalign algorithm
for the largest 22 language pairs, while the others were abstract (i.e. paragraph) aligned.

We demonstrate the capabilities of our corpus by training Neural Machine Translation
(NMT) models for the main 9 language pairs, with a total of 18 models.

الترخيص : CC BY 4.0
الإصدار : 1.1.0
الإنشقاقات :

ينقسم	أمثلة
`'train'`	42629

سمات :

{
    "index": {
        "dtype": "int32",
        "id": null,
        "_type": "Value"
    },
    "family_id": {
        "dtype": "int32",
        "id": null,
        "_type": "Value"
    },
    "translation": {
        "languages": [
            "en",
            "hu"
        ],
        "id": null,
        "_type": "Translation"
    }
}

en-ro

استخدم الأمر التالي لتحميل مجموعة البيانات هذه في TFDS:

ds = tfds.load('huggingface:para_pat/en-ro')

وصف :

ParaPat: The Multi-Million Sentences Parallel Corpus of Patents Abstracts

This dataset contains the developed parallel corpus from the open access Google
Patents dataset in 74 language pairs, comprising more than 68 million sentences
and 800 million tokens. Sentences were automatically aligned using the Hunalign algorithm
for the largest 22 language pairs, while the others were abstract (i.e. paragraph) aligned.

We demonstrate the capabilities of our corpus by training Neural Machine Translation
(NMT) models for the main 9 language pairs, with a total of 18 models.

الترخيص : CC BY 4.0
الإصدار : 1.1.0
الإنشقاقات :

ينقسم	أمثلة
`'train'`	48789

سمات :

{
    "index": {
        "dtype": "int32",
        "id": null,
        "_type": "Value"
    },
    "family_id": {
        "dtype": "int32",
        "id": null,
        "_type": "Value"
    },
    "translation": {
        "languages": [
            "en",
            "ro"
        ],
        "id": null,
        "_type": "Translation"
    }
}

en-sk

استخدم الأمر التالي لتحميل مجموعة البيانات هذه في TFDS:

ds = tfds.load('huggingface:para_pat/en-sk')

وصف :

ParaPat: The Multi-Million Sentences Parallel Corpus of Patents Abstracts

This dataset contains the developed parallel corpus from the open access Google
Patents dataset in 74 language pairs, comprising more than 68 million sentences
and 800 million tokens. Sentences were automatically aligned using the Hunalign algorithm
for the largest 22 language pairs, while the others were abstract (i.e. paragraph) aligned.

We demonstrate the capabilities of our corpus by training Neural Machine Translation
(NMT) models for the main 9 language pairs, with a total of 18 models.

الترخيص : CC BY 4.0
الإصدار : 1.1.0
الإنشقاقات :

ينقسم	أمثلة
`'train'`	23410

سمات :

{
    "index": {
        "dtype": "int32",
        "id": null,
        "_type": "Value"
    },
    "family_id": {
        "dtype": "int32",
        "id": null,
        "_type": "Value"
    },
    "translation": {
        "languages": [
            "en",
            "sk"
        ],
        "id": null,
        "_type": "Translation"
    }
}

في المملكة المتحدة

استخدم الأمر التالي لتحميل مجموعة البيانات هذه في TFDS:

ds = tfds.load('huggingface:para_pat/en-uk')

وصف :

ParaPat: The Multi-Million Sentences Parallel Corpus of Patents Abstracts

This dataset contains the developed parallel corpus from the open access Google
Patents dataset in 74 language pairs, comprising more than 68 million sentences
and 800 million tokens. Sentences were automatically aligned using the Hunalign algorithm
for the largest 22 language pairs, while the others were abstract (i.e. paragraph) aligned.

We demonstrate the capabilities of our corpus by training Neural Machine Translation
(NMT) models for the main 9 language pairs, with a total of 18 models.

الترخيص : CC BY 4.0
الإصدار : 1.1.0
الإنشقاقات :

ينقسم	أمثلة
`'train'`	89226

سمات :

{
    "index": {
        "dtype": "int32",
        "id": null,
        "_type": "Value"
    },
    "family_id": {
        "dtype": "int32",
        "id": null,
        "_type": "Value"
    },
    "translation": {
        "languages": [
            "en",
            "uk"
        ],
        "id": null,
        "_type": "Translation"
    }
}

es-fr

استخدم الأمر التالي لتحميل مجموعة البيانات هذه في TFDS:

ds = tfds.load('huggingface:para_pat/es-fr')

وصف :

ParaPat: The Multi-Million Sentences Parallel Corpus of Patents Abstracts

This dataset contains the developed parallel corpus from the open access Google
Patents dataset in 74 language pairs, comprising more than 68 million sentences
and 800 million tokens. Sentences were automatically aligned using the Hunalign algorithm
for the largest 22 language pairs, while the others were abstract (i.e. paragraph) aligned.

We demonstrate the capabilities of our corpus by training Neural Machine Translation
(NMT) models for the main 9 language pairs, with a total of 18 models.

الترخيص : CC BY 4.0
الإصدار : 1.1.0
الإنشقاقات :

ينقسم	أمثلة
`'train'`	32553

سمات :

{
    "index": {
        "dtype": "int32",
        "id": null,
        "_type": "Value"
    },
    "family_id": {
        "dtype": "int32",
        "id": null,
        "_type": "Value"
    },
    "translation": {
        "languages": [
            "es",
            "fr"
        ],
        "id": null,
        "_type": "Translation"
    }
}

fr-ru

استخدم الأمر التالي لتحميل مجموعة البيانات هذه في TFDS:

ds = tfds.load('huggingface:para_pat/fr-ru')

وصف :

ParaPat: The Multi-Million Sentences Parallel Corpus of Patents Abstracts

This dataset contains the developed parallel corpus from the open access Google
Patents dataset in 74 language pairs, comprising more than 68 million sentences
and 800 million tokens. Sentences were automatically aligned using the Hunalign algorithm
for the largest 22 language pairs, while the others were abstract (i.e. paragraph) aligned.

We demonstrate the capabilities of our corpus by training Neural Machine Translation
(NMT) models for the main 9 language pairs, with a total of 18 models.

الترخيص : CC BY 4.0
الإصدار : 1.1.0
الإنشقاقات :

ينقسم	أمثلة
`'train'`	10889

سمات :

{
    "index": {
        "dtype": "int32",
        "id": null,
        "_type": "Value"
    },
    "family_id": {
        "dtype": "int32",
        "id": null,
        "_type": "Value"
    },
    "translation": {
        "languages": [
            "fr",
            "ru"
        ],
        "id": null,
        "_type": "Translation"
    }
}

دي الاب

استخدم الأمر التالي لتحميل مجموعة البيانات هذه في TFDS:

ds = tfds.load('huggingface:para_pat/de-fr')

وصف :

ParaPat: The Multi-Million Sentences Parallel Corpus of Patents Abstracts

This dataset contains the developed parallel corpus from the open access Google
Patents dataset in 74 language pairs, comprising more than 68 million sentences
and 800 million tokens. Sentences were automatically aligned using the Hunalign algorithm
for the largest 22 language pairs, while the others were abstract (i.e. paragraph) aligned.

We demonstrate the capabilities of our corpus by training Neural Machine Translation
(NMT) models for the main 9 language pairs, with a total of 18 models.

الترخيص : CC BY 4.0
الإصدار : 1.1.0
الإنشقاقات :

ينقسم	أمثلة
`'train'`	1167988

سمات :

{
    "translation": {
        "languages": [
            "de",
            "fr"
        ],
        "id": null,
        "_type": "Translation"
    }
}

en-ja

استخدم الأمر التالي لتحميل مجموعة البيانات هذه في TFDS:

ds = tfds.load('huggingface:para_pat/en-ja')

وصف :

ParaPat: The Multi-Million Sentences Parallel Corpus of Patents Abstracts

This dataset contains the developed parallel corpus from the open access Google
Patents dataset in 74 language pairs, comprising more than 68 million sentences
and 800 million tokens. Sentences were automatically aligned using the Hunalign algorithm
for the largest 22 language pairs, while the others were abstract (i.e. paragraph) aligned.

We demonstrate the capabilities of our corpus by training Neural Machine Translation
(NMT) models for the main 9 language pairs, with a total of 18 models.

الترخيص : CC BY 4.0
الإصدار : 1.1.0
الإنشقاقات :

ينقسم	أمثلة
`'train'`	6170339

سمات :

{
    "translation": {
        "languages": [
            "en",
            "ja"
        ],
        "id": null,
        "_type": "Translation"
    }
}

ar-es

استخدم الأمر التالي لتحميل مجموعة البيانات هذه في TFDS:

ds = tfds.load('huggingface:para_pat/en-es')

وصف :

ParaPat: The Multi-Million Sentences Parallel Corpus of Patents Abstracts

This dataset contains the developed parallel corpus from the open access Google
Patents dataset in 74 language pairs, comprising more than 68 million sentences
and 800 million tokens. Sentences were automatically aligned using the Hunalign algorithm
for the largest 22 language pairs, while the others were abstract (i.e. paragraph) aligned.

We demonstrate the capabilities of our corpus by training Neural Machine Translation
(NMT) models for the main 9 language pairs, with a total of 18 models.

الترخيص : CC BY 4.0
الإصدار : 1.1.0
الإنشقاقات :

ينقسم	أمثلة
`'train'`	649396

سمات :

{
    "translation": {
        "languages": [
            "en",
            "es"
        ],
        "id": null,
        "_type": "Translation"
    }
}

ar-fr

استخدم الأمر التالي لتحميل مجموعة البيانات هذه في TFDS:

ds = tfds.load('huggingface:para_pat/en-fr')

وصف :

ParaPat: The Multi-Million Sentences Parallel Corpus of Patents Abstracts

This dataset contains the developed parallel corpus from the open access Google
Patents dataset in 74 language pairs, comprising more than 68 million sentences
and 800 million tokens. Sentences were automatically aligned using the Hunalign algorithm
for the largest 22 language pairs, while the others were abstract (i.e. paragraph) aligned.

We demonstrate the capabilities of our corpus by training Neural Machine Translation
(NMT) models for the main 9 language pairs, with a total of 18 models.

الترخيص : CC BY 4.0
الإصدار : 1.1.0
الإنشقاقات :

ينقسم	أمثلة
`'train'`	12223525

سمات :

{
    "translation": {
        "languages": [
            "en",
            "fr"
        ],
        "id": null,
        "_type": "Translation"
    }
}

دي أون

استخدم الأمر التالي لتحميل مجموعة البيانات هذه في TFDS:

ds = tfds.load('huggingface:para_pat/de-en')

وصف :

ParaPat: The Multi-Million Sentences Parallel Corpus of Patents Abstracts

This dataset contains the developed parallel corpus from the open access Google
Patents dataset in 74 language pairs, comprising more than 68 million sentences
and 800 million tokens. Sentences were automatically aligned using the Hunalign algorithm
for the largest 22 language pairs, while the others were abstract (i.e. paragraph) aligned.

We demonstrate the capabilities of our corpus by training Neural Machine Translation
(NMT) models for the main 9 language pairs, with a total of 18 models.

الترخيص : CC BY 4.0
الإصدار : 1.1.0
الإنشقاقات :

ينقسم	أمثلة
`'train'`	2165054

سمات :

{
    "translation": {
        "languages": [
            "de",
            "en"
        ],
        "id": null,
        "_type": "Translation"
    }
}

أون-كو

استخدم الأمر التالي لتحميل مجموعة البيانات هذه في TFDS:

ds = tfds.load('huggingface:para_pat/en-ko')

وصف :

ParaPat: The Multi-Million Sentences Parallel Corpus of Patents Abstracts

This dataset contains the developed parallel corpus from the open access Google
Patents dataset in 74 language pairs, comprising more than 68 million sentences
and 800 million tokens. Sentences were automatically aligned using the Hunalign algorithm
for the largest 22 language pairs, while the others were abstract (i.e. paragraph) aligned.

We demonstrate the capabilities of our corpus by training Neural Machine Translation
(NMT) models for the main 9 language pairs, with a total of 18 models.

الترخيص : CC BY 4.0
الإصدار : 1.1.0
الإنشقاقات :

ينقسم	أمثلة
`'train'`	2324357

سمات :

{
    "translation": {
        "languages": [
            "en",
            "ko"
        ],
        "id": null,
        "_type": "Translation"
    }
}

fr-ja

استخدم الأمر التالي لتحميل مجموعة البيانات هذه في TFDS:

ds = tfds.load('huggingface:para_pat/fr-ja')

وصف :

ParaPat: The Multi-Million Sentences Parallel Corpus of Patents Abstracts

This dataset contains the developed parallel corpus from the open access Google
Patents dataset in 74 language pairs, comprising more than 68 million sentences
and 800 million tokens. Sentences were automatically aligned using the Hunalign algorithm
for the largest 22 language pairs, while the others were abstract (i.e. paragraph) aligned.

We demonstrate the capabilities of our corpus by training Neural Machine Translation
(NMT) models for the main 9 language pairs, with a total of 18 models.

الترخيص : CC BY 4.0
الإصدار : 1.1.0
الإنشقاقات :

ينقسم	أمثلة
`'train'`	313422

سمات :

{
    "translation": {
        "languages": [
            "fr",
            "ja"
        ],
        "id": null,
        "_type": "Translation"
    }
}

أون ز

استخدم الأمر التالي لتحميل مجموعة البيانات هذه في TFDS:

ds = tfds.load('huggingface:para_pat/en-zh')

وصف :

ParaPat: The Multi-Million Sentences Parallel Corpus of Patents Abstracts

This dataset contains the developed parallel corpus from the open access Google
Patents dataset in 74 language pairs, comprising more than 68 million sentences
and 800 million tokens. Sentences were automatically aligned using the Hunalign algorithm
for the largest 22 language pairs, while the others were abstract (i.e. paragraph) aligned.

We demonstrate the capabilities of our corpus by training Neural Machine Translation
(NMT) models for the main 9 language pairs, with a total of 18 models.

الترخيص : CC BY 4.0
الإصدار : 1.1.0
الإنشقاقات :

ينقسم	أمثلة
`'train'`	4897841

سمات :

{
    "translation": {
        "languages": [
            "en",
            "zh"
        ],
        "id": null,
        "_type": "Translation"
    }
}

ar-ru

استخدم الأمر التالي لتحميل مجموعة البيانات هذه في TFDS:

ds = tfds.load('huggingface:para_pat/en-ru')

وصف :

ParaPat: The Multi-Million Sentences Parallel Corpus of Patents Abstracts

This dataset contains the developed parallel corpus from the open access Google
Patents dataset in 74 language pairs, comprising more than 68 million sentences
and 800 million tokens. Sentences were automatically aligned using the Hunalign algorithm
for the largest 22 language pairs, while the others were abstract (i.e. paragraph) aligned.

We demonstrate the capabilities of our corpus by training Neural Machine Translation
(NMT) models for the main 9 language pairs, with a total of 18 models.

الترخيص : CC BY 4.0
الإصدار : 1.1.0
الإنشقاقات :

ينقسم	أمثلة
`'train'`	4296399

سمات :

{
    "translation": {
        "languages": [
            "en",
            "ru"
        ],
        "id": null,
        "_type": "Translation"
    }
}

fr-ko

استخدم الأمر التالي لتحميل مجموعة البيانات هذه في TFDS:

ds = tfds.load('huggingface:para_pat/fr-ko')

وصف :

ParaPat: The Multi-Million Sentences Parallel Corpus of Patents Abstracts

This dataset contains the developed parallel corpus from the open access Google
Patents dataset in 74 language pairs, comprising more than 68 million sentences
and 800 million tokens. Sentences were automatically aligned using the Hunalign algorithm
for the largest 22 language pairs, while the others were abstract (i.e. paragraph) aligned.

We demonstrate the capabilities of our corpus by training Neural Machine Translation
(NMT) models for the main 9 language pairs, with a total of 18 models.

الترخيص : CC BY 4.0
الإصدار : 1.1.0
الإنشقاقات :

ينقسم	أمثلة
`'train'`	120607

سمات :

{
    "index": {
        "dtype": "int32",
        "id": null,
        "_type": "Value"
    },
    "family_id": {
        "dtype": "int32",
        "id": null,
        "_type": "Value"
    },
    "translation": {
        "languages": [
            "fr",
            "ko"
        ],
        "id": null,
        "_type": "Translation"
    }
}

ru-uk

استخدم الأمر التالي لتحميل مجموعة البيانات هذه في TFDS:

ds = tfds.load('huggingface:para_pat/ru-uk')

وصف :

ParaPat: The Multi-Million Sentences Parallel Corpus of Patents Abstracts

This dataset contains the developed parallel corpus from the open access Google
Patents dataset in 74 language pairs, comprising more than 68 million sentences
and 800 million tokens. Sentences were automatically aligned using the Hunalign algorithm
for the largest 22 language pairs, while the others were abstract (i.e. paragraph) aligned.

We demonstrate the capabilities of our corpus by training Neural Machine Translation
(NMT) models for the main 9 language pairs, with a total of 18 models.

الترخيص : CC BY 4.0
الإصدار : 1.1.0
الإنشقاقات :

ينقسم	أمثلة
`'train'`	85963

سمات :

{
    "index": {
        "dtype": "int32",
        "id": null,
        "_type": "Value"
    },
    "family_id": {
        "dtype": "int32",
        "id": null,
        "_type": "Value"
    },
    "translation": {
        "languages": [
            "ru",
            "uk"
        ],
        "id": null,
        "_type": "Translation"
    }
}

أون-حزب العمال

استخدم الأمر التالي لتحميل مجموعة البيانات هذه في TFDS:

ds = tfds.load('huggingface:para_pat/en-pt')

وصف :

ParaPat: The Multi-Million Sentences Parallel Corpus of Patents Abstracts

This dataset contains the developed parallel corpus from the open access Google
Patents dataset in 74 language pairs, comprising more than 68 million sentences
and 800 million tokens. Sentences were automatically aligned using the Hunalign algorithm
for the largest 22 language pairs, while the others were abstract (i.e. paragraph) aligned.

We demonstrate the capabilities of our corpus by training Neural Machine Translation
(NMT) models for the main 9 language pairs, with a total of 18 models.

الترخيص : CC BY 4.0
الإصدار : 1.1.0
الإنشقاقات :

ينقسم	أمثلة
`'train'`	23121

سمات :

{
    "index": {
        "dtype": "int32",
        "id": null,
        "_type": "Value"
    },
    "family_id": {
        "dtype": "int32",
        "id": null,
        "_type": "Value"
    },
    "translation": {
        "languages": [
            "en",
            "pt"
        ],
        "id": null,
        "_type": "Translation"
    }
}