wiki_atomic_edits

مراجع:

German_insertions

استخدم الأمر التالي لتحميل مجموعة البيانات هذه في TFDS:

ds = tfds.load('huggingface:wiki_atomic_edits/german_insertions')
  • وصف :
A dataset of atomic wikipedia edits containing insertions and deletions of a contiguous chunk of text in a sentence. This dataset contains ~43 million edits across 8 languages.

An atomic edit is defined as an edit e applied to a natural language expression S as the insertion, deletion, or substitution of a sub-expression P such that both the original expression S and the resulting expression e(S) are well-formed semantic constituents (MacCartney, 2009). In this corpus, we release such atomic insertions and deletions made to sentences in wikipedia.
  • الترخيص : لا يوجد ترخيص معروف
  • الإصدار : 1.0.0
  • الإنشقاقات :
ينقسم أمثلة
'train' 3343403
  • سمات :
{
    "id": {
        "dtype": "int32",
        "id": null,
        "_type": "Value"
    },
    "base_sentence": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "phrase": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "edited_sentence": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}

German_deletions

استخدم الأمر التالي لتحميل مجموعة البيانات هذه في TFDS:

ds = tfds.load('huggingface:wiki_atomic_edits/german_deletions')
  • وصف :
A dataset of atomic wikipedia edits containing insertions and deletions of a contiguous chunk of text in a sentence. This dataset contains ~43 million edits across 8 languages.

An atomic edit is defined as an edit e applied to a natural language expression S as the insertion, deletion, or substitution of a sub-expression P such that both the original expression S and the resulting expression e(S) are well-formed semantic constituents (MacCartney, 2009). In this corpus, we release such atomic insertions and deletions made to sentences in wikipedia.
  • الترخيص : لا يوجد ترخيص معروف
  • الإصدار : 1.0.0
  • الإنشقاقات :
ينقسم أمثلة
'train' 1994329
  • سمات :
{
    "id": {
        "dtype": "int32",
        "id": null,
        "_type": "Value"
    },
    "base_sentence": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "phrase": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "edited_sentence": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}

english_insertions

استخدم الأمر التالي لتحميل مجموعة البيانات هذه في TFDS:

ds = tfds.load('huggingface:wiki_atomic_edits/english_insertions')
  • وصف :
A dataset of atomic wikipedia edits containing insertions and deletions of a contiguous chunk of text in a sentence. This dataset contains ~43 million edits across 8 languages.

An atomic edit is defined as an edit e applied to a natural language expression S as the insertion, deletion, or substitution of a sub-expression P such that both the original expression S and the resulting expression e(S) are well-formed semantic constituents (MacCartney, 2009). In this corpus, we release such atomic insertions and deletions made to sentences in wikipedia.
  • الترخيص : لا يوجد ترخيص معروف
  • الإصدار : 1.0.0
  • الإنشقاقات :
ينقسم أمثلة
'train' 13737796
  • سمات :
{
    "id": {
        "dtype": "int32",
        "id": null,
        "_type": "Value"
    },
    "base_sentence": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "phrase": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "edited_sentence": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}

english_deletions

استخدم الأمر التالي لتحميل مجموعة البيانات هذه في TFDS:

ds = tfds.load('huggingface:wiki_atomic_edits/english_deletions')
  • وصف :
A dataset of atomic wikipedia edits containing insertions and deletions of a contiguous chunk of text in a sentence. This dataset contains ~43 million edits across 8 languages.

An atomic edit is defined as an edit e applied to a natural language expression S as the insertion, deletion, or substitution of a sub-expression P such that both the original expression S and the resulting expression e(S) are well-formed semantic constituents (MacCartney, 2009). In this corpus, we release such atomic insertions and deletions made to sentences in wikipedia.
  • الترخيص : لا يوجد ترخيص معروف
  • الإصدار : 1.0.0
  • الإنشقاقات :
ينقسم أمثلة
'train' 9352389
  • سمات :
{
    "id": {
        "dtype": "int32",
        "id": null,
        "_type": "Value"
    },
    "base_sentence": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "phrase": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "edited_sentence": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}

Spanish_insertions

استخدم الأمر التالي لتحميل مجموعة البيانات هذه في TFDS:

ds = tfds.load('huggingface:wiki_atomic_edits/spanish_insertions')
  • وصف :
A dataset of atomic wikipedia edits containing insertions and deletions of a contiguous chunk of text in a sentence. This dataset contains ~43 million edits across 8 languages.

An atomic edit is defined as an edit e applied to a natural language expression S as the insertion, deletion, or substitution of a sub-expression P such that both the original expression S and the resulting expression e(S) are well-formed semantic constituents (MacCartney, 2009). In this corpus, we release such atomic insertions and deletions made to sentences in wikipedia.
  • الترخيص : لا يوجد ترخيص معروف
  • الإصدار : 1.0.0
  • الإنشقاقات :
ينقسم أمثلة
'train' 1380934
  • سمات :
{
    "id": {
        "dtype": "int32",
        "id": null,
        "_type": "Value"
    },
    "base_sentence": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "phrase": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "edited_sentence": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}

Spanish_deletions

استخدم الأمر التالي لتحميل مجموعة البيانات هذه في TFDS:

ds = tfds.load('huggingface:wiki_atomic_edits/spanish_deletions')
  • وصف :
A dataset of atomic wikipedia edits containing insertions and deletions of a contiguous chunk of text in a sentence. This dataset contains ~43 million edits across 8 languages.

An atomic edit is defined as an edit e applied to a natural language expression S as the insertion, deletion, or substitution of a sub-expression P such that both the original expression S and the resulting expression e(S) are well-formed semantic constituents (MacCartney, 2009). In this corpus, we release such atomic insertions and deletions made to sentences in wikipedia.
  • الترخيص : لا يوجد ترخيص معروف
  • الإصدار : 1.0.0
  • الإنشقاقات :
ينقسم أمثلة
'train' 908276
  • سمات :
{
    "id": {
        "dtype": "int32",
        "id": null,
        "_type": "Value"
    },
    "base_sentence": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "phrase": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "edited_sentence": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}

French_insertions

استخدم الأمر التالي لتحميل مجموعة البيانات هذه في TFDS:

ds = tfds.load('huggingface:wiki_atomic_edits/french_insertions')
  • وصف :
A dataset of atomic wikipedia edits containing insertions and deletions of a contiguous chunk of text in a sentence. This dataset contains ~43 million edits across 8 languages.

An atomic edit is defined as an edit e applied to a natural language expression S as the insertion, deletion, or substitution of a sub-expression P such that both the original expression S and the resulting expression e(S) are well-formed semantic constituents (MacCartney, 2009). In this corpus, we release such atomic insertions and deletions made to sentences in wikipedia.
  • الترخيص : لا يوجد ترخيص معروف
  • الإصدار : 1.0.0
  • الإنشقاقات :
ينقسم أمثلة
'train' 2038305
  • سمات :
{
    "id": {
        "dtype": "int32",
        "id": null,
        "_type": "Value"
    },
    "base_sentence": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "phrase": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "edited_sentence": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}

French_deletions

استخدم الأمر التالي لتحميل مجموعة البيانات هذه في TFDS:

ds = tfds.load('huggingface:wiki_atomic_edits/french_deletions')
  • وصف :
A dataset of atomic wikipedia edits containing insertions and deletions of a contiguous chunk of text in a sentence. This dataset contains ~43 million edits across 8 languages.

An atomic edit is defined as an edit e applied to a natural language expression S as the insertion, deletion, or substitution of a sub-expression P such that both the original expression S and the resulting expression e(S) are well-formed semantic constituents (MacCartney, 2009). In this corpus, we release such atomic insertions and deletions made to sentences in wikipedia.
  • الترخيص : لا يوجد ترخيص معروف
  • الإصدار : 1.0.0
  • الإنشقاقات :
ينقسم أمثلة
'train' 2060242
  • سمات :
{
    "id": {
        "dtype": "int32",
        "id": null,
        "_type": "Value"
    },
    "base_sentence": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "phrase": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "edited_sentence": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}

italian_insertions

استخدم الأمر التالي لتحميل مجموعة البيانات هذه في TFDS:

ds = tfds.load('huggingface:wiki_atomic_edits/italian_insertions')
  • وصف :
A dataset of atomic wikipedia edits containing insertions and deletions of a contiguous chunk of text in a sentence. This dataset contains ~43 million edits across 8 languages.

An atomic edit is defined as an edit e applied to a natural language expression S as the insertion, deletion, or substitution of a sub-expression P such that both the original expression S and the resulting expression e(S) are well-formed semantic constituents (MacCartney, 2009). In this corpus, we release such atomic insertions and deletions made to sentences in wikipedia.
  • الترخيص : لا يوجد ترخيص معروف
  • الإصدار : 1.0.0
  • الإنشقاقات :
ينقسم أمثلة
'train' 1078814
  • سمات :
{
    "id": {
        "dtype": "int32",
        "id": null,
        "_type": "Value"
    },
    "base_sentence": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "phrase": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "edited_sentence": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}

italian_deletions

استخدم الأمر التالي لتحميل مجموعة البيانات هذه في TFDS:

ds = tfds.load('huggingface:wiki_atomic_edits/italian_deletions')
  • وصف :
A dataset of atomic wikipedia edits containing insertions and deletions of a contiguous chunk of text in a sentence. This dataset contains ~43 million edits across 8 languages.

An atomic edit is defined as an edit e applied to a natural language expression S as the insertion, deletion, or substitution of a sub-expression P such that both the original expression S and the resulting expression e(S) are well-formed semantic constituents (MacCartney, 2009). In this corpus, we release such atomic insertions and deletions made to sentences in wikipedia.
  • الترخيص : لا يوجد ترخيص معروف
  • الإصدار : 1.0.0
  • الإنشقاقات :
ينقسم أمثلة
'train' 583316
  • سمات :
{
    "id": {
        "dtype": "int32",
        "id": null,
        "_type": "Value"
    },
    "base_sentence": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "phrase": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "edited_sentence": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}

japanese_insertions

استخدم الأمر التالي لتحميل مجموعة البيانات هذه في TFDS:

ds = tfds.load('huggingface:wiki_atomic_edits/japanese_insertions')
  • وصف :
A dataset of atomic wikipedia edits containing insertions and deletions of a contiguous chunk of text in a sentence. This dataset contains ~43 million edits across 8 languages.

An atomic edit is defined as an edit e applied to a natural language expression S as the insertion, deletion, or substitution of a sub-expression P such that both the original expression S and the resulting expression e(S) are well-formed semantic constituents (MacCartney, 2009). In this corpus, we release such atomic insertions and deletions made to sentences in wikipedia.
  • الترخيص : لا يوجد ترخيص معروف
  • الإصدار : 1.0.0
  • الإنشقاقات :
ينقسم أمثلة
'train' 2249527
  • سمات :
{
    "id": {
        "dtype": "int32",
        "id": null,
        "_type": "Value"
    },
    "base_sentence": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "phrase": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "edited_sentence": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}

japanese_deletions

استخدم الأمر التالي لتحميل مجموعة البيانات هذه في TFDS:

ds = tfds.load('huggingface:wiki_atomic_edits/japanese_deletions')
  • وصف :
A dataset of atomic wikipedia edits containing insertions and deletions of a contiguous chunk of text in a sentence. This dataset contains ~43 million edits across 8 languages.

An atomic edit is defined as an edit e applied to a natural language expression S as the insertion, deletion, or substitution of a sub-expression P such that both the original expression S and the resulting expression e(S) are well-formed semantic constituents (MacCartney, 2009). In this corpus, we release such atomic insertions and deletions made to sentences in wikipedia.
  • الترخيص : لا يوجد ترخيص معروف
  • الإصدار : 1.0.0
  • الإنشقاقات :
ينقسم أمثلة
'train' 1352162
  • سمات :
{
    "id": {
        "dtype": "int32",
        "id": null,
        "_type": "Value"
    },
    "base_sentence": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "phrase": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "edited_sentence": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}

russian_insertions

استخدم الأمر التالي لتحميل مجموعة البيانات هذه في TFDS:

ds = tfds.load('huggingface:wiki_atomic_edits/russian_insertions')
  • وصف :
A dataset of atomic wikipedia edits containing insertions and deletions of a contiguous chunk of text in a sentence. This dataset contains ~43 million edits across 8 languages.

An atomic edit is defined as an edit e applied to a natural language expression S as the insertion, deletion, or substitution of a sub-expression P such that both the original expression S and the resulting expression e(S) are well-formed semantic constituents (MacCartney, 2009). In this corpus, we release such atomic insertions and deletions made to sentences in wikipedia.
  • الترخيص : لا يوجد ترخيص معروف
  • الإصدار : 1.0.0
  • الإنشقاقات :
ينقسم أمثلة
'train' 1471638
  • سمات :
{
    "id": {
        "dtype": "int32",
        "id": null,
        "_type": "Value"
    },
    "base_sentence": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "phrase": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "edited_sentence": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}

russian_deletions

استخدم الأمر التالي لتحميل مجموعة البيانات هذه في TFDS:

ds = tfds.load('huggingface:wiki_atomic_edits/russian_deletions')
  • وصف :
A dataset of atomic wikipedia edits containing insertions and deletions of a contiguous chunk of text in a sentence. This dataset contains ~43 million edits across 8 languages.

An atomic edit is defined as an edit e applied to a natural language expression S as the insertion, deletion, or substitution of a sub-expression P such that both the original expression S and the resulting expression e(S) are well-formed semantic constituents (MacCartney, 2009). In this corpus, we release such atomic insertions and deletions made to sentences in wikipedia.
  • الترخيص : لا يوجد ترخيص معروف
  • الإصدار : 1.0.0
  • الإنشقاقات :
ينقسم أمثلة
'train' 960976
  • سمات :
{
    "id": {
        "dtype": "int32",
        "id": null,
        "_type": "Value"
    },
    "base_sentence": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "phrase": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "edited_sentence": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}

chinese_insertions

استخدم الأمر التالي لتحميل مجموعة البيانات هذه في TFDS:

ds = tfds.load('huggingface:wiki_atomic_edits/chinese_insertions')
  • وصف :
A dataset of atomic wikipedia edits containing insertions and deletions of a contiguous chunk of text in a sentence. This dataset contains ~43 million edits across 8 languages.

An atomic edit is defined as an edit e applied to a natural language expression S as the insertion, deletion, or substitution of a sub-expression P such that both the original expression S and the resulting expression e(S) are well-formed semantic constituents (MacCartney, 2009). In this corpus, we release such atomic insertions and deletions made to sentences in wikipedia.
  • الترخيص : لا يوجد ترخيص معروف
  • الإصدار : 1.0.0
  • الإنشقاقات :
ينقسم أمثلة
'train' 746509
  • سمات :
{
    "id": {
        "dtype": "int32",
        "id": null,
        "_type": "Value"
    },
    "base_sentence": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "phrase": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "edited_sentence": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}

chinese_deletions

استخدم الأمر التالي لتحميل مجموعة البيانات هذه في TFDS:

ds = tfds.load('huggingface:wiki_atomic_edits/chinese_deletions')
  • وصف :
A dataset of atomic wikipedia edits containing insertions and deletions of a contiguous chunk of text in a sentence. This dataset contains ~43 million edits across 8 languages.

An atomic edit is defined as an edit e applied to a natural language expression S as the insertion, deletion, or substitution of a sub-expression P such that both the original expression S and the resulting expression e(S) are well-formed semantic constituents (MacCartney, 2009). In this corpus, we release such atomic insertions and deletions made to sentences in wikipedia.
  • الترخيص : لا يوجد ترخيص معروف
  • الإصدار : 1.0.0
  • الإنشقاقات :
ينقسم أمثلة
'train' 467271
  • سمات :
{
    "id": {
        "dtype": "int32",
        "id": null,
        "_type": "Value"
    },
    "base_sentence": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "phrase": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "edited_sentence": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}