wiki_atomic_edits

References:

german_insertions

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:wiki_atomic_edits/german_insertions')
  • Description:
A dataset of atomic Wikipedia edits: insertions and deletions of a contiguous chunk of text within a sentence. The full dataset, across all configurations, contains ~43 million edits in 8 languages.

An atomic edit e applied to a natural-language expression S is defined as the insertion, deletion, or substitution of a sub-expression P such that both the original expression S and the resulting expression e(S) are well-formed semantic constituents (MacCartney, 2009). In this corpus, we release such atomic insertions and deletions made to sentences in Wikipedia.
  • License: No known license
  • Version: 1.0.0
  • Splits:
Split      Examples
'train'    3,343,403
  • Features:
{
    "id": {
        "dtype": "int32",
        "id": null,
        "_type": "Value"
    },
    "base_sentence": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "phrase": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "edited_sentence": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}
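The atomic-edit definition in the description can be made concrete with a minimal sketch that treats a sentence as a list of whitespace tokens. The helpers `insert_phrase` and `delete_phrase` and the example sentence are hypothetical illustrations, not part of the dataset or of TFDS. The sketch only performs the splice; it does not check that S and e(S) are well-formed semantic constituents, which is the linguistic condition the definition imposes.

```python
# Minimal sketch of an atomic insertion/deletion over whitespace tokens.
# insert_phrase, delete_phrase, and the sentence below are hypothetical illustrations.
from typing import List


def insert_phrase(sentence: List[str], phrase: List[str], position: int) -> List[str]:
    """Return e(S): the sub-expression P spliced into S before token `position`."""
    if not 0 <= position <= len(sentence):
        raise ValueError("position out of range")
    return sentence[:position] + phrase + sentence[position:]


def delete_phrase(sentence: List[str], phrase: List[str], position: int) -> List[str]:
    """Return e(S): S with the contiguous sub-expression P starting at `position` removed."""
    end = position + len(phrase)
    if position < 0 or sentence[position:end] != phrase:
        raise ValueError("phrase does not occur at this position")
    return sentence[:position] + sentence[end:]


base = "The castle was built in 1200 .".split()
phrase = "on a hill".split()

edited = insert_phrase(base, phrase, 4)
print(" ".join(edited))  # The castle was built on a hill in 1200 .

# Deleting the same span at the same position restores S: the two operations are inverses.
assert delete_phrase(edited, phrase, 4) == base
```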

german_deletions

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:wiki_atomic_edits/german_deletions')
  • Description:
A dataset of atomic Wikipedia edits: insertions and deletions of a contiguous chunk of text within a sentence. The full dataset, across all configurations, contains ~43 million edits in 8 languages.

An atomic edit e applied to a natural-language expression S is defined as the insertion, deletion, or substitution of a sub-expression P such that both the original expression S and the resulting expression e(S) are well-formed semantic constituents (MacCartney, 2009). In this corpus, we release such atomic insertions and deletions made to sentences in Wikipedia.
  • License: No known license
  • Version: 1.0.0
  • Splits:
Split      Examples
'train'    1,994,329
  • Features:
{
    "id": {
        "dtype": "int32",
        "id": null,
        "_type": "Value"
    },
    "base_sentence": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "phrase": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "edited_sentence": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}
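For the `*_deletions` configurations, the relationship between the three string features can be checked record by record. The sketch below assumes (this is not stated in the schema) that `base_sentence` contains `phrase` and that `edited_sentence` is `base_sentence` with that phrase removed, and it compares whitespace tokens. `deletion_position`, `consistent_fraction`, and the toy records are hypothetical.

```python
# Sketch: check that a deletion record is consistent, assuming
# edited_sentence == base_sentence with `phrase` removed (whitespace tokens).
# deletion_position, consistent_fraction, and the records are hypothetical.
from typing import Dict, Iterable, Optional


def deletion_position(base_sentence: str, phrase: str, edited_sentence: str) -> Optional[int]:
    """Token index in base_sentence where `phrase` was deleted, or None if inconsistent."""
    base = base_sentence.split()
    span = phrase.split()
    edited = edited_sentence.split()
    if not span or len(base) != len(edited) + len(span):
        return None
    for i in range(len(edited) + 1):
        if base[i:i + len(span)] == span and base[:i] + base[i + len(span):] == edited:
            return i
    return None


def consistent_fraction(records: Iterable[Dict[str, str]]) -> float:
    """Share of records for which deletion_position finds a valid span."""
    records = list(records)
    if not records:
        return 0.0
    ok = sum(
        deletion_position(r["base_sentence"], r["phrase"], r["edited_sentence"]) is not None
        for r in records
    )
    return ok / len(records)


toy_records = [
    {"base_sentence": "Der Film wurde 1999 in Berlin gedreht .",
     "phrase": "in Berlin",
     "edited_sentence": "Der Film wurde 1999 gedreht ."},
    {"base_sentence": "Er lebt in Wien .",  # phrase not present: inconsistent
     "phrase": "in Graz",
     "edited_sentence": "Er lebt ."},
]

print(deletion_position(**toy_records[0]))  # 4
print(consistent_fraction(toy_records))     # 0.5
```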

english_insertions

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:wiki_atomic_edits/english_insertions')
  • Description:
A dataset of atomic Wikipedia edits: insertions and deletions of a contiguous chunk of text within a sentence. The full dataset, across all configurations, contains ~43 million edits in 8 languages.

An atomic edit e applied to a natural-language expression S is defined as the insertion, deletion, or substitution of a sub-expression P such that both the original expression S and the resulting expression e(S) are well-formed semantic constituents (MacCartney, 2009). In this corpus, we release such atomic insertions and deletions made to sentences in Wikipedia.
  • License: No known license
  • Version: 1.0.0
  • Splits:
Split      Examples
'train'    13,737,796
  • Features:
{
    "id": {
        "dtype": "int32",
        "id": null,
        "_type": "Value"
    },
    "base_sentence": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "phrase": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "edited_sentence": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}
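The four features listed above map naturally onto a small typed record. This is a sketch under two assumptions: that examples arrive as dicts keyed by the feature names (as they do when a dataset is iterated with `tfds.as_numpy`, where string features come back as `bytes`), and that in the `*_insertions` configurations `edited_sentence` equals `base_sentence` with `phrase` spliced in at whitespace-token boundaries. `AtomicEdit`, `insertion_position`, and the sample record are hypothetical.

```python
# Sketch of a typed record mirroring the feature schema above.
# AtomicEdit, insertion_position, and the sample record are hypothetical.
from dataclasses import dataclass
from typing import Mapping, Optional, Union


def _as_text(value: Union[str, bytes]) -> str:
    # String features read through tfds.as_numpy arrive as UTF-8 bytes.
    return value.decode("utf-8") if isinstance(value, bytes) else value


@dataclass(frozen=True)
class AtomicEdit:
    id: int
    base_sentence: str
    phrase: str
    edited_sentence: str

    @classmethod
    def from_example(cls, example: Mapping) -> "AtomicEdit":
        return cls(
            id=int(example["id"]),
            base_sentence=_as_text(example["base_sentence"]),
            phrase=_as_text(example["phrase"]),
            edited_sentence=_as_text(example["edited_sentence"]),
        )


def insertion_position(edit: AtomicEdit) -> Optional[int]:
    """Token index where `phrase` was inserted into base_sentence, or None."""
    base = edit.base_sentence.split()
    span = edit.phrase.split()
    edited = edit.edited_sentence.split()
    if not span or len(edited) != len(base) + len(span):
        return None
    for i in range(len(base) + 1):
        if edited[i:i + len(span)] == span and edited[:i] + edited[i + len(span):] == base:
            return i
    return None


sample = {
    "id": 42,
    "base_sentence": b"The bridge opened in 1932 .",
    "phrase": b"officially",
    "edited_sentence": b"The bridge officially opened in 1932 .",
}
edit = AtomicEdit.from_example(sample)
print(edit.phrase, insertion_position(edit))  # officially 2
```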

english_deletions

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:wiki_atomic_edits/english_deletions')
  • Description:
A dataset of atomic Wikipedia edits: insertions and deletions of a contiguous chunk of text within a sentence. The full dataset, across all configurations, contains ~43 million edits in 8 languages.

An atomic edit e applied to a natural-language expression S is defined as the insertion, deletion, or substitution of a sub-expression P such that both the original expression S and the resulting expression e(S) are well-formed semantic constituents (MacCartney, 2009). In this corpus, we release such atomic insertions and deletions made to sentences in Wikipedia.
  • License: No known license
  • Version: 1.0.0
  • Splits:
Split      Examples
'train'    9,352,389
  • Features:
{
    "id": {
        "dtype": "int32",
        "id": null,
        "_type": "Value"
    },
    "base_sentence": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "phrase": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "edited_sentence": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}
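Each configuration ships only a `'train'` split, so any evaluation data has to be carved out by the user. TFDS's split-slicing syntax (for example `split='train[:95%]'` passed to `tfds.load`) may be one option; another is a deterministic rule on the `id` feature, sketched below under the assumption that ids are unique within a configuration. `split_by_id` and its `holdout_every` parameter are hypothetical.

```python
# Sketch: deterministic train/holdout split keyed on the integer `id` feature.
# split_by_id and holdout_every are hypothetical; ids are assumed unique.
from typing import Dict, Iterable, List, Tuple


def split_by_id(
    records: Iterable[Dict], holdout_every: int = 20
) -> Tuple[List[Dict], List[Dict]]:
    """Send records whose id is divisible by holdout_every to the holdout list."""
    if holdout_every < 1:
        raise ValueError("holdout_every must be >= 1")
    train: List[Dict] = []
    holdout: List[Dict] = []
    for record in records:
        target = holdout if int(record["id"]) % holdout_every == 0 else train
        target.append(record)
    return train, holdout


toy = [{"id": i} for i in range(100)]
train, holdout = split_by_id(toy)
print(len(train), len(holdout))  # 95 5
```

Because the rule depends only on `id`, a given record lands on the same side on every run; the holdout share is close to `1 / holdout_every` only if ids are spread evenly, which is an assumption worth checking on the real data.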

spanish_insertions

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:wiki_atomic_edits/spanish_insertions')
  • Description:
A dataset of atomic Wikipedia edits: insertions and deletions of a contiguous chunk of text within a sentence. The full dataset, across all configurations, contains ~43 million edits in 8 languages.

An atomic edit e applied to a natural-language expression S is defined as the insertion, deletion, or substitution of a sub-expression P such that both the original expression S and the resulting expression e(S) are well-formed semantic constituents (MacCartney, 2009). In this corpus, we release such atomic insertions and deletions made to sentences in Wikipedia.
  • License: No known license
  • Version: 1.0.0
  • Splits:
Split      Examples
'train'    1,380,934
  • Features:
{
    "id": {
        "dtype": "int32",
        "id": null,
        "_type": "Value"
    },
    "base_sentence": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "phrase": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "edited_sentence": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}

spanish_deletions

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:wiki_atomic_edits/spanish_deletions')
  • Description:
A dataset of atomic Wikipedia edits: insertions and deletions of a contiguous chunk of text within a sentence. The full dataset, across all configurations, contains ~43 million edits in 8 languages.

An atomic edit e applied to a natural-language expression S is defined as the insertion, deletion, or substitution of a sub-expression P such that both the original expression S and the resulting expression e(S) are well-formed semantic constituents (MacCartney, 2009). In this corpus, we release such atomic insertions and deletions made to sentences in Wikipedia.
  • License: No known license
  • Version: 1.0.0
  • Splits:
Split      Examples
'train'    908,276
  • Features:
{
    "id": {
        "dtype": "int32",
        "id": null,
        "_type": "Value"
    },
    "base_sentence": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "phrase": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "edited_sentence": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}

french_insertions

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:wiki_atomic_edits/french_insertions')
  • Description:
A dataset of atomic Wikipedia edits: insertions and deletions of a contiguous chunk of text within a sentence. The full dataset, across all configurations, contains ~43 million edits in 8 languages.

An atomic edit e applied to a natural-language expression S is defined as the insertion, deletion, or substitution of a sub-expression P such that both the original expression S and the resulting expression e(S) are well-formed semantic constituents (MacCartney, 2009). In this corpus, we release such atomic insertions and deletions made to sentences in Wikipedia.
  • License: No known license
  • Version: 1.0.0
  • Splits:
Split      Examples
'train'    2,038,305
  • Features:
{
    "id": {
        "dtype": "int32",
        "id": null,
        "_type": "Value"
    },
    "base_sentence": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "phrase": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "edited_sentence": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}

french_deletions

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:wiki_atomic_edits/french_deletions')
  • Description:
A dataset of atomic Wikipedia edits: insertions and deletions of a contiguous chunk of text within a sentence. The full dataset, across all configurations, contains ~43 million edits in 8 languages.

An atomic edit e applied to a natural-language expression S is defined as the insertion, deletion, or substitution of a sub-expression P such that both the original expression S and the resulting expression e(S) are well-formed semantic constituents (MacCartney, 2009). In this corpus, we release such atomic insertions and deletions made to sentences in Wikipedia.
  • License: No known license
  • Version: 1.0.0
  • Splits:
Split      Examples
'train'    2,060,242
  • Features:
{
    "id": {
        "dtype": "int32",
        "id": null,
        "_type": "Value"
    },
    "base_sentence": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "phrase": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "edited_sentence": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}

italian_insertions

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:wiki_atomic_edits/italian_insertions')
  • Description:
A dataset of atomic Wikipedia edits: insertions and deletions of a contiguous chunk of text within a sentence. The full dataset, across all configurations, contains ~43 million edits in 8 languages.

An atomic edit e applied to a natural-language expression S is defined as the insertion, deletion, or substitution of a sub-expression P such that both the original expression S and the resulting expression e(S) are well-formed semantic constituents (MacCartney, 2009). In this corpus, we release such atomic insertions and deletions made to sentences in Wikipedia.
  • License: No known license
  • Version: 1.0.0
  • Splits:
Split      Examples
'train'    1,078,814
  • Features:
{
    "id": {
        "dtype": "int32",
        "id": null,
        "_type": "Value"
    },
    "base_sentence": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "phrase": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "edited_sentence": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}

italian_deletions

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:wiki_atomic_edits/italian_deletions')
  • Description:
A dataset of atomic Wikipedia edits: insertions and deletions of a contiguous chunk of text within a sentence. The full dataset, across all configurations, contains ~43 million edits in 8 languages.

An atomic edit e applied to a natural-language expression S is defined as the insertion, deletion, or substitution of a sub-expression P such that both the original expression S and the resulting expression e(S) are well-formed semantic constituents (MacCartney, 2009). In this corpus, we release such atomic insertions and deletions made to sentences in Wikipedia.
  • License: No known license
  • Version: 1.0.0
  • Splits:
Split      Examples
'train'    583,316
  • Features:
{
    "id": {
        "dtype": "int32",
        "id": null,
        "_type": "Value"
    },
    "base_sentence": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "phrase": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "edited_sentence": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}

japanese_insertions

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:wiki_atomic_edits/japanese_insertions')
  • Description:
A dataset of atomic Wikipedia edits: insertions and deletions of a contiguous chunk of text within a sentence. The full dataset, across all configurations, contains ~43 million edits in 8 languages.

An atomic edit e applied to a natural-language expression S is defined as the insertion, deletion, or substitution of a sub-expression P such that both the original expression S and the resulting expression e(S) are well-formed semantic constituents (MacCartney, 2009). In this corpus, we release such atomic insertions and deletions made to sentences in Wikipedia.
  • License: No known license
  • Version: 1.0.0
  • Splits:
Split      Examples
'train'    2,249,527
  • Features:
{
    "id": {
        "dtype": "int32",
        "id": null,
        "_type": "Value"
    },
    "base_sentence": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "phrase": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "edited_sentence": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}
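Splitting on whitespace does not segment Japanese or Chinese text, which is normally written without spaces between words. If the sentences in these configurations are not pre-segmented (an assumption; inspect a few records to confirm), the consistency between `base_sentence`, `phrase`, and `edited_sentence` can instead be checked on characters. `char_insertion_offset` and the example sentences are hypothetical.

```python
# Sketch: character-level check for an insertion record, for scripts written
# without spaces. char_insertion_offset and the sentences are hypothetical.
from typing import Optional


def char_insertion_offset(base_sentence: str, phrase: str, edited_sentence: str) -> Optional[int]:
    """Character offset where `phrase` was spliced into base_sentence, or None."""
    if not phrase or len(edited_sentence) != len(base_sentence) + len(phrase):
        return None
    start = edited_sentence.find(phrase)
    while start != -1:
        # Accept the first occurrence whose removal restores the base sentence.
        if edited_sentence[:start] + edited_sentence[start + len(phrase):] == base_sentence:
            return start
        start = edited_sentence.find(phrase, start + 1)
    return None


# Insertion: "日本の" spliced into "東京は首都です。"
print(char_insertion_offset("東京は首都です。", "日本の", "東京は日本の首都です。"))  # 3

# A deletion record is the same relation read backwards: swap the two sentences.
print(char_insertion_offset("北京是首都。", "中国的", "北京是中国的首都。"))  # 3
```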

japanese_deletions

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:wiki_atomic_edits/japanese_deletions')
  • Description:
A dataset of atomic Wikipedia edits: insertions and deletions of a contiguous chunk of text within a sentence. The full dataset, across all configurations, contains ~43 million edits in 8 languages.

An atomic edit e applied to a natural-language expression S is defined as the insertion, deletion, or substitution of a sub-expression P such that both the original expression S and the resulting expression e(S) are well-formed semantic constituents (MacCartney, 2009). In this corpus, we release such atomic insertions and deletions made to sentences in Wikipedia.
  • License: No known license
  • Version: 1.0.0
  • Splits:
Split      Examples
'train'    1,352,162
  • Features:
{
    "id": {
        "dtype": "int32",
        "id": null,
        "_type": "Value"
    },
    "base_sentence": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "phrase": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "edited_sentence": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}

russian_insertions

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:wiki_atomic_edits/russian_insertions')
  • Description:
A dataset of atomic Wikipedia edits: insertions and deletions of a contiguous chunk of text within a sentence. The full dataset, across all configurations, contains ~43 million edits in 8 languages.

An atomic edit e applied to a natural-language expression S is defined as the insertion, deletion, or substitution of a sub-expression P such that both the original expression S and the resulting expression e(S) are well-formed semantic constituents (MacCartney, 2009). In this corpus, we release such atomic insertions and deletions made to sentences in Wikipedia.
  • License: No known license
  • Version: 1.0.0
  • Splits:
Split      Examples
'train'    1,471,638
  • Features:
{
    "id": {
        "dtype": "int32",
        "id": null,
        "_type": "Value"
    },
    "base_sentence": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "phrase": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "edited_sentence": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}

russian_deletions

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:wiki_atomic_edits/russian_deletions')
  • Description:
A dataset of atomic Wikipedia edits: insertions and deletions of a contiguous chunk of text within a sentence. The full dataset, across all configurations, contains ~43 million edits in 8 languages.

An atomic edit e applied to a natural-language expression S is defined as the insertion, deletion, or substitution of a sub-expression P such that both the original expression S and the resulting expression e(S) are well-formed semantic constituents (MacCartney, 2009). In this corpus, we release such atomic insertions and deletions made to sentences in Wikipedia.
  • License: No known license
  • Version: 1.0.0
  • Splits:
Split      Examples
'train'    960,976
  • Features:
{
    "id": {
        "dtype": "int32",
        "id": null,
        "_type": "Value"
    },
    "base_sentence": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "phrase": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "edited_sentence": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}

chinese_insertions

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:wiki_atomic_edits/chinese_insertions')
  • Description:
A dataset of atomic Wikipedia edits: insertions and deletions of a contiguous chunk of text within a sentence. The full dataset, across all configurations, contains ~43 million edits in 8 languages.

An atomic edit e applied to a natural-language expression S is defined as the insertion, deletion, or substitution of a sub-expression P such that both the original expression S and the resulting expression e(S) are well-formed semantic constituents (MacCartney, 2009). In this corpus, we release such atomic insertions and deletions made to sentences in Wikipedia.
  • License: No known license
  • Version: 1.0.0
  • Splits:
Split      Examples
'train'    746,509
  • Features:
{
    "id": {
        "dtype": "int32",
        "id": null,
        "_type": "Value"
    },
    "base_sentence": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "phrase": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "edited_sentence": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}

chinese_deletions

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:wiki_atomic_edits/chinese_deletions')
  • Description:
A dataset of atomic Wikipedia edits: insertions and deletions of a contiguous chunk of text within a sentence. The full dataset, across all configurations, contains ~43 million edits in 8 languages.

An atomic edit e applied to a natural-language expression S is defined as the insertion, deletion, or substitution of a sub-expression P such that both the original expression S and the resulting expression e(S) are well-formed semantic constituents (MacCartney, 2009). In this corpus, we release such atomic insertions and deletions made to sentences in Wikipedia.
  • License: No known license
  • Version: 1.0.0
  • Splits:
Split      Examples
'train'    467,271
  • Features:
{
    "id": {
        "dtype": "int32",
        "id": null,
        "_type": "Value"
    },
    "base_sentence": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "phrase": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "edited_sentence": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}
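The per-configuration `'train'` sizes listed on this page can be added up to check the description's figure of ~43 million edits across 8 languages. The counts below are copied from the Splits entries above; the dictionary name `TRAIN_SIZES` is arbitrary.

```python
# Sum the 'train' split sizes listed for each configuration on this page.
TRAIN_SIZES = {
    "german_insertions": 3343403,
    "german_deletions": 1994329,
    "english_insertions": 13737796,
    "english_deletions": 9352389,
    "spanish_insertions": 1380934,
    "spanish_deletions": 908276,
    "french_insertions": 2038305,
    "french_deletions": 2060242,
    "italian_insertions": 1078814,
    "italian_deletions": 583316,
    "japanese_insertions": 2249527,
    "japanese_deletions": 1352162,
    "russian_insertions": 1471638,
    "russian_deletions": 960976,
    "chinese_insertions": 746509,
    "chinese_deletions": 467271,
}

languages = {name.split("_")[0] for name in TRAIN_SIZES}
insertions = sum(n for name, n in TRAIN_SIZES.items() if name.endswith("_insertions"))
deletions = sum(n for name, n in TRAIN_SIZES.items() if name.endswith("_deletions"))
total = insertions + deletions

print(f"{len(languages)} languages, {total:,} edits")  # 8 languages, 43,725,887 edits
print(f"insertions {insertions:,}, deletions {deletions:,}")  # insertions 26,046,926, deletions 17,678,961
```

The total of 43,725,887 is consistent with the rounded figure in the description. Insertions outnumber deletions in every language except French, where the two counts are nearly equal.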