wiki_dpr

References:

psgs_w100.nq.exact

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:wiki_dpr/psgs_w100.nq.exact')

Description:

This is the wikipedia split used to evaluate the Dense Passage Retrieval (DPR) model.
It contains 21M passages from wikipedia along with their DPR embeddings.
The wikipedia articles were split into multiple, disjoint text blocks of 100 words as passages.
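
To illustrate what "disjoint text blocks of 100 words" means, here is a minimal sketch that chunks a text on whitespace. The function name `split_into_passages` and the whitespace tokenization are assumptions made for illustration; the actual DPR preprocessing may tokenize differently and may treat the final, shorter block differently.

```python
from typing import List


def split_into_passages(text: str, words_per_passage: int = 100) -> List[str]:
    """Split text into consecutive, non-overlapping blocks of words.

    Hypothetical helper: words are separated on whitespace, and the last
    block may be shorter than `words_per_passage`.
    """
    if words_per_passage <= 0:
        raise ValueError("words_per_passage must be positive")
    words = text.split()
    return [
        " ".join(words[i:i + words_per_passage])
        for i in range(0, len(words), words_per_passage)
    ]


# A synthetic 250-word "article" yields blocks of 100, 100 and 50 words.
article = " ".join(f"w{i}" for i in range(250))
passages = split_into_passages(article)
print([len(p.split()) for p in passages])
```

Because the blocks do not overlap, joining them back with spaces reproduces the whitespace-normalized article.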

License: No known license
Version: 0.0.0
Splits:

Split	Examples
`'train'`	21015300
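
For a sense of scale, the raw size of the embeddings can be estimated from the example count above and the `float32` dtype listed in the features. The embedding dimension is not stated on this page; 768 (the output size of BERT-base encoders such as those used by DPR) is an assumption here, and the estimate ignores text fields and any storage overhead.

```python
NUM_PASSAGES = 21_015_300  # size of the 'train' split
EMBEDDING_DIM = 768        # assumption: not stated on this page
BYTES_PER_FLOAT32 = 4


def embedding_bytes(num_vectors: int, dim: int,
                    bytes_per_value: int = BYTES_PER_FLOAT32) -> int:
    """Return the size in bytes of a dense num_vectors x dim float matrix."""
    return num_vectors * dim * bytes_per_value


total = embedding_bytes(NUM_PASSAGES, EMBEDDING_DIM)
print(f"{total / 1e9:.1f} GB ({total / 2**30:.1f} GiB)")
```

Under these assumptions the embeddings alone take roughly 65 GB, so it is worth checking free disk space before loading.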

Features:

{
    "id": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "text": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "title": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "embeddings": {
        "feature": {
            "dtype": "float32",
            "id": null,
            "_type": "Value"
        },
        "length": -1,
        "id": null,
        "_type": "Sequence"
    }
}
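
The feature specification above is the JSON form of a Hugging Face `datasets` schema. As a minimal sketch, the snippet below walks such a spec with the standard library and summarizes each field. `describe` and `summarize_features` are hypothetical helpers, not part of TFDS or `datasets`, and reading `"length": -1` as "variable length" follows the `datasets` convention as I understand it.

```python
import json

FEATURES_JSON = """
{
    "id": {"dtype": "string", "id": null, "_type": "Value"},
    "text": {"dtype": "string", "id": null, "_type": "Value"},
    "title": {"dtype": "string", "id": null, "_type": "Value"},
    "embeddings": {
        "feature": {"dtype": "float32", "id": null, "_type": "Value"},
        "length": -1,
        "id": null,
        "_type": "Sequence"
    }
}
"""


def describe(spec: dict) -> str:
    """Return a short description of one feature spec (hypothetical helper)."""
    if spec["_type"] == "Value":
        return spec["dtype"]
    if spec["_type"] == "Sequence":
        length = spec.get("length", -1)
        size = "variable" if length == -1 else str(length)
        return f"sequence[{describe(spec['feature'])}, length={size}]"
    raise ValueError(f"unsupported feature type: {spec['_type']}")


def summarize_features(features: dict) -> dict:
    """Map each field name to its description."""
    return {name: describe(spec) for name, spec in features.items()}


summary = summarize_features(json.loads(FEATURES_JSON))
for name, kind in summary.items():
    print(f"{name}: {kind}")
```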

psgs_w100.nq.compressed

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:wiki_dpr/psgs_w100.nq.compressed')

Description:

This is the wikipedia split used to evaluate the Dense Passage Retrieval (DPR) model.
It contains 21M passages from wikipedia along with their DPR embeddings.
The wikipedia articles were split into multiple, disjoint text blocks of 100 words as passages.

License: No known license
Version: 0.0.0
Splits:

Split	Examples
`'train'`	21015300

Features:

{
    "id": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "text": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "title": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "embeddings": {
        "feature": {
            "dtype": "float32",
            "id": null,
            "_type": "Value"
        },
        "length": -1,
        "id": null,
        "_type": "Sequence"
    }
}

psgs_w100.nq.no_index

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:wiki_dpr/psgs_w100.nq.no_index')

Description:

This is the wikipedia split used to evaluate the Dense Passage Retrieval (DPR) model.
It contains 21M passages from wikipedia along with their DPR embeddings.
The wikipedia articles were split into multiple, disjoint text blocks of 100 words as passages.

License: No known license
Version: 0.0.0
Splits:

Split	Examples
`'train'`	21015300

Features:

{
    "id": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "text": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "title": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "embeddings": {
        "feature": {
            "dtype": "float32",
            "id": null,
            "_type": "Value"
        },
        "length": -1,
        "id": null,
        "_type": "Sequence"
    }
}

psgs_w100.multiset.exact

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:wiki_dpr/psgs_w100.multiset.exact')

Description:

This is the wikipedia split used to evaluate the Dense Passage Retrieval (DPR) model.
It contains 21M passages from wikipedia along with their DPR embeddings.
The wikipedia articles were split into multiple, disjoint text blocks of 100 words as passages.

License: No known license
Version: 0.0.0
Splits:

Split	Examples
`'train'`	21015300

Features:

{
    "id": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "text": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "title": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "embeddings": {
        "feature": {
            "dtype": "float32",
            "id": null,
            "_type": "Value"
        },
        "length": -1,
        "id": null,
        "_type": "Sequence"
    }
}

psgs_w100.multiset.compressed

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:wiki_dpr/psgs_w100.multiset.compressed')

Description:

This is the wikipedia split used to evaluate the Dense Passage Retrieval (DPR) model.
It contains 21M passages from wikipedia along with their DPR embeddings.
The wikipedia articles were split into multiple, disjoint text blocks of 100 words as passages.

License: No known license
Version: 0.0.0
Splits:

Split	Examples
`'train'`	21015300

Features:

{
    "id": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "text": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "title": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "embeddings": {
        "feature": {
            "dtype": "float32",
            "id": null,
            "_type": "Value"
        },
        "length": -1,
        "id": null,
        "_type": "Sequence"
    }
}

psgs_w100.multiset.no_index

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:wiki_dpr/psgs_w100.multiset.no_index')

Description:

This is the wikipedia split used to evaluate the Dense Passage Retrieval (DPR) model.
It contains 21M passages from wikipedia along with their DPR embeddings.
The wikipedia articles were split into multiple, disjoint text blocks of 100 words as passages.

License: No known license
Version: 0.0.0
Splits:

Split	Examples
`'train'`	21015300

Features:

{
    "id": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "text": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "title": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "embeddings": {
        "feature": {
            "dtype": "float32",
            "id": null,
            "_type": "Value"
        },
        "length": -1,
        "id": null,
        "_type": "Sequence"
    }
}
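
The six configurations on this page follow one naming pattern: `psgs_w100`, then `nq` or `multiset`, then `exact`, `compressed`, or `no_index`. A minimal sketch that builds the TFDS names from those two parts is shown below. `config_name` and `load_config` are hypothetical helpers, not part of TFDS; `load_config` imports `tensorflow_datasets` lazily and, if called, would trigger a download of the full dataset.

```python
from itertools import product

VARIANTS = ("nq", "multiset")
INDEX_TYPES = ("exact", "compressed", "no_index")


def config_name(variant: str, index_type: str) -> str:
    """Build the TFDS name of one wiki_dpr configuration (hypothetical helper)."""
    if variant not in VARIANTS:
        raise ValueError(f"unknown variant: {variant!r}")
    if index_type not in INDEX_TYPES:
        raise ValueError(f"unknown index type: {index_type!r}")
    return f"huggingface:wiki_dpr/psgs_w100.{variant}.{index_type}"


def load_config(variant: str, index_type: str, split: str = "train"):
    """Load one configuration; not called here because of the download size."""
    import tensorflow_datasets as tfds  # imported lazily on purpose
    return tfds.load(config_name(variant, index_type), split=split)


# Enumerate all six names listed on this page.
ALL_CONFIGS = [config_name(v, i) for v, i in product(VARIANTS, INDEX_TYPES)]
print("\n".join(ALL_CONFIGS))
```

Validating the two parts before calling `tfds.load` turns a typo in a configuration name into an immediate `ValueError` rather than a failed lookup.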