TFDS supporte désormais le format Croissant 🥐 ! Lisez la documentation pour en savoir plus.

Cette page a été traduite par l'API Cloud Translation.

wiki_dpr

Références :

psgs_w100.nq.exact

Utilisez la commande suivante pour charger cet ensemble de données dans TFDS :

ds = tfds.load('huggingface:wiki_dpr/psgs_w100.nq.exact')

Description :

This is the wikipedia split used to evaluate the Dense Passage Retrieval (DPR) model.
It contains 21M passages from wikipedia along with their DPR embeddings.
The wikipedia articles were split into multiple, disjoint text blocks of 100 words as passages.

Licence : Aucune licence connue
Version : 0.0.0
Divisions :

Diviser	Exemples
`'train'`	21015300

Caractéristiques :

{
    "id": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "text": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "title": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "embeddings": {
        "feature": {
            "dtype": "float32",
            "id": null,
            "_type": "Value"
        },
        "length": -1,
        "id": null,
        "_type": "Sequence"
    }
}

psgs_w100.nq.compressé

Utilisez la commande suivante pour charger cet ensemble de données dans TFDS :

ds = tfds.load('huggingface:wiki_dpr/psgs_w100.nq.compressed')

Description :

This is the wikipedia split used to evaluate the Dense Passage Retrieval (DPR) model.
It contains 21M passages from wikipedia along with their DPR embeddings.
The wikipedia articles were split into multiple, disjoint text blocks of 100 words as passages.

Licence : Aucune licence connue
Version : 0.0.0
Divisions :

Diviser	Exemples
`'train'`	21015300

Caractéristiques :

{
    "id": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "text": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "title": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "embeddings": {
        "feature": {
            "dtype": "float32",
            "id": null,
            "_type": "Value"
        },
        "length": -1,
        "id": null,
        "_type": "Sequence"
    }
}

psgs_w100.nq.no_index

Utilisez la commande suivante pour charger cet ensemble de données dans TFDS :

ds = tfds.load('huggingface:wiki_dpr/psgs_w100.nq.no_index')

Description :

This is the wikipedia split used to evaluate the Dense Passage Retrieval (DPR) model.
It contains 21M passages from wikipedia along with their DPR embeddings.
The wikipedia articles were split into multiple, disjoint text blocks of 100 words as passages.

Licence : Aucune licence connue
Version : 0.0.0
Divisions :

Diviser	Exemples
`'train'`	21015300

Caractéristiques :

{
    "id": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "text": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "title": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "embeddings": {
        "feature": {
            "dtype": "float32",
            "id": null,
            "_type": "Value"
        },
        "length": -1,
        "id": null,
        "_type": "Sequence"
    }
}

psgs_w100.multiset.exact

Utilisez la commande suivante pour charger cet ensemble de données dans TFDS :

ds = tfds.load('huggingface:wiki_dpr/psgs_w100.multiset.exact')

Description :

This is the wikipedia split used to evaluate the Dense Passage Retrieval (DPR) model.
It contains 21M passages from wikipedia along with their DPR embeddings.
The wikipedia articles were split into multiple, disjoint text blocks of 100 words as passages.

Licence : Aucune licence connue
Version : 0.0.0
Divisions :

Diviser	Exemples
`'train'`	21015300

Caractéristiques :

{
    "id": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "text": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "title": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "embeddings": {
        "feature": {
            "dtype": "float32",
            "id": null,
            "_type": "Value"
        },
        "length": -1,
        "id": null,
        "_type": "Sequence"
    }
}

psgs_w100.multiset.compressé

Utilisez la commande suivante pour charger cet ensemble de données dans TFDS :

ds = tfds.load('huggingface:wiki_dpr/psgs_w100.multiset.compressed')

Description :

This is the wikipedia split used to evaluate the Dense Passage Retrieval (DPR) model.
It contains 21M passages from wikipedia along with their DPR embeddings.
The wikipedia articles were split into multiple, disjoint text blocks of 100 words as passages.

Licence : Aucune licence connue
Version : 0.0.0
Divisions :

Diviser	Exemples
`'train'`	21015300

Caractéristiques :

{
    "id": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "text": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "title": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "embeddings": {
        "feature": {
            "dtype": "float32",
            "id": null,
            "_type": "Value"
        },
        "length": -1,
        "id": null,
        "_type": "Sequence"
    }
}

psgs_w100.multiset.no_index

Utilisez la commande suivante pour charger cet ensemble de données dans TFDS :

ds = tfds.load('huggingface:wiki_dpr/psgs_w100.multiset.no_index')

Description :

This is the wikipedia split used to evaluate the Dense Passage Retrieval (DPR) model.
It contains 21M passages from wikipedia along with their DPR embeddings.
The wikipedia articles were split into multiple, disjoint text blocks of 100 words as passages.

Licence : Aucune licence connue
Version : 0.0.0
Divisions :

Diviser	Exemples
`'train'`	21015300

Caractéristiques :

{
    "id": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "text": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "title": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "embeddings": {
        "feature": {
            "dtype": "float32",
            "id": null,
            "_type": "Value"
        },
        "length": -1,
        "id": null,
        "_type": "Sequence"
    }
}