spanish_billion_words

corpus

Use the following command to load this dataset in TFDS:

import tensorflow_datasets as tfds
ds = tfds.load('huggingface:spanish_billion_words/corpus')

Description:

An unannotated Spanish corpus of nearly 1.5 billion words, compiled from different resources on the web.
These resources include the Spanish portions of SenSem, the Ancora Corpus, some OPUS Project corpora and Europarl,
the Tibidabo Treebank, the IULA Spanish LSP Treebank, and dumps of the Spanish Wikipedia, Wikisource and Wikibooks.
The corpus is distributed as 100 text files. Each line of these files holds one of the roughly 50 million sentences in the corpus.
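The one-sentence-per-line layout described above maps naturally onto one example per line. The following is a minimal sketch of that mapping, not the loader the dataset actually uses: the directory layout, the file names `part_00` and `part_01`, the UTF-8 encoding, and the choice to skip empty lines are all assumptions made for illustration.

```python
import tempfile
from pathlib import Path
from typing import Dict, Iterator


def iter_corpus_examples(corpus_dir: Path) -> Iterator[Dict[str, str]]:
    """Yield one {"text": sentence} record per non-empty line under corpus_dir."""
    # Sort the files so the example order is deterministic across runs.
    for path in sorted(corpus_dir.iterdir()):
        if not path.is_file():
            continue
        # Assumption: the text files are UTF-8 encoded.
        with path.open(encoding="utf-8") as handle:
            for line in handle:
                sentence = line.rstrip("\r\n")
                if sentence:  # Assumption: blank lines carry no sentence.
                    yield {"text": sentence}


def build_demo_corpus(root: Path) -> None:
    """Write two tiny hypothetical corpus files for demonstration purposes."""
    (root / "part_00").write_text(
        "Hola mundo .\nEsto es una prueba .\n", encoding="utf-8"
    )
    (root / "part_01").write_text("Otra frase .\n", encoding="utf-8")


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        demo_root = Path(tmp)
        build_demo_corpus(demo_root)
        for record in iter_corpus_examples(demo_root):
            print(record)
```

Reading lazily with a generator keeps memory use flat, which matters for a corpus of this size.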

License: https://creativecommons.org/licenses/by-sa/4.0/
Version: 1.1.0
Splits:

Split	Examples
`'train'`	46925295

Features:

{
    "text": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}
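
Given the schema above, each example is a mapping with a single string field named `text`. Below is a small sketch of how a few sentences might be pulled out of such examples. The helper name `iter_sentences` is hypothetical, and it assumes that string features may arrive as `bytes` (as is typical once TensorFlow string tensors are converted to NumPy) or as plain `str`.

```python
from itertools import islice
from typing import Any, Iterable, Iterator, Mapping


def iter_sentences(
    examples: Iterable[Mapping[str, Any]], limit: int
) -> Iterator[str]:
    """Yield at most `limit` sentences from examples that have a "text" field."""
    for example in islice(examples, limit):
        text = example["text"]
        # Assumption: bytes values are UTF-8 encoded; str values pass through.
        yield text.decode("utf-8") if isinstance(text, bytes) else str(text)


if __name__ == "__main__":
    # Stand-in examples; real ones would come from the loaded dataset.
    fake_examples = [{"text": b"Hola mundo ."}, {"text": "Otra frase ."}]
    for sentence in iter_sentences(fake_examples, limit=2):
        print(sentence)
```

Assuming the load command shown earlier returns a dictionary of splits, something like `iter_sentences(tfds.as_numpy(ds['train']), limit=3)` should work the same way on the real data.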