wikitext_tl39

wikitext-tl-39

Use the following command to load this dataset in TFDS:

import tensorflow_datasets as tfds

ds = tfds.load('huggingface:wikitext_tl39/wikitext-tl-39')
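
As a slightly fuller illustration of the load command, the sketch below wraps it in small helpers. The helper names (`hf_dataset_name`, `load_wikitext_tl39`, `preview`) are hypothetical and not part of TFDS. It assumes `tensorflow-datasets` is installed; the `huggingface:` prefix probably also needs the Hugging Face `datasets` package and network access, which is an assumption rather than something this page states.

```python
def hf_dataset_name(dataset: str, config: str) -> str:
    """Build the 'huggingface:<dataset>/<config>' name string shown on this page."""
    return f"huggingface:{dataset}/{config}"


def load_wikitext_tl39(split: str = "train"):
    """Load one split of the dataset (hypothetical helper, not a TFDS function)."""
    # Imported lazily so hf_dataset_name() can be used without TFDS installed.
    import tensorflow_datasets as tfds

    name = hf_dataset_name("wikitext_tl39", "wikitext-tl-39")
    return tfds.load(name, split=split)


def preview(ds, n: int = 3) -> None:
    """Print the first n examples of a loaded split."""
    for example in ds.take(n):
        # The 'text' feature arrives as a scalar string tensor holding bytes.
        print(example["text"].numpy().decode("utf-8"))
```

For example, `preview(load_wikitext_tl39("validation"))` would print the first three validation lines, provided the download succeeds.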

Description:

Large-scale, unlabeled text dataset with 39 million tokens in the training set. Inspired by the original WikiText Long Term Dependency dataset (Merity et al., 2016). "TL" stands for "Tagalog". Originally published in Cruz & Cheng (2019).

License: GPL-3.0
Version: 1.0.0
Splits:

Split	Examples
`'test'`	376737
`'train'`	1766072
`'validation'`	381763
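
To put the split sizes in proportion, the following minimal sketch computes each split's share of the total using only the example counts listed in the table. The names `SPLIT_EXAMPLES` and `split_shares` are illustrative.

```python
# Example counts copied from the Splits table on this page.
SPLIT_EXAMPLES = {"test": 376737, "train": 1766072, "validation": 381763}


def split_shares(counts: dict) -> dict:
    """Return each split's fraction of the total number of examples."""
    total = sum(counts.values())
    return {name: n / total for name, n in counts.items()}


total_examples = sum(SPLIT_EXAMPLES.values())
shares = split_shares(SPLIT_EXAMPLES)

for name, share in shares.items():
    print(f"{name}: {SPLIT_EXAMPLES[name]} examples ({share:.1%})")
```

The counts work out to roughly a 70/15/15 division across train, test, and validation.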

Features:

{
    "text": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}
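
Because each example carries a single string feature, `text`, downstream processing usually reduces to iterating over strings. The sketch below counts whitespace-separated tokens in an iterable of such strings. The function name is hypothetical, the sample strings are invented for illustration and are not taken from the dataset, and whitespace splitting is an assumption that may not match how the 39 million token figure was computed.

```python
from typing import Iterable


def count_whitespace_tokens(texts: Iterable[str]) -> int:
    """Count tokens by splitting each string on whitespace (an assumed tokenization)."""
    return sum(len(text.split()) for text in texts)


# Made-up sample lines standing in for values of the 'text' feature.
sample_texts = [
    "Ang Wikipedia ay isang ensiklopedya .",
    "",
    " = Maynila = ",
]

sample_token_count = count_whitespace_tokens(sample_texts)
print(sample_token_count)
```

With a loaded split, the same function could be fed decoded strings, for instance from `ds.as_numpy_iterator()` after decoding each `text` value from bytes.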

Except as otherwise noted, the content of this page is licensed under the Creative Commons Attribution 4.0 License, and code samples are licensed under the Apache 2.0 License.

Last updated 2022-06-28 UTC.