tiny_shakespeare
40,000 lines of Shakespeare from a variety of Shakespeare's plays. Featured in
Andrej Karpathy's blog post 'The Unreasonable Effectiveness of Recurrent Neural
Networks': http://karpathy.github.io/2015/05/21/rnn-effectiveness/
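To use the dataset for character-level modelling, for example:
import tensorflow as tf
import tensorflow_datasets as tfds

# Each split holds a single example: the whole text as one string.
d = tfds.load(name='tiny_shakespeare')['train']
# Split that string into a 1-D tensor of individual characters.
d = d.map(lambda x: tf.strings.unicode_split(x['text'], 'UTF-8'))
# train split includes vocabulary for other splits
vocabulary = sorted(set(next(iter(d)).numpy()))
# Pair every character with the character that follows it.
d = d.map(lambda x: {'cur_char': x[:-1], 'next_char': x[1:]})
# Flatten to one (cur_char, next_char) pair per element.
d = d.unbatch()
seq_len = 100
batch_size = 2
# Group pairs into sequences, then sequences into batches.
d = d.batch(seq_len)
d = d.batch(batch_size)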
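The shift-and-batch steps above can be illustrated without TensorFlow. The following is a minimal plain-Python sketch, not TFDS code: `make_batches` and its inner `chunk` helper are hypothetical names introduced here, and the short input string stands in for the dataset text. It mirrors the same order of operations (split into characters, build the vocabulary, shift by one, group into sequences, group into batches).

```python
def make_batches(text, seq_len=100, batch_size=2):
    """Plain-Python mirror of the tf.data steps (illustrative helper, not part of TFDS)."""
    chars = list(text)                # one element per character
    vocabulary = sorted(set(chars))   # every distinct character, sorted
    # Inputs and targets are the same text, offset by one character.
    cur_chars, next_chars = chars[:-1], chars[1:]

    def chunk(items, size):
        # Like batching without dropping the remainder: the last chunk may be shorter.
        return [items[i:i + size] for i in range(0, len(items), size)]

    sequences = [
        {'cur_char': cur, 'next_char': nxt}
        for cur, nxt in zip(chunk(cur_chars, seq_len), chunk(next_chars, seq_len))
    ]
    # Here a batch is a list of sequence dicts; tf.data would instead
    # stack each key into a single tensor per batch.
    return vocabulary, chunk(sequences, batch_size)


vocabulary, batches = make_batches('to be or not to be', seq_len=4, batch_size=2)
first = batches[0][0]
print(vocabulary)
print(''.join(first['cur_char']), '->', ''.join(first['next_char']))
```

Each `next_char` sequence is the matching `cur_char` sequence moved one position to the right, which is what lets a model learn to predict the next character at every step.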
| Split        | Examples |
|--------------|----------|
| 'test'       | 1        |
| 'train'      | 1        |
| 'validation' | 1        |
FeaturesDict({
'text': Text(shape=(), dtype=string),
})
| Feature | Class        | Shape | Dtype  | Description |
|---------|--------------|-------|--------|-------------|
|         | FeaturesDict |       |        |             |
| text    | Text         |       | string |             |
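Because the only feature is a raw `text` string, a model usually needs the characters mapped to integer ids first. The sketch below shows one common way to do that with a sorted vocabulary, in plain Python. `build_codec`, `encode`, and `decode` are hypothetical names introduced for illustration, and `sample` is a short stand-in string rather than data loaded from the dataset.

```python
def build_codec(text):
    """Build character <-> integer id mappings from the text (illustrative helper)."""
    vocabulary = sorted(set(text))
    char_to_id = {ch: i for i, ch in enumerate(vocabulary)}

    def encode(s):
        # Raises KeyError for characters that never occurred in `text`.
        return [char_to_id[ch] for ch in s]

    def decode(ids):
        return ''.join(vocabulary[i] for i in ids)

    return vocabulary, encode, decode


# Stand-in text; with the real dataset this would be the decoded 'text' feature.
sample = 'First Citizen:\nBefore we proceed any further, hear me speak.'
vocabulary, encode, decode = build_codec(sample)

ids = encode('hear me')
print(ids)
print(decode(ids))
```

Building the vocabulary from the train split, as the usage snippet does, keeps the ids consistent across the validation and test splits.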
@misc{
author={Karpathy, Andrej},
title={char-rnn},
year={2015},
howpublished={\url{https://github.com/karpathy/char-rnn} }
}
Except as otherwise noted, the content of this page is licensed under the Creative Commons Attribution 4.0 License, and code samples are licensed under the Apache 2.0 License.
Last updated 2023-02-11 UTC.
Homepage: https://github.com/karpathy/char-rnn/blob/master/data/tinyshakespeare/input.txt
Source code: tfds.datasets.tiny_shakespeare.Builder (https://github.com/tensorflow/datasets/tree/master/tensorflow_datasets/datasets/tiny_shakespeare/tiny_shakespeare_dataset_builder.py)
Versions: 1.0.0 (default): No release notes.
Download size: 1.06 MiB
Dataset size: 1.06 MiB
Auto-cached (https://www.tensorflow.org/datasets/performances#auto-caching): Yes
Supervised keys (see the as_supervised doc, https://www.tensorflow.org/datasets/api_docs/python/tfds/load#args): None
Figure (tfds.show_examples): Not supported.