# Text generation with an RNN

This tutorial demonstrates how to generate text using a character-based RNN. You will work with a dataset of Shakespeare's writing from Andrej Karpathy's The Unreasonable Effectiveness of Recurrent Neural Networks. Given a sequence of characters from this data ("Shakespear"), train a model to predict the next character in the sequence ("e"). Longer sequences of text can be generated by calling the model repeatedly.

This tutorial includes runnable code implemented using tf.keras and eager execution. The following is sample output from the model in this tutorial after it was trained for 30 epochs and started with the string "Q":

QUEENE:
Thus by All bids the man against the word,
Which are so weak of care, by old care done;
And the precipitation through the bleeding throne.

BISHOP OF ELY:
Marry, and will, my lord, to weep in such a one were prettiest;
Yet now I was adopted heir
Of the world's lamentable day,
To watch the next way with his father with his face?

ESCALUS:
The cause why then we are all resolved more sons.

VOLUMNIA:
O, no, no, no, no, no, no, no, no, no, no, no, no, no, no, no, no, no, no, no, no, it is no sin it should be dead,
And love and pale as any will to that word.

QUEEN ELIZABETH:
But how long have I heard the soul for this world,
And show his hands of life be proved to stand.

PETRUCHIO:
I say he look'd on, if I must be content
To stay him from the fatal of our country's bliss.
His lordship pluck'd from this sentence then for prey,
And then let us twain, being the moon,
were she such a case as fills m


While some of the sentences are grammatical, most do not make sense. The model has not learned the meaning of words, but consider:

• The model is character-based. When training started, the model did not know how to spell an English word, or that words were even a unit of text.

• The structure of the output resembles a play—blocks of text generally begin with a speaker name, in all capital letters similar to the dataset.

• As demonstrated below, the model is trained on small batches of text (100 characters each), and is still able to generate a longer sequence of text with coherent structure.

## Setup

### Import TensorFlow and other libraries

import tensorflow as tf

import numpy as np
import os
import time


Change the following line to run this code on your own data.

path_to_file = tf.keras.utils.get_file('shakespeare.txt', 'https://storage.googleapis.com/download.tensorflow.org/data/shakespeare.txt')

Downloading data from https://storage.googleapis.com/download.tensorflow.org/data/shakespeare.txt
1122304/1115394 [==============================] - 0s 0us/step



First, take a look at the text:

# Read, then decode for py2 compat.
text = open(path_to_file, 'rb').read().decode(encoding='utf-8')
# length of text is the number of characters in it
print('Length of text: {} characters'.format(len(text)))

Length of text: 1115394 characters


# Take a look at the first 250 characters in text
print(text[:250])

First Citizen:
Before we proceed any further, hear me speak.

All:
Speak, speak.

First Citizen:
You are all resolved rather to die than to famish?

All:
Resolved. resolved.

First Citizen:
First, you know Caius Marcius is chief enemy to the people.


# The unique characters in the file
vocab = sorted(set(text))
print('{} unique characters'.format(len(vocab)))

65 unique characters



## Process the text

### Vectorize the text

Before training, you need to map strings to a numerical representation. Create two lookup tables: one mapping characters to numbers, and another for numbers to characters.

# Creating a mapping from unique characters to indices
char2idx = {u:i for i, u in enumerate(vocab)}
idx2char = np.array(vocab)

text_as_int = np.array([char2idx[c] for c in text])


Now you have an integer representation for each character. Notice that each character is mapped to an index from 0 to len(vocab) - 1.

print('{')
for char,_ in zip(char2idx, range(20)):
    print('  {:4s}: {:3d},'.format(repr(char), char2idx[char]))
print('  ...\n}')

{
'\n':   0,
' ' :   1,
'!' :   2,
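As a quick check that the two lookup tables are inverses of each other, the sketch below rebuilds them for a short sample string and round-trips it. The names `sample_text`, `sample_char2idx`, and `sample_idx2char` are illustrative stand-ins so the block runs without the downloaded file, and a plain list replaces `np.array` for the reverse table.

```python
# Stand-in for the tutorial's `text`; any string works here.
sample_text = "First Citizen"

# Same construction as char2idx / idx2char, applied to the sample.
sample_vocab = sorted(set(sample_text))
sample_char2idx = {ch: i for i, ch in enumerate(sample_vocab)}
sample_idx2char = sample_vocab  # list position -> character

# Encode to integers, then decode back to a string.
encoded = [sample_char2idx[ch] for ch in sample_text]
decoded = "".join(sample_idx2char[i] for i in encoded)

print(encoded)
print(decoded == sample_text)  # True: the mapping is lossless
```

Indices run from 0 to `len(sample_vocab) - 1`, the same convention the full vocabulary uses.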



## Train the model

At this point the problem can be treated as a standard classification problem. Given the previous RNN state and the input at this time step, predict the class of the next character.
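One way to picture this setup: the target sequence is the input sequence shifted one character to the left, so every time step has its own label. The helper name `split_input_target` below is an assumption for illustration, since the data-preparation code is not shown in this section.

```python
def split_input_target(chunk):
    """Return (input, target), where target is input shifted one step left."""
    return chunk[:-1], chunk[1:]

input_text, target_text = split_input_target("Shakespeare")
print(repr(input_text))   # 'Shakespear'
print(repr(target_text))  # 'hakespeare'

# At each time step the model sees one input character and is scored
# on the character at the same position in the target.
for step, (inp, tgt) in enumerate(zip(input_text, target_text)):
    print(f"step {step}: input {inp!r} -> expected {tgt!r}")
```

Each position is then a 65-way classification over the vocabulary, which is why a categorical cross-entropy loss applies.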

### Attach an optimizer, and a loss function

The standard tf.keras.losses.sparse_categorical_crossentropy loss function works in this case because it is applied across the last dimension of the predictions.

Because your model returns logits, you need to set the from_logits flag.

def loss(labels, logits):
    return tf.keras.losses.sparse_categorical_crossentropy(labels, logits, from_logits=True)

example_batch_loss = loss(target_example_batch, example_batch_predictions)
print("Prediction shape: ", example_batch_predictions.shape, " # (batch_size, sequence_length, vocab_size)")
print("scalar_loss:      ", example_batch_loss.numpy().mean())

Prediction shape:  (64, 100, 65)  # (batch_size, sequence_length, vocab_size)
scalar_loss:       4.174373
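The initial loss of about 4.174 is what an untrained model would be expected to produce: if all 65 characters are scored roughly equally, the cross-entropy is close to ln(65). The sketch below is a plain-Python version of sparse categorical cross-entropy from logits for a single prediction, written for illustration and not taken from the Keras implementation. The function name `sparse_ce_from_logits` is hypothetical.

```python
import math

def sparse_ce_from_logits(label, logits):
    """Cross-entropy for one integer label given unnormalized scores."""
    # log-sum-exp, with the max subtracted for numerical stability
    m = max(logits)
    log_norm = m + math.log(sum(math.exp(z - m) for z in logits))
    # -log(softmax(logits)[label])
    return log_norm - logits[label]

vocab_size = 65
uniform_logits = [0.0] * vocab_size  # every character equally likely
baseline = sparse_ce_from_logits(label=3, logits=uniform_logits)
print(round(baseline, 4))  # 4.1744, i.e. ln(65)
```

A loss well below this baseline indicates the model has learned something about the character distribution.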



Configure the training procedure using the tf.keras.Model.compile method. Use tf.keras.optimizers.Adam with default arguments and the loss function.

model.compile(optimizer='adam', loss=loss)


### Configure checkpoints

Use a tf.keras.callbacks.ModelCheckpoint to ensure that checkpoints are saved during training:

# Directory where the checkpoints will be saved
checkpoint_dir = './training_checkpoints'
# Name of the checkpoint files
checkpoint_prefix = os.path.join(checkpoint_dir, "ckpt_{epoch}")

checkpoint_callback = tf.keras.callbacks.ModelCheckpoint(
    filepath=checkpoint_prefix,
    save_weights_only=True)
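The `{epoch}` part of the prefix is a format placeholder that gets filled in with the epoch number when a checkpoint is written. The sketch below only demonstrates the string formatting with plain `str.format`; it does not create any files, and the epoch values are chosen for illustration.

```python
import os

checkpoint_dir = './training_checkpoints'
checkpoint_prefix = os.path.join(checkpoint_dir, "ckpt_{epoch}")

# Fill the placeholder for a few example epoch numbers.
for epoch in (1, 5, 10):
    print(checkpoint_prefix.format(epoch=epoch))
```

Because the epoch number is part of the name, each epoch gets its own checkpoint instead of overwriting the previous one.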


### Execute the training

To keep training time reasonable, use 10 epochs to train the model. In Colab, set the runtime to GPU for faster training.

EPOCHS = 10

history = model.fit(dataset, epochs=EPOCHS, callbacks=[checkpoint_callback])

Epoch 1/10
172/172 [==============================] - 5s 27ms/step - loss: 2.6807
Epoch 2/10
172/172 [==============================] - 5s 27ms/step - loss: 1.9748
Epoch 3/10
172/172 [==============================] - 5s 26ms/step - loss: 1.7063
Epoch 4/10
172/172 [==============================] - 5s 26ms/step - loss: 1.5543
Epoch 5/10
172/172 [==============================] - 5s 27ms/step - loss: 1.4633
Epoch 6/10
172/172 [==============================] - 5s 26ms/step - loss: 1.4028
Epoch 7/10
172/172 [==============================] - 5s 26ms/step - loss: 1.3568
Epoch 8/10
172/172 [==============================] - 5s 26ms/step - loss: 1.3187
Epoch 9/10
172/172 [==============================] - 5s 26ms/step - loss: 1.2845
Epoch 10/10
172/172 [==============================] - 5s 26ms/step - loss: 1.2528
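The 172 steps per epoch in this log can be reproduced from numbers that appear elsewhere in the tutorial. The arithmetic below is a sketch under two assumptions not shown in this section: each training example holds 101 characters (100 inputs plus the final target), and leftover characters and incomplete batches are dropped.

```python
text_length = 1115394  # from the "Length of text" output
seq_length = 100       # input characters per example
batch_size = 64        # first dimension of the prediction shape

# Assumption: seq_length + 1 characters per example, remainders dropped.
examples_per_epoch = text_length // (seq_length + 1)
steps_per_epoch = examples_per_epoch // batch_size

print(examples_per_epoch)  # 11043
print(steps_per_epoch)     # 172
```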



## Generate text

### Restore the latest checkpoint

To keep this prediction step simple, use a batch size of 1.

Because of the way the RNN state is passed from timestep to timestep, the model only accepts a fixed batch size once built.

To run the model with a different batch_size, you need to rebuild the model and restore the weights from the checkpoint.

tf.train.latest_checkpoint(checkpoint_dir)

'./training_checkpoints/ckpt_10'

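model = build_model(vocab_size, embedding_dim, rnn_units, batch_size=1)

# Restore the trained weights into the rebuilt model
model.load_weights(tf.train.latest_checkpoint(checkpoint_dir))

model.build(tf.TensorShape([1, None]))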

model.summary()

Model: "sequential_1"
_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
embedding_1 (Embedding)      (1, None, 256)            16640
_________________________________________________________________
gru_1 (GRU)                  (1, None, 1024)           3938304
_________________________________________________________________
dense_1 (Dense)              (1, None, 65)             66625
=================================================================
Total params: 4,021,569
Trainable params: 4,021,569
Non-trainable params: 0
_________________________________________________________________
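The parameter counts in this summary can be checked by hand. The sketch below reads the layer sizes from the output shapes (256, 1024, 65) and assumes the GRU uses two bias vectors per weight set, which is the Keras `reset_after=True` variant; under that assumption the numbers match the summary.

```python
vocab_size = 65
embedding_dim = 256
rnn_units = 1024

# Embedding: one embedding_dim-sized vector per vocabulary entry.
embedding_params = vocab_size * embedding_dim

# GRU: 3 weight sets (update gate, reset gate, candidate state), each with
# an input kernel, a recurrent kernel, and two bias vectors (assumption).
gru_params = 3 * (rnn_units * (embedding_dim + rnn_units) + 2 * rnn_units)

# Dense: weight matrix plus one bias per output class.
dense_params = rnn_units * vocab_size + vocab_size

total = embedding_params + gru_params + dense_params
print(embedding_params, gru_params, dense_params, total)
# 16640 3938304 66625 4021569
```

Almost all of the parameters sit in the GRU layer, so changing the number of RNN units has the largest effect on model size.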



### The prediction loop

The following code block generates the text:

• Begin by choosing a start string, initializing the RNN state and setting the number of characters to generate.

• Get the prediction distribution of the next character using the start string and the RNN state.

• Then, use a categorical distribution to sample the index of the predicted character. Use this predicted character as the next input to the model.

• The RNN state returned by the model is fed back into the model so that it now has more context, instead of only one character. After each prediction, the updated RNN state is fed back in again, so the model accumulates context from the previously predicted characters. No weights change during generation.

Looking at the generated text, you'll see that the model knows when to capitalize, makes paragraphs, and imitates a Shakespeare-like writing vocabulary. With the small number of training epochs, it has not yet learned to form coherent sentences.

def generate_text(model, start_string):
    # Evaluation step (generating text using the learned model)

    # Number of characters to generate
    num_generate = 1000

    # Converting our start string to numbers (vectorizing)
    input_eval = [char2idx[s] for s in start_string]
    input_eval = tf.expand_dims(input_eval, 0)

    # Empty list to store our results
    text_generated = []

    # Low temperature results in more predictable text.
    # Higher temperature results in more surprising text.
    # Experiment to find the best setting.
    temperature = 1.0

    # Here batch size == 1
    model.reset_states()
    for i in range(num_generate):
        predictions = model(input_eval)
        # remove the batch dimension
        predictions = tf.squeeze(predictions, 0)

        # using a categorical distribution to predict the character returned by the model
        predictions = predictions / temperature
        predicted_id = tf.random.categorical(predictions, num_samples=1)[-1,0].numpy()

        # Pass the predicted character as the next input to the model
        # along with the previous hidden state
        input_eval = tf.expand_dims([predicted_id], 0)

        text_generated.append(idx2char[predicted_id])

    return (start_string + ''.join(text_generated))

print(generate_text(model, start_string=u"ROMEO: "))

ROMEO: ghast I cut go,
Know the normander and the wrong:
To our Morsuis misdress are behiod;
And after as if no other husion.

VALERIS:

WARWICK:
Then, atient the bade, truckle aid,
Dearve your tongue should be cred to our face,
Bear trouble my father valiant,' in the company.

SICINIUS:
O God!'Sir afeard?

MIRANDA:
Come, good med,---or whom by the duke?

DUKE VINCENTIO:
Yes, that are bore indocation!

IO:
None not, my lord's sons.

MIRANDA:
Of some King?'
And, if thou was, a partanot young to thee.

JULIET:
O, tell; then I'll see them again? There's not so reder
no mother, and my three here to us. You might shall not speak, these this
same this within; what armpy I might
but though some way.

ROMEO:
Our daughter of the fool, that great come.
So, not the sun summer so all the sends,
Your ludgers made before the souls of years, and thereby there. Lady, father, were well the sold, pass, remeate.

Second King Richard's daughter,
Which chee



The easiest thing you can do to improve the results is to train it for longer (try EPOCHS = 30).

You can also experiment with a different start string, try adding another RNN layer to improve the model's accuracy, or adjust the temperature parameter to generate more or less random predictions.
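To see what the temperature parameter does, it helps to look at the probabilities it produces. Dividing logits by a temperature before sampling is equivalent to sampling from a softmax of the scaled logits. The sketch below is a plain-Python illustration with made-up logits for three characters; the function name `softmax_with_temperature` and the values are hypothetical.

```python
import math
import random

def softmax_with_temperature(logits, temperature):
    """Convert logits to probabilities after dividing by temperature."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.1]  # hypothetical scores for three characters

# Lower temperature sharpens the distribution; higher flattens it.
for temperature in (0.5, 1.0, 2.0):
    probs = softmax_with_temperature(logits, temperature)
    print(temperature, [round(p, 3) for p in probs])

# Draw one index from the distribution (seeded for repeatability).
rng = random.Random(0)
probs = softmax_with_temperature(logits, 1.0)
sampled_index = rng.choices(range(len(logits)), weights=probs, k=1)[0]
print(sampled_index)
```

At low temperatures the most likely character dominates, which gives more repetitive but safer text. At high temperatures unlikely characters are sampled more often.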

The above training procedure is simple, but does not give you much control.

Now that you've seen how to run the model manually, you can unpack the training loop and implement it yourself. This gives a starting point if, for example, you want to implement curriculum learning to help stabilize the model's open-loop output.

Use tf.GradientTape to track the gradients. You can learn more about this approach by reading the eager execution guide.

The procedure works as follows:

• First, reset the RNN state. You do this by calling the tf.keras.Model.reset_states method.

• Next, iterate over the dataset (batch by batch) and calculate the predictions associated with each.

• Open a tf.GradientTape, and calculate the predictions and loss in that context.

• Calculate the gradients of the loss with respect to the model variables using the tf.GradientTape.gradient method.

• Finally, take a step downwards by using the optimizer's apply_gradients method (tf.keras.optimizers.Optimizer.apply_gradients).
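The last three steps (compute the loss, compute its gradients, apply an update) follow the usual gradient-descent pattern. As a framework-free illustration, the sketch below runs that pattern on a toy one-variable loss with a hand-written derivative. It uses plain gradient descent, not Adam or tf.GradientTape, and the names `loss_fn`, `grad_fn`, and `learning_rate` are hypothetical.

```python
def loss_fn(w):
    """Toy loss with its minimum at w = 3."""
    return (w - 3.0) ** 2

def grad_fn(w):
    """Derivative of loss_fn, written out by hand."""
    return 2.0 * (w - 3.0)

w = 0.0              # the single "model variable"
learning_rate = 0.1  # hypothetical step size

for step in range(50):
    loss = loss_fn(w)             # forward pass: compute the loss
    grad = grad_fn(w)             # gradient of the loss w.r.t. w
    w = w - learning_rate * grad  # step against the gradient

print(round(w, 4))
```

In the TensorFlow version, the gradient tape takes the place of the hand-written derivative and the optimizer takes the place of the manual update.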

model = build_model(
    vocab_size=len(vocab),
    embedding_dim=embedding_dim,
    rnn_units=rnn_units,
    batch_size=BATCH_SIZE)

WARNING:tensorflow:Unresolved object in checkpoint: (root).optimizer
WARNING:tensorflow:Unresolved object in checkpoint: (root).optimizer.iter
WARNING:tensorflow:Unresolved object in checkpoint: (root).optimizer.beta_1
WARNING:tensorflow:Unresolved object in checkpoint: (root).optimizer.beta_2
WARNING:tensorflow:Unresolved object in checkpoint: (root).optimizer.decay
WARNING:tensorflow:Unresolved object in checkpoint: (root).optimizer.learning_rate
WARNING:tensorflow:Unresolved object in checkpoint: (root).optimizer's state 'm' for (root).layer_with_weights-0.embeddings
WARNING:tensorflow:Unresolved object in checkpoint: (root).optimizer's state 'm' for (root).layer_with_weights-2.kernel
WARNING:tensorflow:Unresolved object in checkpoint: (root).optimizer's state 'm' for (root).layer_with_weights-2.bias
WARNING:tensorflow:Unresolved object in checkpoint: (root).optimizer's state 'm' for (root).layer_with_weights-1.cell.kernel
WARNING:tensorflow:Unresolved object in checkpoint: (root).optimizer's state 'm' for (root).layer_with_weights-1.cell.recurrent_kernel
WARNING:tensorflow:Unresolved object in checkpoint: (root).optimizer's state 'm' for (root).layer_with_weights-1.cell.bias
WARNING:tensorflow:Unresolved object in checkpoint: (root).optimizer's state 'v' for (root).layer_with_weights-0.embeddings
WARNING:tensorflow:Unresolved object in checkpoint: (root).optimizer's state 'v' for (root).layer_with_weights-2.kernel
WARNING:tensorflow:Unresolved object in checkpoint: (root).optimizer's state 'v' for (root).layer_with_weights-2.bias
WARNING:tensorflow:Unresolved object in checkpoint: (root).optimizer's state 'v' for (root).layer_with_weights-1.cell.kernel
WARNING:tensorflow:Unresolved object in checkpoint: (root).optimizer's state 'v' for (root).layer_with_weights-1.cell.recurrent_kernel
WARNING:tensorflow:Unresolved object in checkpoint: (root).optimizer's state 'v' for (root).layer_with_weights-1.cell.bias
WARNING:tensorflow:A checkpoint was restored (e.g. tf.train.Checkpoint.restore or tf.keras.Model.load_weights) but not all checkpointed values were used. See above for specific issues. Use expect_partial() on the load status object, e.g. tf.train.Checkpoint.restore(...).expect_partial(), to silence these warnings, or use assert_consumed() to make the check explicit. See https://www.tensorflow.org/guide/checkpoint#loading_mechanics for details.


optimizer = tf.keras.optimizers.Adam()

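@tf.function
def train_step(inp, target):
    # Record the forward pass so gradients can be computed
    with tf.GradientTape() as tape:
        predictions = model(inp)
        loss = tf.reduce_mean(
            tf.keras.losses.sparse_categorical_crossentropy(
                target, predictions, from_logits=True))
    # Gradients of the loss with respect to the model variables
    grads = tape.gradient(loss, model.trainable_variables)
    optimizer.apply_gradients(zip(grads, model.trainable_variables))

    return loss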

# Training step
EPOCHS = 10

for epoch in range(EPOCHS):
    start = time.time()

    # resetting the hidden state at the start of every epoch
    model.reset_states()

    for (batch_n, (inp, target)) in enumerate(dataset):
        loss = train_step(inp, target)

        if batch_n % 100 == 0:
            template = 'Epoch {} Batch {} Loss {}'
            print(template.format(epoch + 1, batch_n, loss))

    # saving (checkpoint) the model every 5 epochs
    if (epoch + 1) % 5 == 0:
        model.save_weights(checkpoint_prefix.format(epoch=epoch))

    print('Epoch {} Loss {:.4f}'.format(epoch + 1, loss))
    print('Time taken for 1 epoch {} sec\n'.format(time.time() - start))

model.save_weights(checkpoint_prefix.format(epoch=epoch))

Epoch 1 Batch 0 Loss 4.174976348876953
Epoch 1 Batch 100 Loss 2.351067304611206
Epoch 1 Loss 2.1421
Time taken for 1 epoch 6.3171796798706055 sec

Epoch 2 Batch 0 Loss 2.166642665863037
Epoch 2 Batch 100 Loss 1.9492360353469849
Epoch 2 Loss 1.7901
Time taken for 1 epoch 5.3413612842559814 sec

Epoch 3 Batch 0 Loss 1.804692029953003
Epoch 3 Batch 100 Loss 1.6545528173446655
Epoch 3 Loss 1.6328
Time taken for 1 epoch 5.337632179260254 sec

Epoch 4 Batch 0 Loss 1.6188888549804688
Epoch 4 Batch 100 Loss 1.5314372777938843
Epoch 4 Loss 1.5319
Time taken for 1 epoch 5.2844321727752686 sec

Epoch 5 Batch 0 Loss 1.470827579498291
Epoch 5 Batch 100 Loss 1.4400928020477295
Epoch 5 Loss 1.4442
Time taken for 1 epoch 5.46646785736084 sec

Epoch 6 Batch 0 Loss 1.4113285541534424
Epoch 6 Batch 100 Loss 1.387071132659912
Epoch 6 Loss 1.3713
Time taken for 1 epoch 5.243147373199463 sec

Epoch 7 Batch 0 Loss 1.3486154079437256
Epoch 7 Batch 100 Loss 1.353363037109375
Epoch 7 Loss 1.3270
Time taken for 1 epoch 5.295132160186768 sec

Epoch 8 Batch 0 Loss 1.2960264682769775
Epoch 8 Batch 100 Loss 1.3038402795791626
Epoch 8 Loss 1.3556
Time taken for 1 epoch 5.228798151016235 sec

Epoch 9 Batch 0 Loss 1.2495232820510864
Epoch 9 Batch 100 Loss 1.30863618850708
Epoch 9 Loss 1.2699
Time taken for 1 epoch 5.33559775352478 sec

Epoch 10 Batch 0 Loss 1.2161246538162231
Epoch 10 Batch 100 Loss 1.2242770195007324
Epoch 10 Loss 1.2360
Time taken for 1 epoch 5.377742528915405 sec