View on TensorFlow.org | Run in Google Colab | View source on GitHub | Download notebook |
Setup
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers
import numpy as np
Introduction
Keras provides default training and evaluation loops, fit() and evaluate(). Their usage is covered in the guide Training & evaluation with the built-in methods.
If you want to customize the learning algorithm of your model while still leveraging the convenience of fit() (for instance, to train a GAN using fit()), you can subclass the Model class and implement your own train_step() method, which is called repeatedly during fit(). This is covered in the guide Customizing what happens in fit(); a minimal sketch of that approach follows below.
Now, if you want very low-level control over training & evaluation, you should write your own training & evaluation loops from scratch. This is what this guide is about.
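For reference, here is a minimal sketch of the train_step() override approach mentioned above. It is a simplified illustration (using the compiled loss, optimizer, and metrics set via compile()), not this guide's focus:
class CustomModel(keras.Model):
    def train_step(self, data):
        x, y = data  # assumes the dataset yields (inputs, targets) pairs
        with tf.GradientTape() as tape:
            y_pred = self(x, training=True)
            loss = self.compiled_loss(y, y_pred)
        grads = tape.gradient(loss, self.trainable_weights)
        self.optimizer.apply_gradients(zip(grads, self.trainable_weights))
        self.compiled_metrics.update_state(y, y_pred)
        return {m.name: m.result() for m in self.metrics}
An instance of such a model can then be trained with a regular model.compile(...) followed by model.fit(...).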
Using the GradientTape: a first end-to-end example
Calling a model inside a GradientTape scope enables you to retrieve the gradients of the trainable weights of the layers with respect to a loss value. Using an optimizer instance, you can use these gradients to update these variables (which you can retrieve using model.trainable_weights).
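Before the full loop, here is a minimal standalone sketch of what GradientTape does, using a toy variable that is not part of the MNIST model below:
# A single trainable variable and a toy loss, just to illustrate the mechanics.
w = tf.Variable(2.0)
with tf.GradientTape() as tape:
    toy_loss = w * w
grad = tape.gradient(toy_loss, w)
print(float(grad))  # d(w^2)/dw at w=2.0, i.e. 4.0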
Let's consider a simple MNIST model:
inputs = keras.Input(shape=(784,), name="digits")
x1 = layers.Dense(64, activation="relu")(inputs)
x2 = layers.Dense(64, activation="relu")(x1)
outputs = layers.Dense(10, name="predictions")(x2)
model = keras.Model(inputs=inputs, outputs=outputs)
Let's train it using mini-batch gradient with a custom training loop.
First, we're going to need an optimizer, a loss function, and a dataset:
# Instantiate an optimizer.
optimizer = keras.optimizers.SGD(learning_rate=1e-3)
# Instantiate a loss function.
loss_fn = keras.losses.SparseCategoricalCrossentropy(from_logits=True)
# Prepare the training dataset.
batch_size = 64
(x_train, y_train), (x_test, y_test) = keras.datasets.mnist.load_data()
x_train = np.reshape(x_train, (-1, 784))
x_test = np.reshape(x_test, (-1, 784))
# Reserve 10,000 samples for validation.
x_val = x_train[-10000:]
y_val = y_train[-10000:]
x_train = x_train[:-10000]
y_train = y_train[:-10000]
# Prepare the training dataset.
train_dataset = tf.data.Dataset.from_tensor_slices((x_train, y_train))
train_dataset = train_dataset.shuffle(buffer_size=1024).batch(batch_size)
# Prepare the validation dataset.
val_dataset = tf.data.Dataset.from_tensor_slices((x_val, y_val))
val_dataset = val_dataset.batch(batch_size)
Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/mnist.npz 11493376/11490434 [==============================] - 1s 0us/step 11501568/11490434 [==============================] - 1s 0us/step
Here's our training loop:
- We open a for loop that iterates over epochs
- For each epoch, we open a for loop that iterates over the dataset, in batches
- For each batch, we open a GradientTape() scope
- Inside this scope, we call the model (forward pass) and compute the loss
- Outside the scope, we retrieve the gradients of the weights of the model with regard to the loss
- Finally, we use the optimizer to update the weights of the model based on the gradients
epochs = 2
for epoch in range(epochs):
print("\nStart of epoch %d" % (epoch,))
# Iterate over the batches of the dataset.
for step, (x_batch_train, y_batch_train) in enumerate(train_dataset):
# Open a GradientTape to record the operations run
# during the forward pass, which enables auto-differentiation.
with tf.GradientTape() as tape:
# Run the forward pass of the layer.
# The operations that the layer applies
# to its inputs are going to be recorded
# on the GradientTape.
logits = model(x_batch_train, training=True) # Logits for this minibatch
# Compute the loss value for this minibatch.
loss_value = loss_fn(y_batch_train, logits)
# Use the gradient tape to automatically retrieve
# the gradients of the trainable variables with respect to the loss.
grads = tape.gradient(loss_value, model.trainable_weights)
# Run one step of gradient descent by updating
# the value of the variables to minimize the loss.
optimizer.apply_gradients(zip(grads, model.trainable_weights))
# Log every 200 batches.
if step % 200 == 0:
print(
"Training loss (for one batch) at step %d: %.4f"
% (step, float(loss_value))
)
print("Seen so far: %s samples" % ((step + 1) * batch_size))
Start of epoch 0 Training loss (for one batch) at step 0: 68.7478 Seen so far: 64 samples Training loss (for one batch) at step 200: 1.9448 Seen so far: 12864 samples Training loss (for one batch) at step 400: 1.1859 Seen so far: 25664 samples Training loss (for one batch) at step 600: 0.6914 Seen so far: 38464 samples Start of epoch 1 Training loss (for one batch) at step 0: 0.9113 Seen so far: 64 samples Training loss (for one batch) at step 200: 0.9550 Seen so far: 12864 samples Training loss (for one batch) at step 400: 0.5139 Seen so far: 25664 samples Training loss (for one batch) at step 600: 0.7227 Seen so far: 38464 samples
Low-level handling of metrics
Let's add metrics monitoring to this basic loop.
You can readily reuse the built-in metrics (or custom ones you wrote) in such training loops written from scratch. Here's the flow (see the small standalone example after this list):
- Instantiate the metric at the start of the loop
- Call metric.update_state() after each batch
- Call metric.result() when you need to display the current value of the metric
- Call metric.reset_states() when you need to clear the state of the metric (typically at the end of an epoch)
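As a tiny standalone illustration of that API, outside any training loop and with a made-up toy batch:
m = keras.metrics.SparseCategoricalAccuracy()
# Two samples: true classes 1 and 2, predictions whose argmax matches both.
m.update_state([1, 2], [[0.1, 0.9, 0.0], [0.0, 0.2, 0.8]])
print(float(m.result()))  # 1.0, since both predictions are correct
m.reset_states()          # clears the accumulated state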
Let's use this knowledge to compute SparseCategoricalAccuracy on validation data at the end of each epoch:
# Get model
inputs = keras.Input(shape=(784,), name="digits")
x = layers.Dense(64, activation="relu", name="dense_1")(inputs)
x = layers.Dense(64, activation="relu", name="dense_2")(x)
outputs = layers.Dense(10, name="predictions")(x)
model = keras.Model(inputs=inputs, outputs=outputs)
# Instantiate an optimizer to train the model.
optimizer = keras.optimizers.SGD(learning_rate=1e-3)
# Instantiate a loss function.
loss_fn = keras.losses.SparseCategoricalCrossentropy(from_logits=True)
# Prepare the metrics.
train_acc_metric = keras.metrics.SparseCategoricalAccuracy()
val_acc_metric = keras.metrics.SparseCategoricalAccuracy()
Here's our training & evaluation loop:
import time
epochs = 2
for epoch in range(epochs):
print("\nStart of epoch %d" % (epoch,))
start_time = time.time()
# Iterate over the batches of the dataset.
for step, (x_batch_train, y_batch_train) in enumerate(train_dataset):
with tf.GradientTape() as tape:
logits = model(x_batch_train, training=True)
loss_value = loss_fn(y_batch_train, logits)
grads = tape.gradient(loss_value, model.trainable_weights)
optimizer.apply_gradients(zip(grads, model.trainable_weights))
# Update training metric.
train_acc_metric.update_state(y_batch_train, logits)
# Log every 200 batches.
if step % 200 == 0:
print(
"Training loss (for one batch) at step %d: %.4f"
% (step, float(loss_value))
)
print("Seen so far: %d samples" % ((step + 1) * batch_size))
# Display metrics at the end of each epoch.
train_acc = train_acc_metric.result()
print("Training acc over epoch: %.4f" % (float(train_acc),))
# Reset training metrics at the end of each epoch
train_acc_metric.reset_states()
# Run a validation loop at the end of each epoch.
for x_batch_val, y_batch_val in val_dataset:
val_logits = model(x_batch_val, training=False)
# Update val metrics
val_acc_metric.update_state(y_batch_val, val_logits)
val_acc = val_acc_metric.result()
val_acc_metric.reset_states()
print("Validation acc: %.4f" % (float(val_acc),))
print("Time taken: %.2fs" % (time.time() - start_time))
Start of epoch 0 Training loss (for one batch) at step 0: 88.9958 Seen so far: 64 samples Training loss (for one batch) at step 200: 2.2214 Seen so far: 12864 samples Training loss (for one batch) at step 400: 1.3083 Seen so far: 25664 samples Training loss (for one batch) at step 600: 0.8282 Seen so far: 38464 samples Training acc over epoch: 0.7406 Validation acc: 0.8201 Time taken: 6.31s Start of epoch 1 Training loss (for one batch) at step 0: 0.3276 Seen so far: 64 samples Training loss (for one batch) at step 200: 0.4819 Seen so far: 12864 samples Training loss (for one batch) at step 400: 0.5971 Seen so far: 25664 samples Training loss (for one batch) at step 600: 0.5862 Seen so far: 38464 samples Training acc over epoch: 0.8474 Validation acc: 0.8676 Time taken: 5.98s
Speeding up your training step with tf.function
The default runtime in TensorFlow 2 is eager execution. As such, our training loop above executes eagerly.
This is great for debugging, but graph compilation has a definite performance advantage. Describing your computation as a static graph enables the framework to apply global performance optimizations. This is impossible when the framework is constrained to greedily execute one operation after another, with no knowledge of what comes next.
You can compile into a static graph any function that takes tensors as input. Just add a @tf.function decorator on it, like this:
@tf.function
def train_step(x, y):
with tf.GradientTape() as tape:
logits = model(x, training=True)
loss_value = loss_fn(y, logits)
grads = tape.gradient(loss_value, model.trainable_weights)
optimizer.apply_gradients(zip(grads, model.trainable_weights))
train_acc_metric.update_state(y, logits)
return loss_value
Let's do the same with the evaluation step:
@tf.function
def test_step(x, y):
val_logits = model(x, training=False)
val_acc_metric.update_state(y, val_logits)
Now, let's re-run our training loop with this compiled training step:
import time
epochs = 2
for epoch in range(epochs):
print("\nStart of epoch %d" % (epoch,))
start_time = time.time()
# Iterate over the batches of the dataset.
for step, (x_batch_train, y_batch_train) in enumerate(train_dataset):
loss_value = train_step(x_batch_train, y_batch_train)
# Log every 200 batches.
if step % 200 == 0:
print(
"Training loss (for one batch) at step %d: %.4f"
% (step, float(loss_value))
)
print("Seen so far: %d samples" % ((step + 1) * batch_size))
# Display metrics at the end of each epoch.
train_acc = train_acc_metric.result()
print("Training acc over epoch: %.4f" % (float(train_acc),))
# Reset training metrics at the end of each epoch
train_acc_metric.reset_states()
# Run a validation loop at the end of each epoch.
for x_batch_val, y_batch_val in val_dataset:
test_step(x_batch_val, y_batch_val)
val_acc = val_acc_metric.result()
val_acc_metric.reset_states()
print("Validation acc: %.4f" % (float(val_acc),))
print("Time taken: %.2fs" % (time.time() - start_time))
Start of epoch 0 Training loss (for one batch) at step 0: 0.7921 Seen so far: 64 samples Training loss (for one batch) at step 200: 0.7755 Seen so far: 12864 samples Training loss (for one batch) at step 400: 0.1564 Seen so far: 25664 samples Training loss (for one batch) at step 600: 0.3181 Seen so far: 38464 samples Training acc over epoch: 0.8788 Validation acc: 0.8866 Time taken: 1.59s Start of epoch 1 Training loss (for one batch) at step 0: 0.5222 Seen so far: 64 samples Training loss (for one batch) at step 200: 0.4574 Seen so far: 12864 samples Training loss (for one batch) at step 400: 0.4035 Seen so far: 25664 samples Training loss (for one batch) at step 600: 0.7561 Seen so far: 38464 samples Training acc over epoch: 0.8959 Validation acc: 0.9028 Time taken: 1.27s
Much faster, isn't it?
Low-level handling of losses tracked by the model
Layers & models recursively track any losses created during the forward pass by layers that call self.add_loss(value). The resulting list of scalar loss values is available via the property model.losses at the end of the forward pass.
If you want to be using these loss components, you should sum them and add them to the main loss in your training step.
Consider this layer, which creates an activity regularization loss:
class ActivityRegularizationLayer(layers.Layer):
def call(self, inputs):
self.add_loss(1e-2 * tf.reduce_sum(inputs))
return inputs
Let's build a really simple model that uses it:
inputs = keras.Input(shape=(784,), name="digits")
x = layers.Dense(64, activation="relu")(inputs)
# Insert activity regularization as a layer
x = ActivityRegularizationLayer()(x)
x = layers.Dense(64, activation="relu")(x)
outputs = layers.Dense(10, name="predictions")(x)
model = keras.Model(inputs=inputs, outputs=outputs)
Here's what our training step should look like now:
@tf.function
def train_step(x, y):
with tf.GradientTape() as tape:
logits = model(x, training=True)
loss_value = loss_fn(y, logits)
# Add any extra losses created during the forward pass.
loss_value += sum(model.losses)
grads = tape.gradient(loss_value, model.trainable_weights)
optimizer.apply_gradients(zip(grads, model.trainable_weights))
train_acc_metric.update_state(y, logits)
return loss_value
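As a quick optional check (not part of the original loop), you can verify what the layer registered by running a forward pass on a dummy batch and reading model.losses afterwards:
# model.losses is repopulated on every forward pass.
dummy = tf.ones((1, 784))
_ = model(dummy, training=True)
print(model.losses)  # a list with one scalar tensor, from ActivityRegularizationLayer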
Summary
Now you know everything there is to know about using built-in training loops and writing your own from scratch.
To conclude, here's a simple end-to-end example that ties together everything you've learned in this guide: a DCGAN trained on MNIST digits.
End-to-end example: a GAN training loop from scratch
You may be familiar with Generative Adversarial Networks (GANs). GANs can generate new images that look almost real, by learning the latent distribution of a training dataset of images (the "latent space" of the images).
A GAN is made of two parts: a "generator" model that maps points in the latent space to points in image space, and a "discriminator" model, a classifier that can tell the difference between real images (from the training dataset) and fake images (the output of the generator network).
A GAN training loop looks like this:
1) Train the discriminator. - Sample a batch of random points in the latent space. - Turn the points into fake images via the "generator" model. - Get a batch of real images and combine them with the generated images. - Train the "discriminator" model to classify generated vs. real images.
2) Train the generator. - Sample random points in the latent space. - Turn the points into fake images via the "generator" network. - Get a batch of real images and combine them with the generated images. - Train the "generator" model to "fool" the discriminator and classify the fake images as real.
For a much more detailed overview of how GANs work, see Deep Learning with Python.
Let's implement this training loop. First, create the discriminator meant to classify fake vs real digits:
discriminator = keras.Sequential(
[
keras.Input(shape=(28, 28, 1)),
layers.Conv2D(64, (3, 3), strides=(2, 2), padding="same"),
layers.LeakyReLU(alpha=0.2),
layers.Conv2D(128, (3, 3), strides=(2, 2), padding="same"),
layers.LeakyReLU(alpha=0.2),
layers.GlobalMaxPooling2D(),
layers.Dense(1),
],
name="discriminator",
)
discriminator.summary()
Model: "discriminator" _________________________________________________________________ Layer (type) Output Shape Param # ================================================================= conv2d (Conv2D) (None, 14, 14, 64) 640 _________________________________________________________________ leaky_re_lu (LeakyReLU) (None, 14, 14, 64) 0 _________________________________________________________________ conv2d_1 (Conv2D) (None, 7, 7, 128) 73856 _________________________________________________________________ leaky_re_lu_1 (LeakyReLU) (None, 7, 7, 128) 0 _________________________________________________________________ global_max_pooling2d (Global (None, 128) 0 _________________________________________________________________ dense_4 (Dense) (None, 1) 129 ================================================================= Total params: 74,625 Trainable params: 74,625 Non-trainable params: 0 _________________________________________________________________
Then let's create a generator network, which turns latent vectors into outputs of shape (28, 28, 1) (representing MNIST digits):
latent_dim = 128
generator = keras.Sequential(
[
keras.Input(shape=(latent_dim,)),
# We want to generate 128 coefficients to reshape into a 7x7x128 map
layers.Dense(7 * 7 * 128),
layers.LeakyReLU(alpha=0.2),
layers.Reshape((7, 7, 128)),
layers.Conv2DTranspose(128, (4, 4), strides=(2, 2), padding="same"),
layers.LeakyReLU(alpha=0.2),
layers.Conv2DTranspose(128, (4, 4), strides=(2, 2), padding="same"),
layers.LeakyReLU(alpha=0.2),
layers.Conv2D(1, (7, 7), padding="same", activation="sigmoid"),
],
name="generator",
)
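As an optional sanity check (not in the original guide), you can confirm that the generator maps a batch of latent vectors to 28x28x1 images:
# Feed one random latent vector through the (still untrained) generator.
z = tf.random.normal(shape=(1, latent_dim))
print(generator(z).shape)  # expected: (1, 28, 28, 1)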
Here's the key bit: the training loop. As you can see, it is quite straightforward. The training step function only takes 17 lines.
# Instantiate one optimizer for the discriminator and another for the generator.
d_optimizer = keras.optimizers.Adam(learning_rate=0.0003)
g_optimizer = keras.optimizers.Adam(learning_rate=0.0004)
# Instantiate a loss function.
loss_fn = keras.losses.BinaryCrossentropy(from_logits=True)
@tf.function
def train_step(real_images):
# Sample random points in the latent space
random_latent_vectors = tf.random.normal(shape=(batch_size, latent_dim))
# Decode them to fake images
generated_images = generator(random_latent_vectors)
# Combine them with real images
combined_images = tf.concat([generated_images, real_images], axis=0)
# Assemble labels discriminating real from fake images
labels = tf.concat(
[tf.ones((batch_size, 1)), tf.zeros((real_images.shape[0], 1))], axis=0
)
# Add random noise to the labels - important trick!
labels += 0.05 * tf.random.uniform(labels.shape)
# Train the discriminator
with tf.GradientTape() as tape:
predictions = discriminator(combined_images)
d_loss = loss_fn(labels, predictions)
grads = tape.gradient(d_loss, discriminator.trainable_weights)
d_optimizer.apply_gradients(zip(grads, discriminator.trainable_weights))
# Sample random points in the latent space
random_latent_vectors = tf.random.normal(shape=(batch_size, latent_dim))
# Assemble labels that say "all real images"
misleading_labels = tf.zeros((batch_size, 1))
# Train the generator (note that we should *not* update the weights
# of the discriminator)!
with tf.GradientTape() as tape:
predictions = discriminator(generator(random_latent_vectors))
g_loss = loss_fn(misleading_labels, predictions)
grads = tape.gradient(g_loss, generator.trainable_weights)
g_optimizer.apply_gradients(zip(grads, generator.trainable_weights))
return d_loss, g_loss, generated_images
Let's train our GAN, by repeatedly calling train_step on batches of images.
Since our discriminator and generator are convnets, you're going to want to run this code on a GPU.
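If you are unsure whether a GPU is visible to TensorFlow (for instance on Colab), a quick check like the following can help; it is just a convenience and not required by the loop:
# Lists the GPU devices TensorFlow can see; an empty list means CPU-only execution.
print("GPUs available:", tf.config.list_physical_devices("GPU"))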
import os
# Prepare the dataset. We use both the training & test MNIST digits.
batch_size = 64
(x_train, _), (x_test, _) = keras.datasets.mnist.load_data()
all_digits = np.concatenate([x_train, x_test])
all_digits = all_digits.astype("float32") / 255.0
all_digits = np.reshape(all_digits, (-1, 28, 28, 1))
dataset = tf.data.Dataset.from_tensor_slices(all_digits)
dataset = dataset.shuffle(buffer_size=1024).batch(batch_size)
epochs = 1 # In practice you need at least 20 epochs to generate nice digits.
save_dir = "./"
for epoch in range(epochs):
print("\nStart epoch", epoch)
for step, real_images in enumerate(dataset):
# Train the discriminator & generator on one batch of real images.
d_loss, g_loss, generated_images = train_step(real_images)
# Logging.
if step % 200 == 0:
# Print metrics
print("discriminator loss at step %d: %.2f" % (step, d_loss))
print("adversarial loss at step %d: %.2f" % (step, g_loss))
# Save one generated image
img = tf.keras.preprocessing.image.array_to_img(
generated_images[0] * 255.0, scale=False
)
img.save(os.path.join(save_dir, "generated_img" + str(step) + ".png"))
# To limit execution time we stop after 10 steps.
# Remove the lines below to actually train the model!
if step > 10:
break
Start epoch 0 discriminator loss at step 0: 0.69 adversarial loss at step 0: 0.69
That's it! You'll get nice-looking fake MNIST digits after just ~30 seconds of training on the Colab GPU.