View on TensorFlow.org |
Run in Google Colab |
View source on GitHub |
Download notebook
Overview
This notebook demonstrates how to use the Moving Average optimizer together with the Model Average Checkpoint from the TensorFlow Addons package.
Moving Averaging
The advantage of moving averaging is that it is less prone to sudden loss shifts or irregular data representations in the latest batch. It gives a smoother, more general picture of model training up to a given point.
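The decay rule behind moving averaging can be sketched in plain NumPy. The `ema_update` helper below is hypothetical (not part of any library); it applies the exponential decay update `average ← decay · average + (1 − decay) · weights` that an averaging wrapper maintains alongside the optimizer's own updates.

```python
import numpy as np

def ema_update(average, weights, decay=0.99):
    """One exponential-moving-average step: decay * average + (1 - decay) * weights."""
    return decay * average + (1 - decay) * weights

# Two identical "weight updates" with decay=0.5 for easy arithmetic:
average = np.zeros(2)
for w in [np.array([1.0, 2.0]), np.array([1.0, 2.0])]:
    average = ema_update(average, w, decay=0.5)

# averaged weights after two steps: [0.75, 1.5]
```

Because each step keeps `decay` of the old average, a single noisy batch moves the averaged weights far less than it moves the raw weights.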
Stochastic Averaging
Stochastic Weight Averaging (SWA) converges to wider optima, which makes it similar to geometric ensembling. SWA is a simple way to improve model performance when used as a wrapper around another optimizer: it averages results from different points along the inner optimizer's trajectory.
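Conceptually, SWA keeps an equal-weight running mean of weight snapshots taken along the optimizer trajectory. A minimal NumPy sketch (the snapshot values are made up for illustration):

```python
import numpy as np

# Hypothetical snapshots of one weight tensor taken at different points
# along the inner optimizer's trajectory (e.g. at the end of each epoch).
snapshots = [np.array([0.9, 1.1]), np.array([1.1, 0.9]), np.array([1.0, 1.0])]

# SWA maintains an incremental equal-weight mean of the snapshots:
swa_weights = np.zeros(2)
for n, w in enumerate(snapshots, start=1):
    swa_weights += (w - swa_weights) / n  # incremental mean update

# swa_weights is the mean of the three snapshots: [1.0, 1.0]
```

Unlike the exponential moving average above, every snapshot contributes equally, which is what pulls the averaged solution toward the center of a wide optimum.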
Model Average Checkpoint
callbacks.ModelCheckpoint doesn't give you the option to save the moving average weights during training, which is why Moving Average optimizers require a custom callback. Using the update_weights parameter, AverageModelCheckpoint allows you to:
- Assign the moving average weights to the model, and save them.
- Keep the old non-averaged weights, while the saved model uses the averaged weights.
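The effect of `update_weights` can be sketched with plain Python dicts standing in for the model variables and the averaged shadow variables (a hypothetical illustration, not the actual callback code):

```python
# Hypothetical stand-ins for the model's variables and the averaged
# shadow variables tracked by the averaging optimizer.
model_weights = {"w": 0.2}    # latest (non-averaged) weights
average_weights = {"w": 0.5}  # moving-average weights

def save_checkpoint(update_weights):
    """Return the weights that get written to disk at save time."""
    if update_weights:
        # update_weights=True: assign the averaged weights to the model too.
        model_weights.update(average_weights)
    # Either way, the saved checkpoint uses the averaged weights.
    return dict(average_weights)

saved = save_checkpoint(update_weights=True)
# saved == {"w": 0.5}; the in-memory model now also holds the averaged weights
```

With `update_weights=False`, the in-memory model would keep its non-averaged weights while the checkpoint on disk still holds the averaged ones.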
Setup
pip install -q -U tensorflow-addons
import tensorflow as tf
import tensorflow_addons as tfa
import numpy as np
import os
Build Model
def create_model(opt):
    model = tf.keras.models.Sequential([
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(64, activation='relu'),
        tf.keras.layers.Dense(64, activation='relu'),
        tf.keras.layers.Dense(10, activation='softmax')
    ])

    model.compile(optimizer=opt,
                  loss='sparse_categorical_crossentropy',
                  metrics=['accuracy'])

    return model
Prepare Dataset
#Load Fashion MNIST dataset
train, test = tf.keras.datasets.fashion_mnist.load_data()
images, labels = train
images = images/255.0
labels = labels.astype(np.int32)
fmnist_train_ds = tf.data.Dataset.from_tensor_slices((images, labels))
fmnist_train_ds = fmnist_train_ds.shuffle(5000).batch(32)
test_images, test_labels = test
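Note that the training images are scaled to [0, 1] above, while the test split is left as raw uint8 pixels; applying the same preprocessing before evaluation avoids an inflated loss. A small sketch, with a hypothetical `preprocess` helper and dummy data standing in for the Fashion-MNIST test split:

```python
import numpy as np

def preprocess(images, labels):
    """Apply the same scaling and label dtype used for the training split."""
    return images / 255.0, labels.astype(np.int32)

# Dummy uint8 data standing in for the Fashion-MNIST test split:
dummy_images = np.full((2, 28, 28), 255, dtype=np.uint8)
dummy_labels = np.array([3, 7], dtype=np.int64)
scaled_images, int_labels = preprocess(dummy_images, dummy_labels)
# scaled_images now lies in [0, 1], matching the training images
```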
Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/train-labels-idx1-ubyte.gz 32768/29515 [=================================] - 0s 0us/step Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/train-images-idx3-ubyte.gz 26427392/26421880 [==============================] - 0s 0us/step Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/t10k-labels-idx1-ubyte.gz 8192/5148 [===============================================] - 0s 0us/step Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/t10k-images-idx3-ubyte.gz 4423680/4422102 [==============================] - 0s 0us/step
Here, we will compare three optimizers:
- SGD without any wrapper
- SGD with Moving Averaging
- SGD with Stochastic Weight Averaging
And see how they perform with the same model.
#Optimizers
sgd = tf.keras.optimizers.SGD(0.01)
moving_avg_sgd = tfa.optimizers.MovingAverage(sgd)
stocastic_avg_sgd = tfa.optimizers.SWA(sgd)
Both the MovingAverage and SWA (stochastic average) optimizers use AverageModelCheckpoint.
#Callback
checkpoint_path = "./training/cp-{epoch:04d}.ckpt"
checkpoint_dir = os.path.dirname(checkpoint_path)

cp_callback = tf.keras.callbacks.ModelCheckpoint(filepath=checkpoint_dir,
                                                 save_weights_only=True,
                                                 verbose=1)
avg_callback = tfa.callbacks.AverageModelCheckpoint(filepath=checkpoint_dir,
                                                    update_weights=True)
Train Model
Vanilla SGD optimizer
#Build Model
model = create_model(sgd)
#Train the network
model.fit(fmnist_train_ds, epochs=5, callbacks=[cp_callback])
Epoch 1/5 1875/1875 [==============================] - 4s 2ms/step - loss: 1.0748 - accuracy: 0.6571 Epoch 00001: saving model to ./training Epoch 2/5 1875/1875 [==============================] - 4s 2ms/step - loss: 0.5133 - accuracy: 0.8224 Epoch 00002: saving model to ./training Epoch 3/5 1875/1875 [==============================] - 4s 2ms/step - loss: 0.4605 - accuracy: 0.8380 Epoch 00003: saving model to ./training Epoch 4/5 1875/1875 [==============================] - 4s 2ms/step - loss: 0.4315 - accuracy: 0.8469 Epoch 00004: saving model to ./training Epoch 5/5 1875/1875 [==============================] - 4s 2ms/step - loss: 0.4078 - accuracy: 0.8563 Epoch 00005: saving model to ./training <tensorflow.python.keras.callbacks.History at 0x7fb839e9fd68>
#Evaluate results
model.load_weights(checkpoint_dir)
loss, accuracy = model.evaluate(test_images, test_labels, batch_size=32, verbose=2)
print("Loss :", loss)
print("Accuracy :", accuracy)
313/313 - 0s - loss: 79.6013 - accuracy: 0.8019 Loss : 79.60128021240234 Accuracy : 0.8019000291824341
Moving Average SGD
#Build Model
model = create_model(moving_avg_sgd)
#Train the network
model.fit(fmnist_train_ds, epochs=5, callbacks=[avg_callback])
Epoch 1/5 1875/1875 [==============================] - 5s 2ms/step - loss: 1.1034 - accuracy: 0.6502 INFO:tensorflow:Assets written to: ./training/assets Epoch 2/5 1875/1875 [==============================] - 4s 2ms/step - loss: 0.5254 - accuracy: 0.8154 INFO:tensorflow:Assets written to: ./training/assets Epoch 3/5 1875/1875 [==============================] - 4s 2ms/step - loss: 0.4718 - accuracy: 0.8335 INFO:tensorflow:Assets written to: ./training/assets Epoch 4/5 1875/1875 [==============================] - 4s 2ms/step - loss: 0.4436 - accuracy: 0.8423 INFO:tensorflow:Assets written to: ./training/assets Epoch 5/5 1875/1875 [==============================] - 4s 2ms/step - loss: 0.4221 - accuracy: 0.8531 INFO:tensorflow:Assets written to: ./training/assets <tensorflow.python.keras.callbacks.History at 0x7fb839f0d630>
#Evaluate results
model.load_weights(checkpoint_dir)
loss, accuracy = model.evaluate(test_images, test_labels, batch_size=32, verbose=2)
print("Loss :", loss)
print("Accuracy :", accuracy)
313/313 - 0s - loss: 79.6013 - accuracy: 0.8019 Loss : 79.60128021240234 Accuracy : 0.8019000291824341
Stochastic Weight Average SGD
#Build Model
model = create_model(stocastic_avg_sgd)
#Train the network
model.fit(fmnist_train_ds, epochs=5, callbacks=[avg_callback])
Epoch 1/5 1875/1875 [==============================] - 6s 3ms/step - loss: 1.1160 - accuracy: 0.6463 INFO:tensorflow:Assets written to: ./training/assets Epoch 2/5 1875/1875 [==============================] - 5s 3ms/step - loss: 0.6035 - accuracy: 0.7968 INFO:tensorflow:Assets written to: ./training/assets Epoch 3/5 1875/1875 [==============================] - 5s 3ms/step - loss: 0.5594 - accuracy: 0.8102 INFO:tensorflow:Assets written to: ./training/assets Epoch 4/5 1875/1875 [==============================] - 5s 3ms/step - loss: 0.5365 - accuracy: 0.8170 INFO:tensorflow:Assets written to: ./training/assets Epoch 5/5 1875/1875 [==============================] - 5s 3ms/step - loss: 0.5239 - accuracy: 0.8199 INFO:tensorflow:Assets written to: ./training/assets <tensorflow.python.keras.callbacks.History at 0x7fb7ac51dac8>
#Evaluate results
model.load_weights(checkpoint_dir)
loss, accuracy = model.evaluate(test_images, test_labels, batch_size=32, verbose=2)
print("Loss :", loss)
print("Accuracy :", accuracy)
313/313 - 0s - loss: 79.6013 - accuracy: 0.8019 Loss : 79.60128021240234 Accuracy : 0.8019000291824341