To customize training behavior in TensorFlow 1, you use tf.estimator.SessionRunHook with tf.estimator.Estimator. This guide demonstrates how to migrate from SessionRunHook to TensorFlow 2's custom callbacks with the tf.keras.callbacks.Callback API, which works with Keras Model.fit for training (as well as Model.evaluate and Model.predict). You will learn how to do this by implementing a SessionRunHook and a Callback task that measures examples per second during training.

Examples of callbacks are checkpoint saving (tf.keras.callbacks.ModelCheckpoint) and TensorBoard summary writing. Keras callbacks are objects that are called at different points during training/evaluation/prediction in the built-in Keras Model.fit/Model.evaluate/Model.predict APIs. You can learn more about callbacks in the tf.keras.callbacks.Callback API docs, as well as the Writing your own callbacks and the Training and evaluation with the built-in methods (the Using callbacks section) guides.
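As a quick illustration of the built-in callbacks mentioned above, the following minimal sketch passes tf.keras.callbacks.ModelCheckpoint and tf.keras.callbacks.TensorBoard to Model.fit. The tiny model, the toy dataset, and the /tmp/... paths are arbitrary placeholders for illustration, not part of this guide's example:

import os
import tensorflow as tf

# A tiny model and dataset purely for demonstration; any Keras model works the same way.
demo_model = tf.keras.Sequential([tf.keras.layers.Dense(1)])
demo_model.compile(optimizer='adam', loss='mse')
demo_dataset = tf.data.Dataset.from_tensor_slices(
    ([[1.], [2.], [3.]], [[1.], [2.], [3.]])).batch(1)

os.makedirs('/tmp/ckpt', exist_ok=True)  # Placeholder checkpoint directory.

demo_callbacks = [
    # Save the model weights after every epoch (the path is a placeholder).
    tf.keras.callbacks.ModelCheckpoint(
        filepath='/tmp/ckpt/weights.{epoch:02d}.weights.h5',
        save_weights_only=True),
    # Write TensorBoard summaries (the log directory is a placeholder).
    tf.keras.callbacks.TensorBoard(log_dir='/tmp/logs'),
]

demo_model.fit(demo_dataset, epochs=2, callbacks=demo_callbacks, verbose=0)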
Setup

Start with the imports and a simple dataset for demonstration purposes:
import tensorflow as tf
import tensorflow.compat.v1 as tf1
import time
from datetime import datetime
from absl import flags
2022-12-14 20:23:14.924051: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory 2022-12-14 20:23:14.924145: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory 2022-12-14 20:23:14.924155: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly.
features = [[1., 1.5], [2., 2.5], [3., 3.5]]
labels = [[0.3], [0.5], [0.7]]
eval_features = [[4., 4.5], [5., 5.5], [6., 6.5]]
eval_labels = [[0.8], [0.9], [1.]]
TensorFlow 1: Create a custom SessionRunHook with tf.estimator APIs

The following TensorFlow 1 example shows how to set up a custom SessionRunHook that measures examples per second during training. After creating the hook (LoggerHook), pass it to the hooks parameter of tf.estimator.Estimator.train.
def _input_fn():
  return tf1.data.Dataset.from_tensor_slices(
      (features, labels)).batch(1).repeat(100)

def _model_fn(features, labels, mode):
  logits = tf1.layers.Dense(1)(features)
  loss = tf1.losses.mean_squared_error(labels=labels, predictions=logits)
  optimizer = tf1.train.AdagradOptimizer(0.05)
  train_op = optimizer.minimize(loss, global_step=tf1.train.get_global_step())
  return tf1.estimator.EstimatorSpec(mode, loss=loss, train_op=train_op)
class LoggerHook(tf1.train.SessionRunHook):
  """Logs loss and runtime."""

  def begin(self):
    self._step = -1
    self._start_time = time.time()
    self.log_frequency = 10

  def before_run(self, run_context):
    self._step += 1

  def after_run(self, run_context, run_values):
    if self._step % self.log_frequency == 0:
      current_time = time.time()
      duration = current_time - self._start_time
      self._start_time = current_time
      examples_per_sec = self.log_frequency / duration
      print('Time:', datetime.now(), ', Step #:', self._step,
            ', Examples per second:', examples_per_sec)
estimator = tf1.estimator.Estimator(model_fn=_model_fn)
# Begin training.
estimator.train(_input_fn, hooks=[LoggerHook()])
INFO:tensorflow:Using default config. WARNING:tensorflow:Using temporary folder as model directory: /tmpfs/tmp/tmpi0__a12q INFO:tensorflow:Using config: {'_model_dir': '/tmpfs/tmp/tmpi0__a12q', '_tf_random_seed': None, '_save_summary_steps': 100, '_save_checkpoints_steps': None, '_save_checkpoints_secs': 600, '_session_config': allow_soft_placement: true graph_options { rewrite_options { meta_optimizer_iterations: ONE } } , '_keep_checkpoint_max': 5, '_keep_checkpoint_every_n_hours': 10000, '_log_step_count_steps': 100, '_train_distribute': None, '_device_fn': None, '_protocol': None, '_eval_distribute': None, '_experimental_distribute': None, '_experimental_max_worker_delay_secs': None, '_session_creation_timeout_secs': 7200, '_checkpoint_save_graph_def': True, '_service': None, '_cluster_spec': ClusterSpec({}), '_task_type': 'worker', '_task_id': 0, '_global_id_in_cluster': 0, '_master': '', '_evaluation_master': '', '_is_chief': True, '_num_ps_replicas': 0, '_num_worker_replicas': 1} WARNING:tensorflow:From /tmpfs/src/tf_docs_env/lib/python3.9/site-packages/tensorflow/python/training/training_util.py:396: Variable.initialized_value (from tensorflow.python.ops.variables) is deprecated and will be removed in a future version. Instructions for updating: Use Variable.read_value. Variables in 2.X are initialized automatically both in eager and graph (inside tf.defun) contexts. INFO:tensorflow:Calling model_fn. WARNING:tensorflow:From /tmpfs/src/tf_docs_env/lib/python3.9/site-packages/tensorflow/python/training/adagrad.py:138: calling Constant.__init__ (from tensorflow.python.ops.init_ops) with dtype is deprecated and will be removed in a future version. Instructions for updating: Call initializer instance with the dtype argument instead of passing it to the constructor INFO:tensorflow:Done calling model_fn. INFO:tensorflow:Create CheckpointSaverHook. INFO:tensorflow:Graph was finalized. INFO:tensorflow:Running local_init_op. INFO:tensorflow:Done running local_init_op. INFO:tensorflow:Calling checkpoint listeners before saving checkpoint 0... INFO:tensorflow:Saving checkpoints for 0 into /tmpfs/tmp/tmpi0__a12q/model.ckpt. INFO:tensorflow:Calling checkpoint listeners after saving checkpoint 0... 
Time: 2022-12-14 20:23:19.989508 , Step #: 0 , Examples per second: 2.6514216127414776 INFO:tensorflow:loss = 1.8147948, step = 0 Time: 2022-12-14 20:23:20.022854 , Step #: 10 , Examples per second: 299.86730725234503 Time: 2022-12-14 20:23:20.029819 , Step #: 20 , Examples per second: 1435.7171219278428 Time: 2022-12-14 20:23:20.036575 , Step #: 30 , Examples per second: 1479.8899160256863 Time: 2022-12-14 20:23:20.043157 , Step #: 40 , Examples per second: 1519.345069912338 Time: 2022-12-14 20:23:20.049921 , Step #: 50 , Examples per second: 1478.2730060268566 Time: 2022-12-14 20:23:20.056636 , Step #: 60 , Examples per second: 1489.2958846713773 Time: 2022-12-14 20:23:20.063276 , Step #: 70 , Examples per second: 1505.9256067786873 Time: 2022-12-14 20:23:20.069780 , Step #: 80 , Examples per second: 1537.5578283661425 Time: 2022-12-14 20:23:20.076458 , Step #: 90 , Examples per second: 1497.5378463296202 INFO:tensorflow:global_step/sec: 1055.75 Time: 2022-12-14 20:23:20.084892 , Step #: 100 , Examples per second: 1185.6689752650177 INFO:tensorflow:loss = 1.4855434e-05, step = 100 (0.095 sec) Time: 2022-12-14 20:23:20.092704 , Step #: 110 , Examples per second: 1280.0390636921293 Time: 2022-12-14 20:23:20.099669 , Step #: 120 , Examples per second: 1435.7662684421318 Time: 2022-12-14 20:23:20.106356 , Step #: 130 , Examples per second: 1495.5621322873953 Time: 2022-12-14 20:23:20.113093 , Step #: 140 , Examples per second: 1484.289050888244 Time: 2022-12-14 20:23:20.119825 , Step #: 150 , Examples per second: 1485.4455305284034 Time: 2022-12-14 20:23:20.126593 , Step #: 160 , Examples per second: 1477.491897985064 Time: 2022-12-14 20:23:20.133065 , Step #: 170 , Examples per second: 1545.2617617802011 Time: 2022-12-14 20:23:20.139650 , Step #: 180 , Examples per second: 1518.410020634978 Time: 2022-12-14 20:23:20.146197 , Step #: 190 , Examples per second: 1527.6456876456878 INFO:tensorflow:global_step/sec: 1437.75 Time: 2022-12-14 20:23:20.154350 , Step #: 200 , Examples per second: 1226.4405391970527 INFO:tensorflow:loss = 0.00070948945, step = 200 (0.069 sec) Time: 2022-12-14 20:23:20.161802 , Step #: 210 , Examples per second: 1341.9196314307653 Time: 2022-12-14 20:23:20.168527 , Step #: 220 , Examples per second: 1487.0254555768277 Time: 2022-12-14 20:23:20.175179 , Step #: 230 , Examples per second: 1503.388651923008 Time: 2022-12-14 20:23:20.181607 , Step #: 240 , Examples per second: 1555.4622659002412 Time: 2022-12-14 20:23:20.188128 , Step #: 250 , Examples per second: 1533.7345961165759 Time: 2022-12-14 20:23:20.195322 , Step #: 260 , Examples per second: 1389.9009179176194 Time: 2022-12-14 20:23:20.201964 , Step #: 270 , Examples per second: 1505.709362435382 Time: 2022-12-14 20:23:20.208528 , Step #: 280 , Examples per second: 1523.4287374691269 Time: 2022-12-14 20:23:20.214937 , Step #: 290 , Examples per second: 1560.1487873828298 INFO:tensorflow:Calling checkpoint listeners before saving checkpoint 300... INFO:tensorflow:Saving checkpoints for 300 into /tmpfs/tmp/tmpi0__a12q/model.ckpt. INFO:tensorflow:Calling checkpoint listeners after saving checkpoint 300... INFO:tensorflow:Loss for final step: 0.0005846145. <tensorflow_estimator.python.estimator.estimator.Estimator at 0x7f5941222d30>
TensorFlow 2: Create a custom Keras callback for Model.fit

In TensorFlow 2, when you use the built-in Keras Model.fit (or Model.evaluate) for training/evaluation, you can configure a custom tf.keras.callbacks.Callback, which you then pass to the callbacks parameter of Model.fit (or Model.evaluate). (Learn more in the Writing your own callbacks guide.)

In the example below, you will write a custom tf.keras.callbacks.Callback that logs various metrics. It will measure examples per second, which should be comparable to the metrics in the previous SessionRunHook example.
class CustomCallback(tf.keras.callbacks.Callback):

  def on_train_begin(self, logs=None):
    self._step = -1
    self._start_time = time.time()
    self.log_frequency = 10

  def on_train_batch_begin(self, batch, logs=None):
    self._step += 1

  def on_train_batch_end(self, batch, logs=None):
    if self._step % self.log_frequency == 0:
      current_time = time.time()
      duration = current_time - self._start_time
      self._start_time = current_time
      examples_per_sec = self.log_frequency / duration
      print('Time:', datetime.now(), ', Step #:', self._step,
            ', Examples per second:', examples_per_sec)
callback = CustomCallback()
dataset = tf.data.Dataset.from_tensor_slices(
(features, labels)).batch(1).repeat(100)
model = tf.keras.models.Sequential([tf.keras.layers.Dense(1)])
optimizer = tf.keras.optimizers.Adagrad(learning_rate=0.05)
model.compile(optimizer, "mse")
# Begin training.
result = model.fit(dataset, callbacks=[callback], verbose=0)
# Provide the results of training metrics.
result.history
Time: 2022-12-14 20:23:21.131505 , Step #: 0 , Examples per second: 20.585601724471335 Time: 2022-12-14 20:23:21.150332 , Step #: 10 , Examples per second: 531.072450555851 Time: 2022-12-14 20:23:21.166234 , Step #: 20 , Examples per second: 628.8311844077961 Time: 2022-12-14 20:23:21.182892 , Step #: 30 , Examples per second: 600.3269068372765 Time: 2022-12-14 20:23:21.199825 , Step #: 40 , Examples per second: 590.5557354659758 Time: 2022-12-14 20:23:21.216653 , Step #: 50 , Examples per second: 594.2287203898901 Time: 2022-12-14 20:23:21.233343 , Step #: 60 , Examples per second: 599.1777260324852 Time: 2022-12-14 20:23:21.249616 , Step #: 70 , Examples per second: 614.5140211562692 Time: 2022-12-14 20:23:21.266385 , Step #: 80 , Examples per second: 596.3239308462238 Time: 2022-12-14 20:23:21.283419 , Step #: 90 , Examples per second: 587.0675344670726 Time: 2022-12-14 20:23:21.299863 , Step #: 100 , Examples per second: 608.1257340041467 Time: 2022-12-14 20:23:21.315915 , Step #: 110 , Examples per second: 622.9657794676806 Time: 2022-12-14 20:23:21.332126 , Step #: 120 , Examples per second: 616.8638409271406 Time: 2022-12-14 20:23:21.348881 , Step #: 130 , Examples per second: 596.8500441130433 Time: 2022-12-14 20:23:21.366014 , Step #: 140 , Examples per second: 583.685272547628 Time: 2022-12-14 20:23:21.382683 , Step #: 150 , Examples per second: 599.8804330725554 Time: 2022-12-14 20:23:21.398955 , Step #: 160 , Examples per second: 614.5680459500645 Time: 2022-12-14 20:23:21.415907 , Step #: 170 , Examples per second: 589.9161744022504 Time: 2022-12-14 20:23:21.432503 , Step #: 180 , Examples per second: 602.5260012641498 Time: 2022-12-14 20:23:21.448798 , Step #: 190 , Examples per second: 613.6688710715749 Time: 2022-12-14 20:23:21.464718 , Step #: 200 , Examples per second: 628.1625256473619 Time: 2022-12-14 20:23:21.481145 , Step #: 210 , Examples per second: 608.7700658945107 Time: 2022-12-14 20:23:21.497145 , Step #: 220 , Examples per second: 624.9707950888068 Time: 2022-12-14 20:23:21.512982 , Step #: 230 , Examples per second: 631.4345502446369 Time: 2022-12-14 20:23:21.529601 , Step #: 240 , Examples per second: 601.7480129694987 Time: 2022-12-14 20:23:21.545946 , Step #: 250 , Examples per second: 611.8248388131983 Time: 2022-12-14 20:23:21.563054 , Step #: 260 , Examples per second: 584.5068145711977 Time: 2022-12-14 20:23:21.580755 , Step #: 270 , Examples per second: 564.9426881995608 Time: 2022-12-14 20:23:21.597789 , Step #: 280 , Examples per second: 587.0428843354607 Time: 2022-12-14 20:23:21.614728 , Step #: 290 , Examples per second: 590.372862270392 {'loss': [0.3040516972541809]}
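As noted at the top of this guide, the Callback API also works with Model.evaluate and Model.predict. The CustomCallback above only overrides the on_train_* hooks, so it stays silent during evaluation; a minimal sketch of the equivalent evaluation-side hooks (this class is an illustration added here, not part of the original example) could look like this:

class EvalTimingCallback(tf.keras.callbacks.Callback):
  """Illustrative callback that measures examples per second during Model.evaluate."""

  def on_test_begin(self, logs=None):
    self._step = -1
    self._start_time = time.time()
    self.log_frequency = 10

  def on_test_batch_begin(self, batch, logs=None):
    self._step += 1

  def on_test_batch_end(self, batch, logs=None):
    if self._step % self.log_frequency == 0:
      current_time = time.time()
      duration = current_time - self._start_time
      self._start_time = current_time
      print('Eval examples per second:', self.log_frequency / duration)

eval_dataset = tf.data.Dataset.from_tensor_slices(
    (eval_features, eval_labels)).batch(1)
model.evaluate(eval_dataset, callbacks=[EvalTimingCallback()], verbose=0)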
Next steps

Learn more about callbacks in:

- API docs: tf.keras.callbacks.Callback
- Guide: Writing your own callbacks
- Guide: Training and evaluation with the built-in methods (the Using callbacks section)

You may also find the following migration-related resources useful:

- The early stopping migration guide: tf.keras.callbacks.EarlyStopping is a built-in early stopping callback (a minimal usage sketch follows this list)
- The TensorBoard migration guide: TensorBoard enables tracking and displaying metrics
- The LoggingTensorHook and StopAtStepHook to Keras callbacks migration guide
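Related to the early stopping bullet above, here is a minimal sketch of how the built-in tf.keras.callbacks.EarlyStopping callback plugs into the same callbacks parameter of Model.fit; the monitored metric and patience value are arbitrary illustrative choices, not recommendations from this guide:

# Stop training when the monitored metric stops improving.
early_stopping = tf.keras.callbacks.EarlyStopping(
    monitor='loss',             # 'val_loss' is more typical when validation data is provided.
    patience=3,                 # Allow 3 epochs without improvement before stopping.
    restore_best_weights=True)  # Roll back to the best weights seen so far.

model.fit(dataset, epochs=20, callbacks=[early_stopping], verbose=0)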