SavedModel Warmup
Introduction
The TensorFlow runtime has components that are lazily initialized,
which can cause high latency for the first requests sent to a model after it is
loaded. This latency can be several orders of magnitude higher than that of a
single inference request.
To reduce the impact of lazy initialization on request latency, it's possible to
trigger the initialization of the sub-systems and components at model load time
by providing a sample set of inference requests along with the SavedModel. This
process is known as "warming up" the model.
Usage
SavedModel Warmup is supported for Regress, Classify, MultiInference and
Predict. To trigger warmup of the model at load time, attach a warmup data file
under the assets.extra subfolder of the SavedModel directory.
Requirements for model warmup to work correctly:
- Warmup file name: 'tf_serving_warmup_requests'
- File location: assets.extra/
- File format: TFRecord, with each record a PredictionLog.
- Number of warmup records <= 1000.
- The warmup data must be representative of the inference requests used at
serving.
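
For illustration, a SavedModel export with warmup data attached might be laid out as follows; saved_model.pb and variables/ are the standard SavedModel contents, and the warmup file name and location must match exactly:

```
YourSavedModel/
├── saved_model.pb
├── variables/
│   ├── variables.data-00000-of-00001
│   └── variables.index
└── assets.extra/
    └── tf_serving_warmup_requests
```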
Warmup data generation
Warmup data can be added in two ways:
- By directly populating the warmup requests into your exported SavedModel.
This can be done with a script that reads a list of sample inference
requests, converts each request into a PredictionLog (if it is originally in
a different format), and uses TFRecordWriter to write the PredictionLog
entries to YourSavedModel/assets.extra/tf_serving_warmup_requests, as in the
sketch after this list.
- By using the TFX Infra Validator option to export a SavedModel with warmup.
With this option, the TFX Infra Validator populates
YourSavedModel/assets.extra/tf_serving_warmup_requests based on the
validation requests provided via RequestSpec.
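
As a minimal sketch of the first approach, the script below writes a handful of Predict warmup records (it requires the tensorflow-serving-api package for the request protos). The model name 'my_model', the input tensor name 'input', and the sample values are placeholders to adjust for your own model's signature:

```python
import os

import tensorflow as tf
from tensorflow_serving.apis import predict_pb2
from tensorflow_serving.apis import prediction_log_pb2

# Must not exceed the 1000-record limit noted above.
NUM_RECORDS = 100


def make_warmup_record():
    """Builds one PredictionLog wrapping a representative PredictRequest."""
    request = predict_pb2.PredictRequest()
    request.model_spec.name = 'my_model'  # Placeholder: use your model's name.
    request.model_spec.signature_name = 'serving_default'
    # Placeholder input tensor name and values; match your model's signature
    # and make the values representative of real serving traffic.
    request.inputs['input'].CopyFrom(
        tf.make_tensor_proto([[1.0, 2.0, 3.0]], dtype=tf.float32))
    return prediction_log_pb2.PredictionLog(
        predict_log=prediction_log_pb2.PredictLog(request=request))


def main():
    warmup_dir = 'YourSavedModel/assets.extra'
    os.makedirs(warmup_dir, exist_ok=True)
    # Each record in the TFRecord file is a serialized PredictionLog.
    with tf.io.TFRecordWriter(
            os.path.join(warmup_dir, 'tf_serving_warmup_requests')) as writer:
        for _ in range(NUM_RECORDS):
            writer.write(make_warmup_record().SerializeToString())


if __name__ == '__main__':
    main()
```

For Classify, Regress, or MultiInference models, the same pattern applies: set the classify_log, regress_log, or multi_inference_log field of the PredictionLog instead of predict_log, with the corresponding request proto.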