Using tf.Transform with TensorFlow 2.x

Starting with the 0.30 release of tf.Transform, the default behavior is to export a TF 2.x SavedModel unless TF 2.x behaviors are explicitly disabled. This page describes how to use tf.Transform to export the transform graph as a TensorFlow 2.x SavedModel.

New in tf.Transform with TF 2.x

Loading Keras models within the preprocessing_fn

To load Keras models inside the preprocessing_fn, use the tft.make_and_track_object API and pass it a callable that performs the load, as shown in the example below.

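import tensorflow as tf
import tensorflow_transform as tft

def preprocessing_fn(inputs):
  # Wrap the load in a callable so tf.Transform creates and tracks the model.
  keras_model = tft.make_and_track_object(
      lambda: tf.keras.models.load_model(...), name='_unique_name')
  ...
  return {'keras_model_output': keras_model(inputs[...])}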

Using TF 2.x tf.hub modules

TF 2.x hub modules work in tf.Transform only when the preprocessing_fn is traced and exported as a TF 2.x SavedModel (this is the default behavior starting with tensorflow_transform 0.30). Please use the tft.make_and_track_object API to load tf.hub modules as shown in the example below.

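import tensorflow_hub as hub
import tensorflow_transform as tft

def preprocessing_fn(inputs):
  # Wrap hub.load in a callable so tf.Transform creates and tracks the module.
  hub_module = tft.make_and_track_object(lambda: hub.load(...))
  ...
  return {'hub_module_output': hub_module(inputs[...])}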

Potential migration issues

If you are migrating an existing tf.Transform pipeline from TF 1.x to TF 2.x, you may encounter the following issues:

RuntimeError: The order of analyzers in your preprocessing_fn appears to be non-deterministic.

In TF 2.x, the user-provided preprocessing_fn is traced several times. If the order in which TFT analyzers are encountered changes from one trace to the next, this error is raised. To fix it, remove any non-determinism in the order in which TFT analyzers are invoked, for example by iterating over feature keys in a fixed order such as sorted(inputs).
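The pure-Python sketch below mimics this check without TensorFlow, to show why a fixed iteration order keeps the analyzer order stable across traces. RecordingAnalyzers, check_analyzer_order, deterministic_fn and flaky_fn are hypothetical names for illustration only; the real check happens inside tf.Transform while it traces the preprocessing_fn.

```python
import itertools
from typing import Callable, Dict, List

Features = Dict[str, List[float]]


class RecordingAnalyzers:
  """Hypothetical stand-in for TFT analyzers that records call order."""

  def __init__(self) -> None:
    self.calls: List[str] = []

  def mean(self, name: str, values: List[float]) -> float:
    # Record which feature the "analyzer" ran on, then compute a plain mean.
    self.calls.append(name)
    return sum(values) / len(values)


def check_analyzer_order(
    fn: Callable[[Features, RecordingAnalyzers], Features],
    inputs: Features,
    num_traces: int = 3) -> List[str]:
  """Traces `fn` several times and fails if the analyzer order changes."""
  orders = []
  for _ in range(num_traces):
    recorder = RecordingAnalyzers()
    fn(inputs, recorder)
    orders.append(recorder.calls)
  if any(order != orders[0] for order in orders[1:]):
    raise RuntimeError('The order of analyzers in your preprocessing_fn '
                       'appears to be non-deterministic.')
  return orders[0]


def deterministic_fn(inputs: Features, analyzers: RecordingAnalyzers) -> Features:
  # Sorted keys give the same analyzer order on every trace.
  outputs = {}
  for key in sorted(inputs):
    mean = analyzers.mean(key, inputs[key])
    outputs[key] = [v - mean for v in inputs[key]]
  return outputs


_trace_counter = itertools.count()


def flaky_fn(inputs: Features, analyzers: RecordingAnalyzers) -> Features:
  # Reverses the key order on every other trace to simulate non-determinism.
  keys = sorted(inputs)
  if next(_trace_counter) % 2:
    keys.reverse()
  for key in keys:
    analyzers.mean(key, inputs[key])
  return {}
```

With these definitions, check_analyzer_order(deterministic_fn, {'b': [1.0, 3.0], 'a': [2.0]}) returns ['a', 'b'] on every run, while check_analyzer_order(flaky_fn, ...) raises RuntimeError because its second trace visits the features in reverse order.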

The output of transform_raw_features does not contain an expected feature.

Example exceptions:

KeyError: <feature key>

or

<feature key> not found in features dictionary.

TFTransformOutput.transform_raw_features ignores the drop_unused_features parameter and behaves as if it were True, so the returned dictionary may not contain every key it contained under TF 1.x. Update any code that reads from this dictionary to check that a key exists before retrieving it.
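A minimal sketch of such a defensive lookup, assuming transformed is the dictionary returned by TFTransformOutput.transform_raw_features (represented here as a plain Python dict so the example runs without TensorFlow). The helper split_available_features and the feature names are hypothetical.

```python
from typing import Any, Dict, Iterable, List, Tuple


def split_available_features(
    transformed: Dict[str, Any],
    wanted: Iterable[str]) -> Tuple[Dict[str, Any], List[str]]:
  """Splits `wanted` keys into those present in `transformed` and those missing.

  `transformed` stands in for the dictionary returned by, for example,
  tf_transform_output.transform_raw_features(raw_features).
  """
  present: Dict[str, Any] = {}
  missing: List[str] = []
  for key in wanted:
    # Check membership instead of indexing, which would raise KeyError.
    if key in transformed:
      present[key] = transformed[key]
    else:
      missing.append(key)
  return present, missing
```

For example, split_available_features({'x_scaled': [0.5]}, ['x_scaled', 'raw_id']) returns ({'x_scaled': [0.5]}, ['raw_id']), letting the caller decide whether a missing feature is an error or can be skipped.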

tf.estimator.BaselineClassifier raises a "Table not initialized" error.

Example exception:

tensorflow.python.framework.errors_impl.FailedPreconditionError: Table not initialized.

Support for the TFX Trainer with the Estimator-based executor is best-effort. Other estimators work, but table initialization issues have been observed with BaselineClassifier. If you use BaselineClassifier, disable TF 2.x behavior in tf.Transform as described in Retaining the legacy tf.Transform behavior.

Known issues / Features not yet supported

Outputting vocabularies in TFRecord format is not yet supported.

tfrecord_gzip is not yet supported as a valid value for the file_format parameter in tft.vocabulary (and other vocabulary APIs).

Retaining the legacy tf.Transform behavior

If your tf.Transform pipeline should not run with TF 2.x, you can retain the legacy behavior in one of the following ways: