A TFX SchemaGen component to generate a schema from the training data.

Inherits From: BaseComponent, BaseNode

The SchemaGen component uses TensorFlow Data Validation to generate a schema from input statistics. The following TFX libraries use the schema:

  • TensorFlow Data Validation
  • TensorFlow Transform
  • TensorFlow Model Analysis

In a typical TFX pipeline, the SchemaGen component generates a schema which is consumed by the other pipeline components.


  from tfx.v1.components import SchemaGen
  # Generates a schema from the statistics files emitted by statistics_gen.
  infer_schema = SchemaGen(statistics=statistics_gen.outputs['statistics'])

Component outputs contain:

  • schema: Channel of type standard_artifacts.Schema for the schema result.

See the SchemaGen guide for more details.

Args:

  • statistics: A BaseChannel of ExampleStatistics type (required if spec is not passed). This should contain at least a train split. Other splits are currently ignored.
  • infer_feature_shape: Boolean (or RuntimeParameter) value indicating whether to infer the shape of features. If the feature shape is not inferred, a downstream TensorFlow Transform component using the schema will parse input as tf.SparseTensor. Defaults to True if not set.
  • exclude_splits: Names of splits that will not be taken into consideration when auto-generating a schema. By default (when exclude_splits is None), no splits are excluded.
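
To make the effect of infer_feature_shape and exclude_splits more concrete, here is a toy, standard-library-only sketch. It is not TFDV's inference algorithm and does not use the TFX API: FeatureStats, toy_infer_schema, and the sample numbers are all hypothetical. It also makes a simplifying assumption that a feature gets a fixed shape only when it is present in every example with a constant number of values.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Optional


@dataclass
class FeatureStats:
    """Hypothetical per-feature summary for one split (not a TFDV type)."""
    dtype: str         # e.g. "INT", "FLOAT", "BYTES"
    num_examples: int  # examples in the split
    num_present: int   # examples in which the feature has a value
    min_values: int    # fewest values per example where present
    max_values: int    # most values per example where present


def toy_infer_schema(
    stats_by_split: Dict[str, Dict[str, FeatureStats]],
    infer_feature_shape: bool = True,
    exclude_splits: Optional[Iterable[str]] = None,
) -> Dict[str, dict]:
    """Builds a toy schema from every split that is not excluded."""
    excluded = set(exclude_splits or [])
    schema: Dict[str, dict] = {}
    for split, features in stats_by_split.items():
        if split in excluded:
            continue  # Excluded splits contribute nothing to the schema.
        for name, fs in features.items():
            # Assumption: "dense" means always present with a fixed length.
            dense = (fs.num_present == fs.num_examples
                     and fs.min_values == fs.max_values)
            shape = [fs.max_values] if (infer_feature_shape and dense) else None
            if name not in schema:
                schema[name] = {"type": fs.dtype, "shape": shape}
            elif schema[name]["shape"] != shape:
                # Splits disagree, so fall back to "no fixed shape".
                schema[name]["shape"] = None
    return schema


# Made-up statistics for two splits.
stats = {
    "train": {
        "age": FeatureStats("INT", 100, 100, 1, 1),
        "tags": FeatureStats("BYTES", 100, 80, 1, 5),
    },
    "eval": {
        "age": FeatureStats("INT", 50, 45, 1, 1),
    },
}

with_eval = toy_infer_schema(stats)
train_only = toy_infer_schema(stats, exclude_splits=["eval"])
no_shapes = toy_infer_schema(stats, infer_feature_shape=False,
                             exclude_splits=["eval"])
```

In this toy model, a feature without an inferred shape stands in for one that a downstream consumer would have to treat as variable-length, which mirrors the tf.SparseTensor parsing described for infer_feature_shape.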

Attributes:

  • outputs: Component's output channel dict.