tfx.components.SchemaGen

A TFX SchemaGen component to generate a schema from the training data.

Inherits From: BaseComponent, BaseNode

The SchemaGen component uses TensorFlow Data Validation to generate a schema from input statistics. The following TFX libraries use the schema:

  • TensorFlow Data Validation
  • TensorFlow Transform
  • TensorFlow Model Analysis

In a typical TFX pipeline, the SchemaGen component generates a schema that is consumed by the other pipeline components.

Please see https://www.tensorflow.org/tfx/data_validation for more details.

Example

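  from tfx.components import SchemaGen

  # Generates schema based on statistics files.
  # `statistics_gen` is assumed to be an upstream StatisticsGen component.
  infer_schema = SchemaGen(statistics=statistics_gen.outputs['statistics'])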

Args
statistics A Channel of ExampleStatistics type (required if spec is not passed). This should contain at least a train split. Other splits are currently ignored.
infer_feature_shape Boolean (or RuntimeParameter) value indicating whether to infer the shape of features. If the feature shape is not inferred, a downstream TensorFlow Transform component using the schema will parse input as tf.SparseTensor.
exclude_splits Names of splits that will not be taken into consideration when auto-generating a schema. By default (when exclude_splits is None), no splits are excluded.
output Output Schema channel for schema result.
stats Backwards compatibility alias for the 'statistics' argument.
instance_name Optional name assigned to this specific instance of SchemaGen. Required only if multiple SchemaGen components are declared in the same pipeline.

Either statistics or stats must be present in the input arguments.
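
The exclude_splits default described above can be sketched in plain Python. This only illustrates the documented rule (None means no splits are excluded) and is not SchemaGen's implementation; the helper name splits_for_schema and the split names are hypothetical.

```python
from typing import List, Optional


def splits_for_schema(available: List[str],
                      exclude_splits: Optional[List[str]] = None) -> List[str]:
    """Hypothetical helper: return the splits left after applying exclude_splits."""
    # None means "exclude no splits", matching the documented default.
    excluded = set(exclude_splits or [])
    return [name for name in available if name not in excluded]


print(splits_for_schema(['train', 'eval']))            # ['train', 'eval']
print(splits_for_schema(['train', 'eval'], ['eval']))  # ['train']
```

With the real component, the same list of names would be passed as the exclude_splits argument.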

component_id DEPRECATED FUNCTION

component_type DEPRECATED FUNCTION

downstream_nodes

exec_properties

id Node id, unique across all TFX nodes in a pipeline.

If id is set by the user, it is returned directly. Otherwise, if an instance name (deprecated) is available, the node id is derived from it; otherwise, a default node id is used.

inputs

outputs

type

upstream_nodes

Child Classes

class DRIVER_CLASS

class SPEC_CLASS

Methods

add_downstream_node

Experimental: Add another component that must run after this one.

This method enables task-based dependencies by enforcing execution order for synchronous pipelines on supported platforms. Currently, the supported platforms are Airflow, Beam, and Kubeflow Pipelines.

Note that this API call should be considered experimental, and may not work with asynchronous pipelines, sub-pipelines and pipelines with conditional nodes. We also recommend relying on data for capturing dependencies where possible to ensure data lineage is fully captured within MLMD.

It is symmetric with add_upstream_node.

Args
downstream_node a component that must run after this node.

add_upstream_node

Experimental: Add another component that must run before this one.

This method enables task-based dependencies by enforcing execution order for synchronous pipelines on supported platforms. Currently, the supported platforms are Airflow, Beam, and Kubeflow Pipelines.

Note that this API call should be considered experimental, and may not work with asynchronous pipelines, sub-pipelines and pipelines with conditional nodes. We also recommend relying on data for capturing dependencies where possible to ensure data lineage is fully captured within MLMD.

It is symmetric with add_downstream_node.

Args
upstream_node a component that must run before this node.
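
The symmetry between add_downstream_node and add_upstream_node can be illustrated with a small stand-in class. This is a sketch of the idea only, not the TFX BaseNode implementation; ToyNode and the "Notifier" node are hypothetical names used for illustration.

```python
class ToyNode:
    """Stand-in for a pipeline node; NOT the TFX BaseNode implementation."""

    def __init__(self, node_id: str):
        self.id = node_id
        self.upstream_nodes = set()
        self.downstream_nodes = set()

    def add_downstream_node(self, downstream_node: "ToyNode") -> None:
        # Record the edge on both ends so the two views stay consistent.
        self.downstream_nodes.add(downstream_node)
        downstream_node.upstream_nodes.add(self)

    def add_upstream_node(self, upstream_node: "ToyNode") -> None:
        # Mirror image of add_downstream_node.
        self.upstream_nodes.add(upstream_node)
        upstream_node.downstream_nodes.add(self)


schema_gen = ToyNode("SchemaGen")
notifier = ToyNode("Notifier")  # hypothetical node with no data dependency

schema_gen.add_downstream_node(notifier)
print(notifier in schema_gen.downstream_nodes)  # True
print(schema_gen in notifier.upstream_nodes)    # True
```

Calling notifier.add_upstream_node(schema_gen) instead would leave both nodes in the same state, which is what "symmetric" means here.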

from_json_dict

Convert from dictionary data to an object.

get_id

Gets the id of a node. (deprecated)

This can be used during pipeline authoring time. For example:

  from tfx.components import Trainer

  resolver = ResolverNode(
      ...,
      model=Channel(
          type=Model,
          producer_component_id=Trainer.get_id('my_trainer')))

Args
instance_name (Optional) instance name of a node. If given, the instance name will be taken into consideration when generating the id.

Returns
an id for the node.

to_json_dict

Convert from an object to a JSON serializable dictionary.

with_id

with_platform_config

Attaches a proto-form platform config to a component.

The config will be a per-node platform-specific config.

Args
config platform config to attach to the component.

Returns
the same component itself.

EXECUTOR_SPEC tfx.dsl.components.base.executor_spec.ExecutorClassSpec