Official TFX StatisticsGen component.

Inherits From: BaseBeamComponent, BaseComponent, BaseNode

The StatisticsGen component generates feature statistics and random samples over training data, which can be used for visualization and validation. StatisticsGen uses Apache Beam and approximate algorithms to scale to large datasets.

Please see the StatisticsGen guide for more details.


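  # Computes statistics over data for visualization and example validation.
  # Assumes `example_gen` is an upstream ExampleGen component defined earlier.
  from tfx.components import StatisticsGen
  statistics_gen = StatisticsGen(examples=example_gen.outputs['examples'])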

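Component outputs contain:

statistics: A Channel of type ExampleStatistics holding the statistics for each split provided in the input examples.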

Args:

examples: A Channel of ExamplesPath type, likely generated by the ExampleGen component. This needs to contain two splits labeled train and eval. Required.

schema: A Schema channel to use for automatically configuring the value of stats options passed to TFDV.

stats_options: The StatsOptions instance to configure optional TFDV behavior. When stats_options.schema is set, it will be used instead of the schema channel input. Due to the requirement that stats_options be serialized, the slicer functions and custom stats generators are dropped and are therefore not usable.

exclude_splits: Names of splits for which statistics and samples should not be generated. By default (when exclude_splits is None), no splits are excluded.
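
The exclude_splits default and the stats_options.schema precedence rule described above can be illustrated without TFX installed. The following is a minimal sketch in plain Python. The helper names `splits_to_process` and `effective_schema` are hypothetical and are not part of TFX; they only mirror the documented behavior, and the string values stand in for real schema objects.

```python
from typing import List, Optional


def splits_to_process(all_splits: List[str],
                      exclude_splits: Optional[List[str]] = None) -> List[str]:
    """Hypothetical helper: returns the splits that would get statistics."""
    # None means "exclude nothing", matching the documented default.
    excluded = set(exclude_splits or [])
    return [split for split in all_splits if split not in excluded]


def effective_schema(stats_options_schema: Optional[str],
                     schema_channel_value: Optional[str]) -> Optional[str]:
    """Hypothetical helper: a schema set on stats_options takes precedence."""
    if stats_options_schema is not None:
        return stats_options_schema
    return schema_channel_value


# Default: statistics for every split.
print(splits_to_process(["train", "eval"]))
# Excluding the eval split leaves only train.
print(splits_to_process(["train", "eval"], ["eval"]))
# The schema on stats_options wins over the schema channel.
print(effective_schema("options_schema", "channel_schema"))
# Without a schema on stats_options, the schema channel is used.
print(effective_schema(None, "channel_schema"))
```

With the real component, the first rule corresponds to passing exclude_splits=['eval'] to StatisticsGen alongside the examples channel.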

Attributes:

outputs: Component's output channel dict.



with_beam_pipeline_args: Adds per-component Beam pipeline args.

beam_pipeline_args: List of Beam pipeline args to be added to the Beam executor spec.

Returns: The same component itself.
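
Because the method returns the same component, the call can be chained directly onto construction. The sketch below shows that pattern with a stand-in class. `SketchBeamComponent` is hypothetical and is not a TFX class; its method is named after TFX's `with_beam_pipeline_args`, and `--direct_num_workers=2` is only an example Beam flag.

```python
from typing import Iterable, List


class SketchBeamComponent:
    """Hypothetical stand-in for a Beam-based component (not a TFX class)."""

    def __init__(self) -> None:
        self.beam_pipeline_args: List[str] = []

    def with_beam_pipeline_args(
            self, beam_pipeline_args: Iterable[str]) -> "SketchBeamComponent":
        # Add the per-component args, then return the same object so the
        # call can be chained onto the constructor.
        self.beam_pipeline_args.extend(beam_pipeline_args)
        return self


# Chained call: construct, add Beam args, and keep the same instance.
component = SketchBeamComponent().with_beam_pipeline_args(
    ["--direct_num_workers=2"])
print(component.beam_pipeline_args)
```

With TFX itself, the same chaining would apply to a StatisticsGen instance, so the Beam args affect only that component rather than the whole pipeline.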