Official TFX StatisticsGen component.
Inherits From: BaseComponent, BaseNode
tfx.components.StatisticsGen(
examples: tfx.types.Channel = None,
schema: Optional[tfx.types.Channel] = None,
stats_options: Optional[tfdv.StatsOptions] = None,
exclude_splits: Optional[List[Text]] = None,
output: Optional[tfx.types.Channel] = None,
input_data: Optional[tfx.types.Channel] = None,
instance_name: Optional[Text] = None
)
The StatisticsGen component generates feature statistics and random samples over training data, which can be used for visualization and validation. StatisticsGen uses Apache Beam and approximate algorithms to scale to large datasets.
Please see https://www.tensorflow.org/tfx/data_validation for more details.
Example
# Computes statistics over data for visualization and example validation.
statistics_gen = StatisticsGen(examples=example_gen.outputs['examples'])
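The one-line example above can be wrapped in a small helper when assembling a pipeline. The following is a minimal sketch, assuming TFX is installed in the environment that calls the helper. `build_statistics_gen` is a hypothetical name introduced for illustration, and only the `examples` and `exclude_splits` arguments documented on this page are used. The import is deferred so that the module itself can be loaded without TFX.

```python
from typing import Any, List, Optional


def build_statistics_gen(example_gen: Any,
                         exclude_splits: Optional[List[str]] = None) -> Any:
    """Hypothetical helper: wires an ExampleGen-like component into StatisticsGen.

    `example_gen` is assumed to expose `outputs['examples']`, as in the
    example above.
    """
    # Deferred import: TFX is only required when the helper is actually called.
    from tfx.components import StatisticsGen

    return StatisticsGen(
        examples=example_gen.outputs['examples'],
        exclude_splits=exclude_splits,
    )
```

Passing the upstream component rather than a path keeps the dependency expressed through data, which is what the component expects.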
Args | |
---|---|
examples | A Channel of ExamplesPath type, likely generated by the ExampleGen component. This needs to contain two splits labeled train and eval. Required. |
schema | A Schema channel to use for automatically configuring the value of stats options passed to TFDV. |
stats_options | The StatsOptions instance to configure optional TFDV behavior. When stats_options.schema is set, it will be used instead of the schema channel input. Due to the requirement that stats_options be serialized, the slicer functions and custom stats generators are dropped and are therefore not usable. |
exclude_splits | Names of splits for which statistics and samples should not be generated. The default (exclude_splits set to None) excludes no splits. |
output | ExampleStatisticsPath channel for statistics of each split provided in the input examples. |
input_data | Backwards compatibility alias for the examples argument. |
instance_name | Optional name assigned to this specific instance of StatisticsGen. Required only if multiple StatisticsGen components are declared in the same pipeline. |
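Two rules in the table above are easy to misread: `stats_options.schema` takes precedence over the `schema` channel, and `exclude_splits=None` means no split is excluded. The sketch below models both rules in plain Python so they can be checked without TFX. `resolve_schema` and `splits_to_process` are hypothetical helpers written for this illustration, with schemas represented as placeholder strings. They are not part of TFX and do not reproduce its executor.

```python
from typing import List, Optional


def resolve_schema(stats_options_schema: Optional[str],
                   channel_schema: Optional[str]) -> Optional[str]:
    """Returns the schema to use; one set on stats_options wins over the channel."""
    if stats_options_schema is not None:
        return stats_options_schema
    return channel_schema


def splits_to_process(all_splits: List[str],
                      exclude_splits: Optional[List[str]] = None) -> List[str]:
    """Returns the splits for which statistics would be generated."""
    excluded = set(exclude_splits or [])  # None means "exclude nothing"
    return [split for split in all_splits if split not in excluded]
```

For example, `splits_to_process(['train', 'eval'], ['eval'])` keeps only the `train` split, while omitting the second argument keeps both.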
Attributes | |
---|---|
component_id | |
component_type | |
downstream_nodes | |
exec_properties | |
id | Node id, unique across all TFX nodes in a pipeline. If |
inputs | |
outputs | |
type | |
upstream_nodes | |
Child Classes
Methods
add_downstream_node
add_downstream_node(
downstream_node
)
Experimental: Add another component that must run after this one.
This method enables task-based dependencies by enforcing execution order for synchronous pipelines on supported platforms. Currently, the supported platforms are Airflow, Beam, and Kubeflow Pipelines.
Note that this API call should be considered experimental, and may not work with asynchronous pipelines, sub-pipelines and pipelines with conditional nodes. We also recommend relying on data for capturing dependencies where possible to ensure data lineage is fully captured within MLMD.
It is symmetric with add_upstream_node.
Args | |
---|---|
downstream_node | A component that must run after this node. |
add_upstream_node
add_upstream_node(
upstream_node
)
Experimental: Add another component that must run before this one.
This method enables task-based dependencies by enforcing execution order for synchronous pipelines on supported platforms. Currently, the supported platforms are Airflow, Beam, and Kubeflow Pipelines.
Note that this API call should be considered experimental, and may not work with asynchronous pipelines, sub-pipelines and pipelines with conditional nodes. We also recommend relying on data for capturing dependencies where possible to ensure data lineage is fully captured within MLMD.
It is symmetric with add_downstream_node.
Args | |
---|---|
upstream_node | A component that must run before this node. |
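The two methods above are described as symmetric: declaring B downstream of A is the same as declaring A upstream of B. The following plain-Python sketch models that relationship and derives one valid execution order from it. `Node` and `execution_order` are hypothetical stand-ins for illustration only. They are not TFX classes and say nothing about how a given orchestrator schedules work.

```python
from typing import Dict, List, Set


class Node:
    """Hypothetical stand-in for a pipeline node with task-based dependencies."""

    def __init__(self, node_id: str) -> None:
        self.id = node_id
        self.upstream_nodes: Set["Node"] = set()
        self.downstream_nodes: Set["Node"] = set()

    def add_downstream_node(self, downstream_node: "Node") -> None:
        # Record the edge on both ends so the two views stay consistent.
        self.downstream_nodes.add(downstream_node)
        downstream_node.upstream_nodes.add(self)

    def add_upstream_node(self, upstream_node: "Node") -> None:
        # Mirror image of add_downstream_node.
        upstream_node.add_downstream_node(self)


def execution_order(nodes: List[Node]) -> List[str]:
    """Returns node ids so that every node comes after its upstream nodes."""
    remaining: Dict[str, Set[str]] = {
        node.id: {up.id for up in node.upstream_nodes} for node in nodes
    }
    order: List[str] = []
    while remaining:
        # A node is ready once none of its upstream nodes is still pending.
        ready = sorted(nid for nid, ups in remaining.items()
                       if ups.isdisjoint(remaining))
        if not ready:
            raise ValueError("dependency cycle detected")
        order.extend(ready)
        for nid in ready:
            del remaining[nid]
    return order
```

Calling `a.add_downstream_node(b)` and `b.add_upstream_node(a)` leaves the sketch in the same state, which is the sense in which the two calls are symmetric.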
from_json_dict
@classmethod
from_json_dict(dict_data: Dict[Text, Any]) -> Any
Convert from dictionary data to an object.
get_id
@classmethod
get_id(instance_name: Optional[Text] = None)
Gets the id of a node.
This can be used during pipeline authoring time. For example:
from tfx.components import Trainer
resolver = ResolverNode(..., model=Channel(type=Model, producer_component_id=Trainer.get_id('my_trainer')))
Args | |
---|---|
instance_name | (Optional) Instance name of a node. If given, the instance name will be taken into consideration when generating the id. |
Returns | |
---|---|
an id for the node. |
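The description above says only that the instance name is taken into consideration when generating the id. The sketch below shows one plausible scheme: the class name, plus an optional instance-name suffix. The exact format, including the `.` separator, is an assumption made for illustration and is not confirmed by this page. `SketchNode` and `MyTrainer` are hypothetical classes.

```python
from typing import Optional


class SketchNode:
    """Hypothetical base class illustrating a class-level id scheme."""

    @classmethod
    def get_id(cls, instance_name: Optional[str] = None) -> str:
        # Assumed format: "<ClassName>" or "<ClassName>.<instance_name>".
        if instance_name:
            return f"{cls.__name__}.{instance_name}"
        return cls.__name__


class MyTrainer(SketchNode):
    pass
```

Because `get_id` is a classmethod, the id can be computed before any instance exists, which is what makes it usable at pipeline authoring time.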
to_json_dict
to_json_dict() -> Dict[Text, Any]
Convert from an object to a JSON serializable dictionary.
with_id
with_id(
id: Text
) -> "BaseNode"
with_platform_config
with_platform_config(
config: message.Message
) -> "BaseComponent"
Attaches a proto-form platform config to a component.
The config will be a per-node platform-specific config.
Args | |
---|---|
config | Platform config to attach to the component. |
Returns | |
---|---|
the same component itself. |
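Returning the same component allows the call to be chained directly onto the constructor. The sketch below illustrates that pattern with a hypothetical `SketchComponent` class. A plain dict stands in for the proto message that the real method expects, and the `platform_config` attribute name is an assumption made for this illustration.

```python
from typing import Any, Optional


class SketchComponent:
    """Hypothetical component showing the return-self pattern."""

    def __init__(self) -> None:
        self.platform_config: Optional[Any] = None

    def with_platform_config(self, config: Any) -> "SketchComponent":
        # Store the config on this node and hand back the same object.
        self.platform_config = config
        return self


# The call can follow construction in a single expression.
component = SketchComponent().with_platform_config({"cpu": 2})
```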
Class Variables | |
---|---|
EXECUTOR_SPEC | tfx.dsl.components.base.executor_spec.ExecutorClassSpec |