tfx.v1.dsl.components.component

Decorator: creates a component from a typehint-annotated Python function.

This decorator creates a component based on the typehint annotations specified for the arguments and return value of a Python function. The decorator accepts an optional component_annotation parameter, which indicates the system execution type that this Python function-based component belongs to. Function arguments can be annotated with the following types and associated semantics:

  • Parameter[T] where T is int, float, str, or bytes: indicates that a primitive type execution parameter, whose value is known at pipeline construction time, will be passed for this argument. These parameters will be recorded in ML Metadata as part of the component's execution record. Can be an optional argument.
  • int, float, str, bytes: indicates that a primitive type value will be passed for this argument. This value is tracked as an Integer, Float, String or Bytes artifact (see tfx.types.standard_artifacts) whose value is read and passed into the given Python component function. Can be an optional argument.
  • InputArtifact[ArtifactType]: indicates that an input artifact object of type ArtifactType (deriving from tfx.types.Artifact) will be passed for this argument. This artifact is intended to be consumed as an input by this component (possibly reading from the path specified by its .uri). Can be an optional argument by specifying a default value of None.
  • OutputArtifact[ArtifactType]: indicates that an output artifact object of type ArtifactType (deriving from tfx.types.Artifact) will be passed for this argument. This artifact is intended to be emitted as an output by this component (and written to the path specified by its .uri). Cannot be an optional argument.
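
The four annotation kinds above can be told apart by inspecting a function's signature. The following is a minimal sketch of that idea using only the standard library; it is not TFX's implementation. The classes Parameter, InputArtifact, OutputArtifact, Examples and Model are hypothetical stand-ins defined here so the sketch runs without TFX installed, and classify_arguments is an illustrative helper name.

```python
import inspect
from typing import Generic, TypeVar, get_origin

T = TypeVar("T")


# Hypothetical stand-ins for the TFX annotation classes (assumption:
# defined locally so that this sketch runs without TFX installed).
class Parameter(Generic[T]):
    pass


class InputArtifact(Generic[T]):
    pass


class OutputArtifact(Generic[T]):
    pass


# Hypothetical stand-ins for artifact types.
class Examples:
    pass


class Model:
    pass


PRIMITIVE_TYPES = (int, float, str, bytes)


def classify_arguments(func):
    """Map each argument name to the kind of value its annotation implies."""
    kinds = {}
    for name, param in inspect.signature(func).parameters.items():
        hint = param.annotation
        origin = get_origin(hint)  # e.g. Parameter for Parameter[int]
        if origin is Parameter:
            kinds[name] = "execution parameter"
        elif origin is InputArtifact:
            kinds[name] = "input artifact"
        elif origin is OutputArtifact:
            kinds[name] = "output artifact"
        elif hint in PRIMITIVE_TYPES:
            kinds[name] = "primitive value artifact"
        else:
            raise ValueError(f"Unsupported annotation for argument {name!r}")
    return kinds


def my_trainer(
    training_data: InputArtifact[Examples],
    model: OutputArtifact[Model],
    dropout_hyperparameter: float,
    num_iterations: Parameter[int] = 10,
):
    pass


kinds = classify_arguments(my_trainer)
```

Here kinds maps training_data to an input artifact, model to an output artifact, dropout_hyperparameter to a primitive value artifact, and num_iterations to an execution parameter. An argument without a supported annotation raises ValueError in this sketch.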

The return value typehint should be either empty or None, in the case of a component function that has no return values, or an instance of OutputDict(key_1=type_1, ...), where each key maps to a given type (each type is a primitive value type, i.e. int, float, str or bytes; or Optional[T], where T is a primitive type value, in which case None can be returned), to indicate that the return value is a dictionary with specified keys and value types.
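
To make the shape of the return value concrete, the sketch below compares a returned dictionary against a key-to-type specification, allowing None only where the type is Optional[T]. It uses only the standard library and is an illustration, not how TFX validates outputs. The plain dict spec stands in for OutputDict(loss=float, note=Optional[str]), and check_outputs is a hypothetical helper name.

```python
from typing import Optional, Union, get_args, get_origin


def check_outputs(spec, outputs):
    """Return a list of mismatches between `outputs` and `spec`."""
    problems = []
    for key, hint in spec.items():
        if key not in outputs:
            problems.append(f"missing key {key!r}")
            continue
        # Optional[T] is Union[T, None]; unpack it into (T, NoneType).
        if get_origin(hint) is Union:
            allowed = get_args(hint)
        else:
            allowed = (hint,)
        value = outputs[key]
        if not isinstance(value, allowed):
            problems.append(
                f"key {key!r} has unexpected type {type(value).__name__}"
            )
    for key in outputs:
        if key not in spec:
            problems.append(f"unexpected key {key!r}")
    return problems


# Stand-in for OutputDict(loss=float, note=Optional[str]).
spec = {"loss": float, "note": Optional[str]}

ok = check_outputs(spec, {"loss": 0.25, "note": None})
bad = check_outputs(spec, {"loss": None, "extra": 1})
```

In this sketch ok is an empty list because None is acceptable for the Optional[str] key, while bad reports the None value for loss, the missing note key, and the unexpected extra key.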

Note that output artifacts should not be included in the return value typehint; they should be included as OutputArtifact annotations in the function inputs, as described above.

The function to which this decorator is applied must be at the top level of its Python module (it may not be defined within nested classes or function closures).
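
One way to see what "top level" means in practice is to look at a function's qualified name: for a function defined directly in a module it equals the plain name, while nested definitions carry a prefix such as the enclosing class name or "<locals>". The sketch below illustrates this with the standard library only; is_top_level is a hypothetical helper, and TFX may enforce the requirement differently.

```python
def is_top_level(func):
    """True if `func` is not nested in a class or an enclosing function."""
    return func.__qualname__ == func.__name__


def make_component():
    # Defined inside a function closure: the qualified name contains
    # "<locals>".
    def nested_component():
        pass

    return nested_component


class Holder:
    # Defined inside a class: the qualified name is prefixed by the class.
    def method_component(self):
        pass


nested_ok = is_top_level(make_component())
method_ok = is_top_level(Holder.method_component)
```

Both nested_ok and method_ok are False here, so neither function would be a suitable target for the decorator under the rule above.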

Example usage of a component definition using this decorator:

from tfx.dsl.components.base.annotations import OutputDict
from tfx.dsl.components.base.annotations import InputArtifact
from tfx.dsl.components.base.annotations import OutputArtifact
from tfx.dsl.components.base.annotations import Parameter
from tfx.dsl.components.base.decorators import component
from tfx.types import standard_artifacts
from tfx.types import system_executions

@component(component_annotation=system_executions.Train)
def MyTrainerComponent(
    training_data: InputArtifact[standard_artifacts.Examples],
    model: OutputArtifact[standard_artifacts.Model],
    dropout_hyperparameter: float,
    num_iterations: Parameter[int] = 10
    ) -> OutputDict(loss=float, accuracy=float):
  '''My simple trainer component.'''

  records = read_examples(training_data.uri)
  model_obj = train_model(records, num_iterations, dropout_hyperparameter)
  model_obj.write_to(model.uri)

  return {
    'loss': model_obj.loss,
    'accuracy': model_obj.accuracy
  }

Example usage in a pipeline graph definition:

# ...
trainer = MyTrainerComponent(
    training_data=example_gen.outputs['examples'],
    dropout_hyperparameter=other_component.outputs['dropout'],
    num_iterations=1000)
pusher = Pusher(model=trainer.outputs['model'])
# ...

When component_annotation is not supplied, it defaults to None. The following example uses the decorator without component_annotation:

@component
def MyTrainerComponent(
    training_data: InputArtifact[standard_artifacts.Examples],
    model: OutputArtifact[standard_artifacts.Model],
    dropout_hyperparameter: float,
    num_iterations: Parameter[int] = 10
    ) -> OutputDict(loss=float, accuracy=float):
  '''My simple trainer component.'''

  records = read_examples(training_data.uri)
  model_obj = train_model(records, num_iterations, dropout_hyperparameter)
  model_obj.write_to(model.uri)

  return {
    'loss': model_obj.loss,
    'accuracy': model_obj.accuracy
  }

When the parameter use_beam is True, one of the parameters of the decorated function must be type-annotated with BeamComponentParameter[beam.Pipeline], and its default value can only be None. It will be replaced by a Beam pipeline built from the TFX pipeline's beam_pipeline_args, which are shared with other Beam-based components:

import apache_beam as beam
from tfx.dsl.components.base.annotations import BeamComponentParameter

@component(use_beam=True)
def DataProcessingComponent(
    input_examples: InputArtifact[standard_artifacts.Examples],
    output_examples: OutputArtifact[standard_artifacts.Examples],
    beam_pipeline: BeamComponentParameter[beam.Pipeline] = None,
    ) -> None:
  '''My simple data processing component.'''

  # Read from the input artifact declared in the signature above.
  records = read_examples(input_examples.uri)
  with beam_pipeline as p:
    ...

Experimental: no backwards compatibility guarantees.

Args:
  func: Typehint-annotated component executor function.
  component_annotation: Used to annotate the Python function-based component. It is a subclass of SystemExecution from tfx/types/system_executions.py; it can be None.
  use_beam: Whether to create a component that is a subclass of BaseBeamComponent. This allows a beam.Pipeline to be made with the TFX pipeline's beam_pipeline_args.

Returns:
  A base_component.BaseComponent or base_component.BaseBeamComponent subclass for the given component executor function.

Raises:
  EnvironmentError: if the current Python interpreter is not Python 3.