tfx.components.example_gen.import_example_gen.executor.Executor

Generic TFX import example gen executor.

Inherits From: BaseExampleGenExecutor

Child Classes

class Context

Methods

Do

Takes an input data source and generates serialized data splits.

The output is intended to be serialized tf.train.Example or tf.train.SequenceExample protocol buffers in gzipped TFRecord format. Subclasses can override this to write any serialized record payload into gzipped TFRecord files, as long as downstream components can consume it. The payload format is recorded in the payload_format custom property of the output Example artifact.

Args
input_dict: Input dict from input key to a list of Artifacts. Contents depend on the specific example gen implementation.
output_dict: Output dict from output key to a list of Artifacts.
  • examples: splits of serialized records.
exec_properties: A dict of execution properties. Contents depend on the specific example gen implementation.
  • input_base: an external directory containing the data files.
  • input_config: JSON string of example_gen_pb2.Input instance, providing input configuration.
  • output_config: JSON string of example_gen_pb2.Output instance, providing output configuration.
  • output_data_format: Payload format of generated data in output artifact, one of example_gen_pb2.PayloadFormat enum.

Returns
None
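
As a rough illustration of the exec_properties shape described above, the following sketch builds the dict with only the standard library so it runs without TFX installed. The directory, split names, and file patterns are hypothetical, and the integer used for output_data_format is an assumed enum value. In a real pipeline the JSON strings would normally come from serializing example_gen_pb2.Input and example_gen_pb2.Output messages rather than from hand-written dicts.

```python
import json

# Hypothetical input configuration: two splits, each matched by a file
# pattern relative to input_base. The field names (splits, name, pattern)
# follow the example_gen_pb2.Input message.
input_config = {
    "splits": [
        {"name": "train", "pattern": "train/*"},
        {"name": "eval", "pattern": "eval/*"},
    ]
}

exec_properties = {
    "input_base": "/path/to/data",  # hypothetical directory
    "input_config": json.dumps(input_config),
    # An empty JSON object stands in for an Output message with no settings.
    "output_config": json.dumps({}),
    # Assumption: 0 is taken to be the PayloadFormat value for
    # tf.train.Example payloads; check the enum before relying on it.
    "output_data_format": 0,
}


def split_names(props):
    """Return the split names declared in the input_config JSON string."""
    config = json.loads(props["input_config"])
    return [split["name"] for split in config["splits"]]


print(split_names(exec_properties))
```

The helper split_names is not part of TFX. It only shows that input_config travels as a JSON string and has to be parsed before the splits can be read.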

GenerateExamplesByBeam

Converts input source to serialized record splits based on configs.

A custom ExampleGen executor should provide GetInputSourceToExamplePTransform for converting an input split to serialized records. Override this GenerateExamplesByBeam method instead if more complex logic is needed, e.g., custom splitting logic.

Args
pipeline: Beam pipeline.
exec_properties: A dict of execution properties. Contents depend on the specific example gen implementation.
  • input_base: an external directory containing the data files.
  • input_config: JSON string of example_gen_pb2.Input instance, providing input configuration.
  • output_config: JSON string of example_gen_pb2.Output instance, providing output configuration.
  • output_data_format: Payload format of generated data in output artifact, one of example_gen_pb2.PayloadFormat enum.

Returns
A dict of Beam PCollections keyed by split name. Each PCollection is a single output split that contains serialized records.
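
To make the return shape and the idea of custom splitting logic more concrete, here is a minimal pure-Python sketch. It is not TFX's implementation and does not use Beam: plain lists stand in for PCollections, and the function split_records, the SHA-256 hashing scheme, and the 2:1 train/eval ratio are all assumptions chosen for illustration.

```python
import hashlib


def split_records(records, buckets):
    """Assign each serialized record to a split by hashing it into a bucket.

    `buckets` is an ordered list of (split_name, bucket_count) pairs. The
    result is a dict keyed by split name, mirroring the shape of the return
    value described above, with lists standing in for PCollections.
    """
    total = sum(count for _, count in buckets)
    splits = {name: [] for name, _ in buckets}
    for record in records:
        # A content-based hash keeps the assignment deterministic across runs.
        index = int(hashlib.sha256(record).hexdigest(), 16) % total
        upper = 0
        for name, count in buckets:
            upper += count
            if index < upper:
                splits[name].append(record)
                break
    return splits


# Hypothetical serialized records; real ones would be serialized protos.
records = [f"record-{i}".encode("utf-8") for i in range(100)]
splits = split_records(records, [("train", 2), ("eval", 1)])

for name, items in splits.items():
    print(name, len(items))
```

Every record lands in exactly one split, and rerunning the function on the same input yields the same assignment. A real override would express the same partitioning as Beam transforms over a PCollection.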

GetInputSourceToExamplePTransform

Returns a PTransform for importing records.