tfx.orchestration.kubeflow.v2.components.experimental.ai_platform_training_component.create_ai_platform_training

Creates a pipeline step that launches an AIP training job.

The generated TFX component will have a component spec specified dynamically, through inputs/outputs/parameters in the following format:

  • inputs: A mapping from input name to the upstream channel connected. The artifact type of the channel will be automatically inferred.
  • outputs: A mapping from output name to the associated artifact type.
  • parameters: A mapping from execution property names to their associated values. Only primitive-typed values are supported. Note that RuntimeParameter is not supported yet.

For example:

create_ai_platform_training(
  ...
  inputs={
      # Assuming there is an upstream node example_gen, with an output
      # 'examples' of the type Examples.
      'examples': example_gen.outputs['examples'],
  },
  outputs={
      'model': standard_artifacts.Model,
  },
  parameters={
      'n_steps': 100,
      'optimizer': 'sgd',
  },
  ...
)

will generate a component instance with a component spec equivalent to:

class MyComponentSpec(ComponentSpec):
  INPUTS = {
      'examples': ChannelParameter(type=standard_artifacts.Examples)
  }
  OUTPUTS = {
      'model': ChannelParameter(type=standard_artifacts.Model)
  }
  PARAMETERS = {
      'n_steps': ExecutionParameter(type=int),
      'optimizer': ExecutionParameter(type=str)
  }

with its input 'examples' connected to the example_gen output, and its execution properties 'n_steps' and 'optimizer' set to 100 and 'sgd' respectively.
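The way parameter types in the generated spec follow from the supplied values can be sketched in plain Python. This is only an illustration, not the TFX implementation: the helper name infer_parameter_types is hypothetical, and the set of "primitive" types (int, float, str) is an assumption.

```python
from typing import Any, Dict

# Assumption: "primitive" means int, float, or str. The set TFX actually
# accepts may differ.
_PRIMITIVE_TYPES = (int, float, str)


def infer_parameter_types(parameters: Dict[str, Any]) -> Dict[str, type]:
    """Hypothetical helper: maps each parameter name to the type of its value."""
    inferred = {}
    for name, value in parameters.items():
        if not isinstance(value, _PRIMITIVE_TYPES):
            # Mirrors the documented behavior: non-primitive parameters
            # are rejected with a TypeError.
            raise TypeError(
                f'Parameter {name!r} has unsupported type {type(value).__name__}')
        inferred[name] = type(value)
    return inferred


param_types = infer_parameter_types({'n_steps': 100, 'optimizer': 'sgd'})
print(param_types)
```

Under these assumptions, 'n_steps' maps to int and 'optimizer' maps to str, matching the ExecutionParameter types in the equivalent spec above, while a value such as a list raises TypeError.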

Example usage of the component:

# A single node training job.
my_train = create_ai_platform_training(
    name='my_training_step',
    project_id='my-project',
    region='us-central1',
    image_uri='gcr.io/my-project/caip-training-test:latest',
    args=[
        '--examples',
        placeholders.InputUriPlaceholder('examples'),
        '--n-steps',
        placeholders.InputValuePlaceholder('n_step'),
        '--output-location',
        placeholders.OutputUriPlaceholder('model')
    ],
    scale_tier='BASIC_GPU',
    inputs={'examples': example_gen.outputs['examples']},
    outputs={
        'model': standard_artifacts.Model
    },
    parameters={'n_step': 100}
)

# More complex settings can be expressed by providing training_input
# directly.
my_distributed_train = create_ai_platform_training(
    name='my_training_step',
    project_id='my-project',
    training_input={
        'scaleTier': 'CUSTOM',
        'region': 'us-central1',
        'masterType': 'n1-standard-8',
        'masterConfig': {
            'imageUri': 'gcr.io/my-project/my-dist-training:latest'
        },
        'workerType': 'n1-standard-8',
        'workerCount': 8,
        'workerConfig': {
            'imageUri': 'gcr.io/my-project/my-dist-training:latest'
        },
        'args': [
            '--examples',
            placeholders.InputUriPlaceholder('examples'),
            '--n-steps',
            placeholders.InputValuePlaceholder('n_step'),
            '--output-location',
            placeholders.OutputUriPlaceholder('model')
        ]
    },
    inputs={'examples': example_gen.outputs['examples']},
    outputs={'model': standard_artifacts.Model},
    parameters={'n_step': 100}
)

name: Name of the component. It is also needed to construct the component spec and component class dynamically.
project_id: The GCP project under which the AIP training job will run.
region: The GCE region where the AIP training job will run.
job_id: The unique ID of the job. Defaults to 'tfx_%Y%m%d%H%M%S'.
image_uri: The GCR location of the container image used to execute the training program. If the same field is specified in training_input, the value in training_input overrides image_uri.
args: Command-line arguments passed to the training program. Placeholder semantics, as in tfx.dsl.component.experimental.container_component, can be used to wire the args to component inputs/outputs/parameters.
scale_tier: The Cloud ML resource requested by the job. See https://cloud.google.com/ai-platform/training/docs/reference/rest/v1/projects.jobs#ScaleTier
training_input: The full training job spec, following the TrainingInput schema. Where applicable, this field overrides the other specifications.
labels: User-specified label attached to the job.
inputs: The dict of component inputs.
outputs: The dict of component outputs.
parameters: The dict of component parameters, also known as execution properties.
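The job_id default and the precedence of training_input over the top-level arguments can be sketched as follows. This is a minimal illustration rather than the component's actual code: default_job_id and build_training_input are hypothetical helpers, and the shallow overlay of training_input on top of the other arguments is an assumption about how "overrides" is applied.

```python
import datetime
from typing import Any, Dict, List, Optional


def default_job_id(now: Optional[datetime.datetime] = None) -> str:
    """Hypothetical helper: formats a job ID as 'tfx_%Y%m%d%H%M%S'."""
    now = now or datetime.datetime.now()
    return now.strftime('tfx_%Y%m%d%H%M%S')


def build_training_input(
    image_uri: Optional[str] = None,
    region: Optional[str] = None,
    scale_tier: Optional[str] = None,
    args: Optional[List[Any]] = None,
    training_input: Optional[Dict[str, Any]] = None,
) -> Dict[str, Any]:
    """Hypothetical helper: combines top-level arguments with training_input."""
    spec: Dict[str, Any] = {}
    if region:
        spec['region'] = region
    if scale_tier:
        spec['scaleTier'] = scale_tier
    if args:
        spec['args'] = list(args)
    if image_uri:
        spec['masterConfig'] = {'imageUri': image_uri}
    # Assumption: any key present in training_input replaces the value
    # derived from the top-level arguments (a shallow overlay).
    spec.update(training_input or {})
    return spec


job_id = default_job_id(datetime.datetime(2021, 1, 2, 3, 4, 5))
merged = build_training_input(
    image_uri='gcr.io/my-project/caip-training-test:latest',
    region='us-central1',
    training_input={
        'masterConfig': {
            'imageUri': 'gcr.io/my-project/my-dist-training:latest'
        }
    },
)
print(job_id)
print(merged)
```

In this sketch the masterConfig from training_input wins over image_uri, while region is taken from the top-level argument because training_input does not provide it.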

Returns a component instance that represents the AIP job in the DSL.

ValueError: Raised when image_uri is missing and masterConfig is not specified in training_input, or when region is missing and training_input does not provide region either.
TypeError: Raised when non-primitive parameters are specified.
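The ValueError conditions can be expressed as a small standalone check. The function below is a hypothetical sketch written for illustration, not the component's real validation code, and the error messages are made up.

```python
from typing import Any, Dict, Optional


def validate_job_spec(
    image_uri: Optional[str],
    region: Optional[str],
    training_input: Optional[Dict[str, Any]],
) -> None:
    """Hypothetical helper: raises ValueError for the documented cases."""
    training_input = training_input or {}
    # The image must come from image_uri or from training_input's masterConfig.
    if not image_uri and 'masterConfig' not in training_input:
        raise ValueError(
            'image_uri is required when masterConfig is not in training_input.')
    # The region must come from region or from training_input.
    if not region and 'region' not in training_input:
        raise ValueError(
            'region is required when training_input does not provide it.')


# Passes: both values are supplied through training_input.
validate_job_spec(
    image_uri=None,
    region=None,
    training_input={
        'region': 'us-central1',
        'masterConfig': {
            'imageUri': 'gcr.io/my-project/my-dist-training:latest'
        },
    },
)
```

Running such a check before submitting the job surfaces a missing image or region early, instead of as a failed job.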