The InfraValidator TFX Pipeline Component

InfraValidator is a TFX component that is used as an early warning layer before pushing a model into production. The name "infra" validator comes from the fact that it validates the model in the actual model serving "infrastructure". Where Evaluator guarantees the performance of the model, InfraValidator guarantees that the model is mechanically fine, preventing bad models from being pushed.

How does it work?

InfraValidator takes the model, launches a sand-boxed model server with the model, and sees if it can be successfully loaded and optionally queried. The infra validation result is generated in the blessing output, in the same way that Evaluator generates its blessing.

InfraValidator focuses on the compatibility between the model server binary (e.g. TensorFlow Serving) and the model to deploy. Despite the name "infra" validator, it is the user's responsibility to configure the environment correctly, and infra validator only interacts with the model server in the user-configured environment to see if it works fine. Configuring this environment correctly will ensure that infra validation passing or failing will be indicative of whether the model would be servable in the production serving environment. This implies some of, but is not limited to, the following:

  1. InfraValidator is using the same model server binary as will be used in production. This is the minimal level to which the infra validation environment must converge.
  2. InfraValidator is using the same resources (e.g. allocation quantity and type of CPU, memory, and accelerators) as will be used in production.
  3. InfraValidator is using the same model server configuration as will be used in production.

Depending on the situation, users can choose to what degree InfraValidator should be identical to the production environment. Technically, a model can be infra validated in a local Docker environment and then served in a completely different environment (e.g. Kubernetes cluster) without a problem. However, InfraValidator will not have checked for this divergence.
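
For example, the following sketch (the version tag is illustrative) validates the model against a TensorFlow Serving binary running in a local Docker container, pinned to the same release used in production; the ServingSpec fields are described in detail below.

infra_validator = InfraValidator(
    model=trainer.outputs['model'],
    serving_spec=tfx.proto.ServingSpec(
        # Pin the serving binary to the release used in production instead of
        # 'latest', so the validation result reflects that binary.
        tensorflow_serving=tfx.proto.TensorFlowServing(tags=['2.8.0']),
        # Validate in a local Docker container. If production actually serves
        # from a Kubernetes cluster, that divergence is not checked here.
        local_docker=tfx.proto.LocalDockerConfig()
    )
)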

Operation mode

Depending on the configuration, infra validation is done in one of the following modes:

  • LOAD_ONLY mode: checks whether the model was successfully loaded in the serving infrastructure or not; or
  • LOAD_AND_QUERY mode: LOAD_ONLY mode plus sending some sample requests to check whether the model is capable of serving inferences. InfraValidator does not care whether the prediction was correct or not; only whether the request was successful matters. The sketch after this list shows how each mode is configured.
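
In practice the mode follows from the configuration: providing the examples input channel and a request_spec (described below) enables LOAD_AND_QUERY; otherwise only loading is checked. A condensed sketch, with the specs elided:

# LOAD_ONLY: no examples or request_spec, so only model loading is checked.
infra_validator = InfraValidator(
    model=trainer.outputs['model'],
    serving_spec=tfx.proto.ServingSpec(...)
)

# LOAD_AND_QUERY: sample requests built from `examples` are also sent to the
# launched model server.
infra_validator = InfraValidator(
    model=trainer.outputs['model'],
    examples=example_gen.outputs['examples'],
    serving_spec=tfx.proto.ServingSpec(...),
    request_spec=tfx.proto.RequestSpec(...)
)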

How do I use it?

Usually InfraValidator is defined next to an Evaluator component, and its output is fed to a Pusher. If InfraValidator fails, the model will not be pushed.

evaluator = Evaluator(
    model=trainer.outputs['model'],
    examples=example_gen.outputs['examples'],
    baseline_model=model_resolver.outputs['model'],
    eval_config=tfx.proto.EvalConfig(...)
)

infra_validator = InfraValidator(
    model=trainer.outputs['model'],
    serving_spec=tfx.proto.ServingSpec(...)
)

pusher = Pusher(
    model=trainer.outputs['model'],
    model_blessing=evaluator.outputs['blessing'],
    infra_blessing=infra_validator.outputs['blessing'],
    push_destination=tfx.proto.PushDestination(...)
)

Configuring an InfraValidator component

There are three kinds of protos to configure InfraValidator.
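
Taken together, a fully specified InfraValidator might look like the following condensed sketch, with each proto elided here and described in the subsections below:

infra_validator = InfraValidator(
    model=trainer.outputs['model'],
    # Only needed for LOAD_AND_QUERY mode.
    examples=example_gen.outputs['examples'],
    serving_spec=tfx.proto.ServingSpec(...),        # What model server to run and where.
    validation_spec=tfx.proto.ValidationSpec(...),  # Validation criteria and workflow.
    request_spec=tfx.proto.RequestSpec(...)         # How to build sample requests.
)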

ServingSpec

ServingSpec is the most crucial configuration for the InfraValidator. It defines:

  • what type of model server to run
  • where to run it

For model server types (called serving binary) we support:

  • TensorFlow Serving

The following serving platforms are currently supported:

  • Local Docker (Docker should be installed in advance)
  • Kubernetes (limited support for KubeflowDagRunner only)

The choice of serving binary and serving platform is made by specifying a oneof block of the ServingSpec. For example, to use the TensorFlow Serving binary running on a Kubernetes cluster, the tensorflow_serving and kubernetes fields should be set.

infra_validator = InfraValidator(
    model=trainer.outputs['model'],
    serving_spec=tfx.proto.ServingSpec(
        tensorflow_serving=tfx.proto.TensorFlowServing(
            tags=['latest']
        ),
        kubernetes=tfx.proto.KubernetesConfig()
    )
)

To further configure ServingSpec, please check out the protobuf definition.

ValidationSpec

Optional configuration to adjust the infra validation criteria or workflow.

infra_validator = InfraValidator(
    model=trainer.outputs['model'],
    serving_spec=tfx.proto.ServingSpec(...),
    validation_spec=tfx.proto.ValidationSpec(
        # How much time to wait for model to load before automatically making
        # validation fail.
        max_loading_time_seconds=60,
        # How many times to retry if infra validation fails.
        num_tries=3
    )
)

All ValidationSpec fields have sound default values. See the protobuf definition for more detail.

RequestSpec

Optional configuration to specify how to build sample requests when running infra validation in LOAD_AND_QUERY mode. In order to use LOAD_AND_QUERY mode, both the request_spec execution property and the examples input channel must be specified in the component definition.

infra_validator = InfraValidator(
    model=trainer.outputs['model'],
    # This is the source for the data that will be used to build a request.
    examples=example_gen.outputs['examples'],
    serving_spec=tfx.proto.ServingSpec(
        # Depending on what kind of model server you're using, RequestSpec
        # should specify the compatible one.
        tensorflow_serving=tfx.proto.TensorFlowServing(tags=['latest']),
        local_docker=tfx.proto.LocalDockerConfig(),
    ),
    request_spec=tfx.proto.RequestSpec(
        # InfraValidator will look at how "classification" signature is defined
        # in the model, and automatically convert some samples from `examples`
        # artifact to prediction RPC requests.
        tensorflow_serving=tfx.proto.TensorFlowServingRequestSpec(
            signature_names=['classification']
        ),
        num_examples=10  # How many requests to make.
    )
)

Producing a SavedModel with warmup

(From version 0.30.0)

Since InfraValidator validates the model with real requests, it can easily reuse these validation requests as warmup requests of a SavedModel. InfraValidator provides an option (RequestSpec.make_warmup) to export a SavedModel with warmup.

infra_validator = InfraValidator(
    ...,
    request_spec=tfx.proto.RequestSpec(..., make_warmup=True)
)

Then the output InfraBlessing artifact will contain a SavedModel with warmup, and can also be pushed by the Pusher, just like the Model artifact.

Limitations

The current InfraValidator is not complete yet and has some limitations.

  • Only the TensorFlow SavedModel model format can be validated.
  • When running TFX on Kubernetes, the pipeline should be executed by KubeflowDagRunner inside Kubeflow Pipelines. The model server will be launched in the same Kubernetes cluster and the namespace that Kubeflow is using.
  • InfraValidator is primarily focused on deployments to TensorFlow Serving, and while still useful it is less accurate for deployments to TensorFlow Lite and TensorFlow.js, or other inference frameworks.
  • There is limited support in LOAD_AND_QUERY mode for the Predict method signature (which is the only exportable method in TensorFlow 2). InfraValidator requires the Predict signature to consume a serialized tf.Example as the only input.

    @tf.function
    def parse_and_run(serialized_example):
      features = tf.io.parse_example(serialized_example, FEATURES)
      return model(features)
    
    model.save('path/to/save', signatures={
      # This exports "Predict" method signature under name "serving_default".
      'serving_default': parse_and_run.get_concrete_function(
          tf.TensorSpec(shape=[None], dtype=tf.string, name='examples'))
    })
    
    • Check out the Penguin example sample code to see how this signature interacts with other components in TFX.
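
As a local sanity check (outside InfraValidator), a signature exported this way can be exercised with a serialized tf.Example, which is essentially the payload LOAD_AND_QUERY mode sends; the feature spec below is a hypothetical stand-in for FEATURES:

import tensorflow as tf

# Hypothetical stand-in for the FEATURES spec used in the snippet above.
FEATURES = {'feature_1': tf.io.FixedLenFeature([], tf.float32)}

# Build and serialize one tf.Example, the kind of payload sent to the model
# server in LOAD_AND_QUERY mode.
example = tf.train.Example(features=tf.train.Features(feature={
    'feature_1': tf.train.Feature(float_list=tf.train.FloatList(value=[0.5])),
}))
serialized = example.SerializeToString()

# Load the exported SavedModel and call the "serving_default" signature with a
# batch of serialized examples, as a model server would.
loaded = tf.saved_model.load('path/to/save')
infer = loaded.signatures['serving_default']
print(infer(examples=tf.constant([serialized])))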