The ExampleValidator TFX Pipeline Component
The ExampleValidator pipeline component identifies anomalies in training and serving
data. It can detect different classes of anomalies in the data. For example, it
can:
- perform validity checks by comparing data statistics against a schema that
codifies expectations of the user.
- detect training-serving skew by comparing training and serving
data.
- detect data drift by comparing statistics across a series of data (for example, consecutive spans).
- perform custom validations using a SQL-based configuration.
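For categorical features, TensorFlow Data Validation expresses both training-serving skew and drift between consecutive spans as an L-infinity distance: the largest absolute difference in the relative frequency of any single value. An anomaly is reported when that distance exceeds a threshold configured in the schema. The pure-Python sketch below illustrates only that comparison. The function names, the dictionary-of-counts input, and the 0.01 default threshold are illustrative assumptions, not the TFDV or TFX API.

```python
from collections import Counter
from typing import Dict, List, Tuple


def normalize(counts: Dict[str, int]) -> Dict[str, float]:
    """Turn raw value counts into relative frequencies that sum to 1."""
    total = sum(counts.values())
    if total == 0:
        return {}
    return {value: n / total for value, n in counts.items()}


def linf_distance(a: Dict[str, int], b: Dict[str, int]) -> float:
    """L-infinity distance: the largest gap in relative frequency for any single value."""
    pa, pb = normalize(a), normalize(b)
    values = set(pa) | set(pb)  # a value missing on one side counts as frequency 0
    return max((abs(pa.get(v, 0.0) - pb.get(v, 0.0)) for v in values), default=0.0)


def has_skew(training: Dict[str, int], serving: Dict[str, int],
             threshold: float = 0.01) -> bool:
    """Training-serving skew: compare the training and serving distributions of one feature."""
    return linf_distance(training, serving) > threshold


def drift_report(spans: List[Dict[str, int]],
                 threshold: float = 0.01) -> List[Tuple[int, float, bool]]:
    """Data drift: compare each span in a series with the span just before it."""
    report = []
    for i in range(1, len(spans)):
        distance = linf_distance(spans[i - 1], spans[i])
        report.append((i, distance, distance > threshold))
    return report


# Illustrative value counts for one hypothetical categorical feature.
training = Counter({"cash": 60, "card": 40})
serving = Counter({"cash": 50, "card": 45, "mobile": 5})
print(round(linf_distance(training, serving), 3))  # largest gap is for "cash"
print(has_skew(training, serving))

spans = [Counter({"cash": 50, "card": 50}),
         Counter({"cash": 50, "card": 50}),
         Counter({"cash": 30, "card": 70})]
for index, distance, drifted in drift_report(spans):
    print(index, round(distance, 3), drifted)
```

The L-infinity distance is easy to interpret (it points at a single value whose share changed the most), which makes a threshold straightforward to reason about. For numeric features TFDV uses a different measure (an approximate Jensen-Shannon divergence), which this sketch does not cover.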
The ExampleValidator pipeline component identifies any anomalies in the example data
by comparing data statistics computed by the StatisticsGen pipeline component against a
schema. The schema, typically inferred by SchemaGen, codifies properties that the input data is expected to
satisfy, and the developer can modify it.
- Consumes: A schema from a SchemaGen component, and statistics from a StatisticsGen
component.
- Emits: Validation results, as the anomalies output artifact.
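Conceptually, validating statistics against a schema means checking each feature's summary statistics against the expectations recorded for that feature. The sketch below is a deliberately simplified, pure-Python illustration of that idea. The FeatureSpec and FeatureStats classes, their fields, and the anomaly messages are hypothetical stand-ins, not the TFDV schema proto, the StatisticsGen output, or the ExampleValidator output format.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set


@dataclass
class FeatureSpec:
    """Hypothetical schema entry: what the user expects of one feature."""
    dtype: str                          # e.g. "STRING" or "FLOAT"
    min_fraction_present: float = 1.0   # share of examples that must contain the feature
    domain: Optional[Set[str]] = None   # allowed values for a categorical feature


@dataclass
class FeatureStats:
    """Hypothetical per-feature statistics computed over a dataset."""
    dtype: str
    num_present: int
    num_examples: int
    unique_values: Set[str] = field(default_factory=set)


def validate(stats: Dict[str, FeatureStats],
             schema: Dict[str, FeatureSpec]) -> Dict[str, List[str]]:
    """Return a mapping from feature name to the anomalies found for it."""
    anomalies: Dict[str, List[str]] = {}

    def flag(name: str, message: str) -> None:
        anomalies.setdefault(name, []).append(message)

    for name, spec in schema.items():
        s = stats.get(name)
        if s is None:
            flag(name, "missing column")
            continue
        if s.dtype != spec.dtype:
            flag(name, f"unexpected type {s.dtype}, expected {spec.dtype}")
        fraction = s.num_present / s.num_examples if s.num_examples else 0.0
        if fraction < spec.min_fraction_present:
            flag(name, f"present in {fraction:.0%} of examples, "
                       f"expected >= {spec.min_fraction_present:.0%}")
        if spec.domain is not None:
            unexpected = s.unique_values - spec.domain
            if unexpected:
                flag(name, f"unexpected values {sorted(unexpected)}")
    # Features that appear in the data but are not described by the schema.
    for name in sorted(stats.keys() - schema.keys()):
        flag(name, "new column not in schema")
    return anomalies


schema = {
    "payment_type": FeatureSpec("STRING", domain={"cash", "card"}),
    "trip_miles": FeatureSpec("FLOAT", min_fraction_present=0.9),
}
stats = {
    "payment_type": FeatureStats("STRING", 100, 100, {"cash", "card", "crypto"}),
    "trip_miles": FeatureStats("FLOAT", 80, 100),
    "tips": FeatureStats("FLOAT", 100, 100),
}
for feature, problems in validate(stats, schema).items():
    print(feature, problems)

# A developer who decides "crypto" is legitimate can relax the schema and re-validate.
schema["payment_type"].domain.add("crypto")
print(validate(stats, schema).get("payment_type"))
```

The last two lines mirror the point that the schema is a curated artifact: an anomaly can be resolved either by fixing the data or by updating the schema when the new behavior is expected.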
ExampleValidator and TensorFlow Data Validation
ExampleValidator makes extensive use of TensorFlow Data Validation
for validating your input data.
Using the ExampleValidator Component
An ExampleValidator pipeline component is typically easy to deploy and requires
little customization: it needs only the statistics and the schema produced by
upstream components. Typical code looks like this:
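```python
from tfx import v1 as tfx

# statistics_gen and schema_gen are the StatisticsGen and SchemaGen
# components defined earlier in the same pipeline.
validate_stats = tfx.components.ExampleValidator(
    statistics=statistics_gen.outputs['statistics'],
    schema=schema_gen.outputs['schema'])
```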
More details are available in the
ExampleValidator API reference.
Except as otherwise noted, the content of this page is licensed under the Creative Commons Attribution 4.0 License, and code samples are licensed under the Apache 2.0 License. For details, see the Google Developers Site Policies. Java is a registered trademark of Oracle and/or its affiliates.
Last updated 2024-09-06 UTC.