tfdv.update_schema
Updates the input schema to conform to the input statistics.
tfdv.update_schema(
    schema: schema_pb2.Schema,
    statistics: statistics_pb2.DatasetFeatureStatisticsList,
    infer_feature_shape: Optional[bool] = True,
    max_string_domain_size: Optional[int] = 100
) -> schema_pb2.Schema
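The following toy sketch makes "conform to the input statistics" concrete. It is plain dict-based Python, not TFDV code. It assumes a schema is just a mapping from feature name to allowed string values. It also assumes that updating means widening those value lists, and adding features the schema does not yet list, so the new statistics no longer violate it. The names `toy_update_schema`, `ToySchema`, and `ToyStats` are hypothetical. The real function works on the protos shown in the signature above.

```python
from typing import Dict, List, Set

# Hypothetical, simplified stand-ins for the real protos: a "schema" maps each
# feature name to its allowed string values, and "statistics" map each feature
# name to the set of values observed in a new dataset.
ToySchema = Dict[str, List[str]]
ToyStats = Dict[str, Set[str]]


def toy_update_schema(schema: ToySchema, stats: ToyStats) -> ToySchema:
    """Returns a new schema whose domains also cover every observed value.

    This is NOT TFDV's implementation; it only models the idea of relaxing an
    existing schema so that new statistics no longer violate it.
    """
    # Copy every domain list so the caller's schema is left unchanged.
    updated = {name: list(values) for name, values in schema.items()}
    for name, observed in stats.items():
        # Features missing from the schema start with an empty domain.
        domain = updated.setdefault(name, [])
        # Append unseen values in sorted order so the result is deterministic.
        for value in sorted(observed - set(domain)):
            domain.append(value)
    return updated


old_schema = {"color": ["blue", "red"]}
new_stats = {"color": {"red", "green"}, "size": {"L", "M"}}
print(toy_update_schema(old_schema, new_stats))
# {'color': ['blue', 'red', 'green'], 'size': ['L', 'M']}
```

The sketch copies its input rather than mutating it, so the original schema can still be compared with the updated one.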
Args

| Argument | Description |
|---|---|
| `schema` | A Schema protocol buffer. |
| `statistics` | A DatasetFeatureStatisticsList protocol buffer. This function currently supports only lists with a single DatasetFeatureStatistics proto, or lists with multiple DatasetFeatureStatistics protos that correspond to data slices and include the default slice (i.e., the slice containing all examples). If a list with multiple protos is passed, the schema is updated to conform to the statistics of the default slice. |
| `infer_feature_shape` | DEPRECATED; do not use. If a feature specifies a shape, the shape is always validated. If a feature does not specify a shape, this function does not try to infer one from the given statistics. |
| `max_string_domain_size` | Maximum size of the domain (number of distinct values) of a string feature for it to be interpreted as a categorical feature. |
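A small hypothetical helper, `toy_string_domain`, illustrates the `max_string_domain_size` rule. It is not part of TFDV. It assumes the domain is simply the sorted set of distinct observed values. It also assumes that a feature whose distinct-value count equals the limit still counts as categorical; the exact boundary TFDV applies is an assumption here.

```python
from typing import List, Optional, Sequence


def toy_string_domain(values: Sequence[str],
                      max_string_domain_size: int = 100) -> Optional[List[str]]:
    """Returns a categorical domain for a string feature, or None.

    Hypothetical helper: if the number of distinct values does not exceed
    max_string_domain_size, the feature gets a domain (its sorted distinct
    values); otherwise it is treated as free-form text and gets no domain.
    Treating "equal to the limit" as categorical is an assumption.
    """
    distinct = sorted(set(values))
    if len(distinct) <= max_string_domain_size:
        return distinct
    return None


print(toy_string_domain(["a", "b", "a"], max_string_domain_size=2))  # ['a', 'b']
print(toy_string_domain(["a", "b", "c"], max_string_domain_size=2))  # None
```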
Returns

The updated Schema protocol buffer.
Raises

| Exception | Condition |
|---|---|
| `TypeError` | If an input argument is not of the expected type. |
| `ValueError` | If the input statistics proto contains multiple datasets, none of which corresponds to the default slice. |
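The slice-selection rule for `statistics` and the `ValueError` case above can be sketched in plain Python. `ToyDatasetStats`, `select_default_slice`, and the slice name `"All Examples"` are assumptions for illustration. The real function reads `DatasetFeatureStatistics` protos from a `DatasetFeatureStatisticsList`.

```python
from dataclasses import dataclass
from typing import List

# Assumed name of the slice that contains all examples; in this sketch it is
# just a string constant, not TFDV's own definition.
DEFAULT_SLICE_NAME = "All Examples"


@dataclass
class ToyDatasetStats:
    name: str          # slice name
    num_examples: int  # stand-in for the per-slice statistics


def select_default_slice(datasets: List[ToyDatasetStats]) -> ToyDatasetStats:
    """Picks the statistics the schema should conform to (documented rule)."""
    if not datasets:
        # Handling of an empty list is this sketch's own choice.
        raise ValueError("Statistics contain no datasets.")
    if len(datasets) == 1:
        # A single dataset is used as-is.
        return datasets[0]
    for ds in datasets:
        if ds.name == DEFAULT_SLICE_NAME:
            return ds
    raise ValueError(
        "Multiple datasets found, none of which corresponds to the default slice.")


slices = [ToyDatasetStats("country=US", 40), ToyDatasetStats(DEFAULT_SLICE_NAME, 100)]
print(select_default_slice(slices).num_examples)  # 100
```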
Except as otherwise noted, the content of this page is licensed under the Creative Commons Attribution 4.0 License, and code samples are licensed under the Apache 2.0 License.
Last updated 2024-10-18 UTC.