Updates the input schema to conform to the input statistics.
tfdv.update_schema(
    schema: schema_pb2.Schema,
    statistics: statistics_pb2.DatasetFeatureStatisticsList,
    infer_feature_shape: Optional[bool] = True,
    max_string_domain_size: Optional[int] = 100
) -> schema_pb2.Schema
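
A minimal usage sketch of the signature above, assuming `tensorflow_data_validation` is installed and the inputs are pandas DataFrames. The helper name `refresh_schema` and its arguments are hypothetical. `tfdv.generate_statistics_from_dataframe` and `tfdv.infer_schema` are other TFDV functions, used here only to produce the two inputs that `tfdv.update_schema` expects.

```python
def refresh_schema(old_df, new_df):
    """Infer a schema from old_df, then update it to conform to new_df.

    Hypothetical helper: both arguments are assumed to be pandas DataFrames.
    """
    # Imported inside the function so that defining the helper does not
    # itself require TFDV to be installed.
    import tensorflow_data_validation as tfdv

    # Build the starting schema from statistics over the older data.
    old_stats = tfdv.generate_statistics_from_dataframe(old_df)
    schema = tfdv.infer_schema(old_stats)

    # Compute statistics over the newer data and update the schema to match.
    new_stats = tfdv.generate_statistics_from_dataframe(new_df)
    return tfdv.update_schema(schema, new_stats)
```

Calling `refresh_schema(old_df, new_df)` returns a Schema protocol buffer. `infer_feature_shape` is left at its default because it is deprecated.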

| Args | |
|---|---|
| `schema` | A Schema protocol buffer. |
| `statistics` | A DatasetFeatureStatisticsList protocol buffer. Schema inference is currently supported only for lists containing a single DatasetFeatureStatistics proto, or for lists containing multiple DatasetFeatureStatistics protos that correspond to data slices and include the default slice (i.e., the slice with all examples). If the list contains multiple DatasetFeatureStatistics protos, this function updates the schema to conform to the statistics of the default slice. |
| `infer_feature_shape` | DEPRECATED, do not use. If a feature specifies a shape, the shape is always validated. If a feature does not specify a shape, this function does not try to infer one from the given statistics. |
| `max_string_domain_size` | Maximum size of the domain of a string feature for the feature to be interpreted as a categorical feature. |
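
The effect of `max_string_domain_size` can be pictured with a small pure-Python sketch. This is not TFDV's implementation: the function name `is_categorical_candidate` is hypothetical, and the inclusive comparison (`<=`) is an assumption made for illustration.

```python
from typing import Iterable


def is_categorical_candidate(
    values: Iterable[str], max_string_domain_size: int = 100
) -> bool:
    """Return True if the distinct values fit within the domain size limit.

    Illustrative only: assumes the limit is inclusive, which the
    documentation above does not state.
    """
    distinct_values = set(values)
    return len(distinct_values) <= max_string_domain_size
```

Under this sketch, a column of country codes with a few dozen distinct values would qualify as categorical at the default limit of 100, while a column of free-text comments with thousands of distinct values would not.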

| Returns |
|---|
| A Schema protocol buffer. |

| Raises | |
|---|---|
| `TypeError` | If the input argument is not of the expected type. |
| `ValueError` | If the input statistics proto contains multiple datasets, none of which corresponds to the default slice. |
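
The rule for choosing which dataset drives the update, together with the `ValueError` case, can be sketched in plain Python. Everything here is a stand-in: `DatasetStats` is a hypothetical substitute for a DatasetFeatureStatistics proto, `pick_dataset_for_schema_update` is a hypothetical helper, and the default slice name `"All Examples"` is an assumption that should be checked against the TFDV version in use.

```python
from dataclasses import dataclass
from typing import List

# Assumed name of the default slice; verify against your TFDV version.
DEFAULT_SLICE_NAME = "All Examples"


@dataclass
class DatasetStats:
    """Hypothetical stand-in for a DatasetFeatureStatistics proto."""
    name: str
    num_examples: int


def pick_dataset_for_schema_update(datasets: List[DatasetStats]) -> DatasetStats:
    """Select the dataset whose statistics the schema should conform to."""
    if not datasets:
        raise ValueError("The statistics list contains no datasets.")
    # A single dataset is used as-is, whatever its name.
    if len(datasets) == 1:
        return datasets[0]
    # With several datasets (slices), only the default slice is used.
    for dataset in datasets:
        if dataset.name == DEFAULT_SLICE_NAME:
            return dataset
    raise ValueError(
        "The statistics list contains multiple datasets, none of which "
        "corresponds to the default slice."
    )
```

The sketch keeps the single-dataset case separate because, per the description of `statistics`, a list with one DatasetFeatureStatistics proto is accepted without any requirement that it be a slice.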