tfdv.DetectFeatureSkew

API for detecting feature skew between training and serving examples.

tfdv.DetectFeatureSkew(
    identifier_features: List[types.FeatureName],
    features_to_ignore: Optional[List[types.FeatureName]] = None,
    sample_size: int = 0,
    float_round_ndigits: Optional[int] = None,
    allow_duplicate_identifiers: bool = False
) -> None

Example:

  import apache_beam as beam
  import tensorflow as tf
  import tensorflow_data_validation as tfdv

  with beam.Pipeline(runner=...) as p:
    # Read the training and serving examples as tf.train.Example protos.
    training_examples = (
        p | 'ReadTrainingData' >> beam.io.ReadFromTFRecord(
            training_filepaths, coder=beam.coders.ProtoCoder(tf.train.Example)))
    serving_examples = (
        p | 'ReadServingData' >> beam.io.ReadFromTFRecord(
            serving_filepaths, coder=beam.coders.ProtoCoder(tf.train.Example)))
    # Pair examples on the 'id1' feature and sample up to 5 skewed pairs.
    _ = (
        (training_examples, serving_examples)
        | 'DetectFeatureSkew' >> tfdv.DetectFeatureSkew(
            identifier_features=['id1'], sample_size=5)
        | 'WriteFeatureSkewResultsOutput' >>
        tfdv.WriteFeatureSkewResultsToTFRecord(output_path)
        | 'WriteFeatureSkewPairsOutput' >>
        tfdv.WriteFeatureSkewPairsToTFRecord(output_path))

See the documentation for DetectFeatureSkewImpl for more detail about feature skew detection.

Args
`identifier_features`	Names of features to use as identifiers.
`features_to_ignore`	Names of features for which no feature skew detection is done.
`sample_size`	Number of skewed training-serving example pairs to sample and include in the skew results.
`float_round_ndigits`	Number of digits after the decimal point to which float values are rounded before they are compared.
`allow_duplicate_identifiers`	If set, skew detection is also performed on examples that share duplicate identifier feature values. In this case, the counts in the FeatureSkew result are based on each training-serving example pair analyzed. All examples that share a given set of identifier feature values must fit in memory.
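
To make the arguments above concrete, here is a minimal pure-Python sketch of the idea they describe: pair training and serving examples on their identifier features, skip ignored features, optionally round floats, and count per-feature mismatches. This is an illustration only and not TFDV's implementation. The function `detect_skew_sketch`, the `Example` dict type, and the result keys are hypothetical names, and the sketch assumes identifiers are unique (the `allow_duplicate_identifiers=False` case).

```python
from typing import Any, Dict, List, Optional

# Hypothetical stand-in for tf.train.Example: feature name -> single value.
Example = Dict[str, Any]


def _normalize(value: Any, ndigits: Optional[int]) -> Any:
    """Round floats to `ndigits` places when requested; leave other values alone."""
    if ndigits is not None and isinstance(value, float):
        return round(value, ndigits)
    return value


def detect_skew_sketch(
    training: List[Example],
    serving: List[Example],
    identifier_features: List[str],
    features_to_ignore: Optional[List[str]] = None,
    sample_size: int = 0,
    float_round_ndigits: Optional[int] = None,
) -> Dict[str, Any]:
    """Count per-feature mismatches between examples paired by identifier."""
    # Identifier features define the pairing, so they are not compared.
    skipped = set(identifier_features) | set(features_to_ignore or [])
    # Index serving examples by their identifier tuple (assumes unique ids).
    serving_by_id = {
        tuple(ex[f] for f in identifier_features): ex for ex in serving
    }
    mismatch_counts: Dict[str, int] = {}
    sampled_pairs = []
    for train_ex in training:
        key = tuple(train_ex[f] for f in identifier_features)
        serve_ex = serving_by_id.get(key)
        if serve_ex is None:
            continue  # No serving counterpart, so there is nothing to compare.
        skewed = False
        for name in sorted((set(train_ex) | set(serve_ex)) - skipped):
            train_value = _normalize(train_ex.get(name), float_round_ndigits)
            serve_value = _normalize(serve_ex.get(name), float_round_ndigits)
            if train_value != serve_value:
                mismatch_counts[name] = mismatch_counts.get(name, 0) + 1
                skewed = True
        # Keep at most `sample_size` skewed pairs for inspection.
        if skewed and len(sampled_pairs) < sample_size:
            sampled_pairs.append((train_ex, serve_ex))
    return {"mismatch_counts": mismatch_counts, "sampled_pairs": sampled_pairs}


training = [
    {"id1": "a", "price": 1.2345, "country": "US"},
    {"id1": "b", "price": 2.0, "country": "DE"},
]
serving = [
    {"id1": "a", "price": 1.2349, "country": "US"},
    {"id1": "b", "price": 2.0, "country": "FR"},
]

rounded = detect_skew_sketch(
    training, serving, ["id1"], sample_size=5, float_round_ndigits=2)
exact = detect_skew_sketch(training, serving, ["id1"], sample_size=5)
```

With `float_round_ndigits=2`, the two `price` values for `id1 == "a"` compare equal, so only `country` is reported as skewed. Without rounding, `price` is counted as a mismatch as well.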

Class Variables
pipeline	`None`
