A StatsGenerator which computes statistics using a combiner function.
tfdv.CombinerStatsGenerator(
name: Text,
schema: Optional[schema_pb2.Schema] = None
) -> None
This class computes statistics using a combiner function. It processes one batch of examples at a time, emitting a partial state for each, then merges the partial states and finally computes the statistics from the merged partial state.
This object mirrors a beam.CombineFn except for the add_input interface, which is expected to be defined by its subclasses. Specifically, the generator must implement the following four methods:
create_accumulator(): Initializes an accumulator to store the partial state and returns it.
add_input(accumulator, input_record_batch): Incorporates a batch of input examples (represented as an Arrow RecordBatch) into the current accumulator and returns the updated accumulator.
merge_accumulators(accumulators): Merges the partial states in the accumulators and returns the accumulator containing the merged state.
extract_output(accumulator): Computes statistics from the partial state in the accumulator and returns the result as a DatasetFeatureStatistics proto.
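The four-method contract above can be sketched without Beam or Arrow. The following is a minimal, pure-Python stand-in and not the TFDV implementation: `MissingCountGenerator`, the `run` driver, and the dict-of-lists batch are hypothetical, and `extract_output` returns a sorted list of tuples where a real generator would return a DatasetFeatureStatistics proto.

```python
from typing import Dict, Iterable, List, Optional, Tuple

# Assumption: a batch is a plain dict mapping feature name -> per-example
# values (None = missing). It stands in for an Arrow RecordBatch.
Batch = Dict[str, List[Optional[float]]]
Acc = Dict[str, int]


class MissingCountGenerator:
    """Hypothetical generator that counts missing values per feature."""

    def __init__(self, name: str) -> None:
        self.name = name

    def create_accumulator(self) -> Acc:
        # Fresh, empty partial state.
        return {}

    def add_input(self, accumulator: Acc, batch: Batch) -> Acc:
        # Fold one batch of examples into the partial state.
        for feature, values in batch.items():
            missing = sum(1 for v in values if v is None)
            accumulator[feature] = accumulator.get(feature, 0) + missing
        return accumulator

    def merge_accumulators(self, accumulators: Iterable[Acc]) -> Acc:
        # Combine partial states produced independently (e.g. per worker).
        merged: Acc = {}
        for acc in accumulators:
            for feature, count in acc.items():
                merged[feature] = merged.get(feature, 0) + count
        return merged

    def extract_output(self, accumulator: Acc) -> List[Tuple[str, int]]:
        # Stand-in for building a DatasetFeatureStatistics proto.
        return sorted(accumulator.items())


def run(generator: MissingCountGenerator,
        partitions: List[List[Batch]]) -> List[Tuple[str, int]]:
    """Hypothetical driver: one accumulator per partition, then one merge."""
    partials = []
    for batches in partitions:
        acc = generator.create_accumulator()
        for batch in batches:
            acc = generator.add_input(acc, batch)
        partials.append(acc)
    return generator.extract_output(generator.merge_accumulators(partials))


gen = MissingCountGenerator("missing_count")
partitions = [
    [{"age": [31.0, None], "fare": [7.25, 8.0]}],
    [{"age": [None, None], "fare": [None, 9.5]},
     {"age": [40.0], "fare": [1.0]}],
]
result = run(gen, partitions)
print(result)  # [('age', 3), ('fare', 1)]
```

In this sketch each partition builds its own accumulator, which is why the accumulator only needs to hold counts that can be added together later.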
Args | |
---|---|
name | A unique name associated with the statistics generator. |
schema | An optional schema for the dataset. |
Attributes | |
---|---|
name | |
schema | |
Methods
add_input
add_input(
accumulator: ACCTYPE,
input_record_batch: pa.RecordBatch
) -> ACCTYPE
Returns the result of folding a batch of inputs into the accumulator.
Args | |
---|---|
accumulator | The current accumulator. |
input_record_batch | An Arrow RecordBatch whose columns are features and rows are examples. The columns are of type List |
Returns | |
---|---|
The accumulator after updating the statistics for the batch of inputs. |
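As a rough illustration of folding a list-typed, column-oriented batch into an accumulator, the sketch below uses nested Python lists in place of an Arrow RecordBatch. The accumulator layout (`num_present`, `num_values`) and the convention that `None` marks an example without the feature are assumptions made for this example only.

```python
from typing import Dict, List, Optional

# Assumption: each column is a list of cells, one per example; a cell is a
# list of values, or None when the example does not contain the feature.
Column = List[Optional[List[float]]]
Acc = Dict[str, Dict[str, int]]


def add_input(accumulator: Acc, batch: Dict[str, Column]) -> Acc:
    """Folds one batch into the accumulator and returns the accumulator."""
    for feature, column in batch.items():
        stats = accumulator.setdefault(
            feature, {"num_present": 0, "num_values": 0})
        for cell in column:
            if cell is not None:
                stats["num_present"] += 1
                stats["num_values"] += len(cell)
    return accumulator


acc: Acc = {}
acc = add_input(acc, {"tags": [[1.0, 2.0], None, [3.0]]})
acc = add_input(acc, {"tags": [[4.0]], "score": [[0.5]]})
print(acc)
```

The function returns the updated accumulator rather than relying on the caller to keep a reference, matching the signature shown above.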
create_accumulator
create_accumulator() -> ACCTYPE
Returns a fresh, empty accumulator.
Returns | |
---|---|
An empty accumulator. |
extract_output
extract_output(
accumulator: ACCTYPE
) -> statistics_pb2.DatasetFeatureStatistics
Returns the result of converting the accumulator into the output value.
Args | |
---|---|
accumulator | The final accumulator value. |
Returns | |
---|---|
A proto representing the result of this stats generator. |
merge_accumulators
merge_accumulators(
accumulators: Iterable[ACCTYPE]
) -> ACCTYPE
Merges several accumulators into a single accumulator value.
Args | |
---|---|
accumulators | The accumulators to merge. |
Returns | |
---|---|
The merged accumulator. |
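Because a runner may merge partial states in any grouping or order, a combiner's accumulator generally needs to hold mergeable state rather than a finished statistic. The sketch below illustrates this for a mean, using a hypothetical `(sum, count)` tuple as the accumulator; `extract_mean` is an illustrative helper, not part of the class's interface.

```python
from typing import Iterable, Optional, Tuple

# Assumption: the partial state for a mean is a (running sum, count) pair.
Acc = Tuple[float, int]


def merge_accumulators(accumulators: Iterable[Acc]) -> Acc:
    """Adds sums and counts; the result does not depend on grouping."""
    total, count = 0.0, 0
    for partial_sum, partial_count in accumulators:
        total += partial_sum
        count += partial_count
    return (total, count)


def extract_mean(accumulator: Acc) -> Optional[float]:
    """Hypothetical finalizer: turns the merged state into a mean."""
    total, count = accumulator
    return total / count if count else None


a, b, c = (6.0, 3), (4.0, 1), (10.0, 4)

flat = merge_accumulators([a, b, c])
nested = merge_accumulators([merge_accumulators([a, b]), c])
reordered = merge_accumulators([c, merge_accumulators([b, a])])

print(flat, nested, reordered)
print(extract_mean(flat))
```

Averaging the three per-partition means directly would weight the one-example partition as heavily as the four-example one, so the sketch defers the division until the merged state is finalized.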