Decodes the strings and counts stored in an IBLT data structure.
tff.analytics.heavy_hitters.iblt.IbltDecoder(
iblt: tf.Tensor,
capacity: int,
string_max_bytes: int,
*,
encoding: tff.analytics.heavy_hitters.iblt.CharacterEncoding = tff.analytics.heavy_hitters.iblt.CharacterEncoding.UTF8,
seed: int = 0,
repetitions: int = DEFAULT_REPETITIONS,
hash_family: Optional[str] = None,
hash_family_params: Optional[dict[str, Union[int, float]]] = None,
field_size: int = DEFAULT_FIELD_SIZE
)
Args | |
---|---|
iblt | Tensor representing the IBLT computed by the IbltEncoder. |
capacity | Number of distinct strings that we expect to be inserted. |
string_max_bytes | Maximum length, in bytes, of a string that can be inserted. |
encoding | The character encoding of the string data to decode. For non-character binary data or strings with unknown encoding, specify CharacterEncoding.UNKNOWN. Defaults to CharacterEncoding.UTF8. |
seed | Integer seed for hash functions. Defaults to 0. |
repetitions | Number of repetitions in the IBLT data structure (must be >= 3). Defaults to 3. |
hash_family | A str specifying the hash family to use to construct the IBLT. Options include coupled or random. The default is chosen based on capacity. |
hash_family_params | An optional dict of parameters that the hash family hasher expects. Defaults are chosen based on capacity. |
field_size | The field size for all values in the IBLT. Defaults to 2**31 - 1. |
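To make the roles of capacity, seed, repetitions, and field_size more concrete, here is a minimal pure-Python sketch of a toy IBLT with peeling-based decoding. It is not the library's implementation: the class ToyIblt, the helpers str_to_key and key_to_str, the SHA-256 based hashing, the table width of 2 * capacity, and the 3-byte key limit are all assumptions made for illustration.

```python
import hashlib

FIELD_SIZE = 2**31 - 1  # prime; same value as the documented default


def _hash(seed: int, salt: str, key: int, modulus: int) -> int:
    """Deterministic seeded hash of an integer key into [0, modulus)."""
    digest = hashlib.sha256(f"{seed}:{salt}:{key}".encode()).digest()
    return int.from_bytes(digest[:8], "big") % modulus


class ToyIblt:
    """Toy IBLT: `repetitions` sub-tables of [count, key_sum, check_sum] cells."""

    def __init__(self, capacity, seed=0, repetitions=3, field_size=FIELD_SIZE):
        if repetitions < 3:
            raise ValueError("repetitions must be >= 3")
        # Assumes field_size is prime so that counts can be inverted below.
        self.seed, self.repetitions, self.p = seed, repetitions, field_size
        self.width = 2 * capacity  # generous sizing, chosen for this demo only
        self.cells = [[[0, 0, 0] for _ in range(self.width)]
                      for _ in range(repetitions)]

    def _slots(self, key):
        """One cell index per sub-table for this key."""
        return [_hash(self.seed, f"slot{r}", key, self.width)
                for r in range(self.repetitions)]

    def update(self, key, count):
        """Adds `count` copies of `key` (a negative count removes them)."""
        check = _hash(self.seed, "check", key, self.p)
        for r, i in enumerate(self._slots(key)):
            cell = self.cells[r][i]
            cell[0] = (cell[0] + count) % self.p
            cell[1] = (cell[1] + count * key) % self.p
            cell[2] = (cell[2] + count * check) % self.p

    def decode(self):
        """Peels pure cells; returns ({key: count}, fully_decoded)."""
        decoded, progress = {}, True
        while progress:
            progress = False
            for r in range(self.repetitions):
                for i in range(self.width):
                    count, key_sum, check_sum = self.cells[r][i]
                    if count == 0:
                        continue
                    # If the cell holds a single key, this recovers it.
                    key = key_sum * pow(count, -1, self.p) % self.p
                    check = _hash(self.seed, "check", key, self.p)
                    if (check_sum == count * check % self.p
                            and self._slots(key)[r] == i):
                        decoded[key] = decoded.get(key, 0) + count
                        self.update(key, -count)  # remove it everywhere
                        progress = True
        empty = all(c == [0, 0, 0] for table in self.cells for c in table)
        return decoded, empty


def str_to_key(text: str) -> int:
    """Packs a short UTF-8 string into one integer below FIELD_SIZE."""
    data = text.encode("utf-8")
    if len(data) > 3:
        raise ValueError("toy keys are limited to 3 bytes")
    return int.from_bytes(data, "big")


def key_to_str(key: int) -> str:
    """Inverse of str_to_key."""
    return key.to_bytes((key.bit_length() + 7) // 8, "big").decode("utf-8")


sketch = ToyIblt(capacity=16)
for word, count in {"cat": 5, "dog": 2, "owl": 7}.items():
    sketch.update(str_to_key(word), count)
decoded, complete = sketch.decode()
result = {key_to_str(k): v for k, v in decoded.items()}
```

In this toy, a larger capacity widens each sub-table, seed changes which cells a key lands in, and field_size bounds every stored value. The encoder and decoder must agree on all of these settings, otherwise the peeling step looks up the wrong cells.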
Methods
decode_string_from_chunks
decode_string_from_chunks(
chunks
)
Computes a string from a sequence of ints, each encoding 'chunk_length' bytes.
Inverse of IbltEncoder.compute_iblt.
Args | |
---|---|
chunks | A tf.Tensor of num_chunks integers. |
Returns | |
---|---|
A tf.Tensor with the string encoded in the chunks. |
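The chunk representation can be illustrated with a small pure-Python round trip. This is only a sketch of the idea: the function names encode_chunks and decode_chunks, the big-endian byte order, the zero-byte padding, and the 3-byte chunk_length are assumptions, and the library's actual layout may differ.

```python
from typing import List


def encode_chunks(text: str, string_max_bytes: int,
                  chunk_length: int = 3) -> List[int]:
    """Splits the UTF-8 bytes of `text` into fixed-size integer chunks."""
    data = text.encode("utf-8")
    if len(data) > string_max_bytes:
        raise ValueError("string exceeds string_max_bytes")
    # Ceiling division: every string uses the same number of chunks.
    num_chunks = -(-string_max_bytes // chunk_length)
    padded = data.ljust(num_chunks * chunk_length, b"\x00")
    return [int.from_bytes(padded[i:i + chunk_length], "big")
            for i in range(0, len(padded), chunk_length)]


def decode_chunks(chunks: List[int], chunk_length: int = 3) -> str:
    """Rebuilds the string from its chunks and strips the zero padding."""
    data = b"".join(c.to_bytes(chunk_length, "big") for c in chunks)
    return data.rstrip(b"\x00").decode("utf-8")


chunks = encode_chunks("héllo", string_max_bytes=10)
restored = decode_chunks(chunks)
```

A 3-byte chunk always stays below 2**31 - 1, so each chunk value fits in a field of the documented default size.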
get_freq_estimates
get_freq_estimates()
Decodes key-value pairs from an IBLT.
Note that this method only works for UTF-8 strings, and only when TensorFlow is running in eager mode.
Returns | |
---|---|
A dictionary mapping each decoded key to its frequency. |
get_freq_estimates_tf
@tf.function
get_freq_estimates_tf() -> tuple[tf.Tensor, tf.Tensor, tf.Tensor]
Decodes key-value pairs from an IBLT.
Returns | |
---|---|
A tuple (out_strings, out_counts, num_not_decoded), where out_strings is a tf.Tensor containing all the decoded strings, out_counts is a tf.Tensor containing the count of each string, and num_not_decoded is a tf.Tensor with the number of items that could not be decoded from the IBLT. |
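One plausible way to consume the returned triple is to pair each string with its count and treat a non-zero num_not_decoded as a sign that the result is incomplete. The sketch below uses plain Python lists and an int as stand-ins for the three tensors (in eager mode, real tensors could be converted with .numpy()). The helper name to_freq_dict and the sample values are hypothetical.

```python
from typing import Dict, Sequence, Tuple


def to_freq_dict(
    out_strings: Sequence[bytes],
    out_counts: Sequence[int],
    num_not_decoded: int,
) -> Tuple[Dict[str, int], bool]:
    """Pairs decoded strings with counts and flags whether decoding was complete."""
    if len(out_strings) != len(out_counts):
        raise ValueError("out_strings and out_counts must have equal length")
    # Decoding the bytes as UTF-8 is an assumption that matches the default encoding.
    freq = {s.decode("utf-8"): int(c) for s, c in zip(out_strings, out_counts)}
    return freq, num_not_decoded == 0


# Hypothetical values standing in for the three returned tensors.
freq, complete = to_freq_dict([b"cat", b"dog"], [5, 2], 1)
```

Here complete is False because one item was left undecoded, so freq should be read as a partial result.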