tff.analytics.heavy_hitters.iblt.IbltDecoder

Decode the strings and counts stored in an IBLT data structure.

iblt Tensor representing the IBLT computed by the IbltEncoder.
capacity Number of distinct strings that we expect to be inserted.
string_max_length Maximum length of a string that can be inserted.
seed Integer seed for hash functions. Defaults to 0.
repetitions Number of repetitions in IBLT data structure (must be >= 3). Defaults to 3.
hash_family A str specifying the hash family used to construct the IBLT. Options are coupled or random; the default is chosen based on capacity.
hash_family_params An optional dict of parameters that the hash family hasher expects. Defaults are chosen based on capacity.
dtype A TensorFlow data type that determines the type of the IBLT values.
field_size The field size for all values in IBLT. Defaults to 2**31 - 1.

Methods

decode

Try to recover string and count from IBLT location (repetition, index).

Args
iblt The IBLT data structure.
repetition Repetition number ("hash table number").
index Position in table.

Returns
(data_string, count, chunk_encoding) where data_string is the decoded string, count is its corresponding count, and chunk_encoding is the sequence of chunks that encodes data_string. If no string is decoded, data_string is set to '' and the other two values are set to -1.

decode_and_remove

decode_string_from_chunks

Compute string from sequence of ints each encoding 'chunk_length' bytes.

Inverse of IbltEncoder.compute_iblt.

Args
chunks A tf.Tensor of num_chunks integers.

Returns
A tf.Tensor with the UTF-8 string encoded in the chunks.
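To make the chunk representation concrete, here is a minimal pure-Python sketch of packing a UTF-8 string into fixed-width integer chunks and unpacking it again. It is not the library's implementation: the helper names (toy_encode_chunks, toy_decode_chunks), the big-endian byte order, and the zero-byte padding are all assumptions made for illustration, and the real method operates on a tf.Tensor rather than a Python list.

```python
def toy_encode_chunks(text, chunk_length, num_chunks):
    """Pack the UTF-8 bytes of `text` into `num_chunks` ints of `chunk_length` bytes each.

    Hypothetical helper: byte order and padding are illustrative assumptions.
    """
    data = text.encode("utf-8")
    total = chunk_length * num_chunks
    if len(data) > total:
        raise ValueError("string does not fit in the available chunks")
    padded = data.ljust(total, b"\x00")  # pad with zero bytes on the right
    return [
        int.from_bytes(padded[i:i + chunk_length], "big")
        for i in range(0, total, chunk_length)
    ]


def toy_decode_chunks(chunks, chunk_length):
    """Rebuild the string from a sequence of ints, each holding `chunk_length` bytes."""
    data = b"".join(chunk.to_bytes(chunk_length, "big") for chunk in chunks)
    return data.rstrip(b"\x00").decode("utf-8")  # drop the padding added by the encoder


chunks = toy_encode_chunks("héllo", chunk_length=3, num_chunks=4)
restored = toy_decode_chunks(chunks, chunk_length=3)
```

Under these assumptions, a string that contains trailing zero bytes would not round-trip, which is one reason a real encoder may choose a different padding scheme.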

get_freq_estimates

Decode key-value pairs from an IBLT.

Note that this method only works when TensorFlow runs in eager mode.

Returns
A dictionary mapping each decoded key to its frequency.

get_freq_estimates_tf

Decode key-value pairs from an IBLT.

Returns
(out_strings, out_counts, num_not_decoded) where out_strings is a tf.Tensor containing all the decoded strings, out_counts is a tf.Tensor containing the count of each string, and num_not_decoded is a tf.Tensor with the number of items in the IBLT that could not be decoded.
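The general idea behind decoding an IBLT is "peeling": find a cell that holds a single key, recover that key and its count, remove it from every cell it touches, and repeat. The toy below sketches that loop in pure Python. It is an illustration under stated assumptions, not the library's code: every name in it (TABLE_SIZE, cell_indices, check_value, try_decode, peel) is hypothetical, the hashes come from hashlib rather than the coupled or random hash families, and it uses unbounded Python integers instead of values in a finite field.

```python
import hashlib

REPETITIONS = 3   # number of hash tables (hypothetical toy setting)
TABLE_SIZE = 32   # cells per hash table (hypothetical toy setting)


def _hash(key, salt):
    digest = hashlib.sha256((salt + ":" + key).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def cell_indices(key):
    """One cell index per repetition."""
    return [_hash(key, "index%d" % r) % TABLE_SIZE for r in range(REPETITIONS)]


def check_value(key):
    return _hash(key, "check")


def key_to_int(key):
    return int.from_bytes(key.encode("utf-8"), "big")


def int_to_key(value):
    return value.to_bytes((value.bit_length() + 7) // 8, "big").decode("utf-8")


def new_table():
    # Each cell stores [count, sum of key ints, sum of check values].
    return [[[0, 0, 0] for _ in range(TABLE_SIZE)] for _ in range(REPETITIONS)]


def update(table, key, count):
    """Add `count` copies of `key`; a negative count removes them."""
    for r, i in enumerate(cell_indices(key)):
        cell = table[r][i]
        cell[0] += count
        cell[1] += count * key_to_int(key)
        cell[2] += count * check_value(key)


def try_decode(table, r, i):
    """Return (key, count) if cell (r, i) appears to hold a single key, else None."""
    count, key_sum, check_sum = table[r][i]
    if count <= 0 or key_sum % count or check_sum % count:
        return None
    try:
        key = int_to_key(key_sum // count)
    except (UnicodeDecodeError, OverflowError):
        return None
    if check_value(key) * count != check_sum or cell_indices(key)[r] != i:
        return None
    return key, count


def peel(table):
    """Repeatedly decode and remove single-key cells until no progress is made."""
    decoded = {}
    progress = True
    while progress:
        progress = False
        for r in range(REPETITIONS):
            for i in range(TABLE_SIZE):
                found = try_decode(table, r, i)
                if found is not None:
                    key, count = found
                    update(table, key, -count)
                    decoded[key] = decoded.get(key, 0) + count
                    progress = True
    undecoded_count = sum(cell[0] for cell in table[0])
    return decoded, undecoded_count


table = new_table()
for key, count in [("apple", 3), ("banana", 2), ("cherry", 1)]:
    update(table, key, count)
decoded, undecoded_count = peel(table)
```

In this toy, undecoded_count is the total count left in the first table after peeling stalls, which plays a role loosely similar to num_not_decoded. Peeling can stall when too many keys share cells, which is why the table must be sized for the expected number of distinct strings.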

get_hash_check

Returns a tf.Tensor containing hash_checks.

Args
input_strings A tf.Tensor of strings.

Returns
A tensor of shape (input_length, repetitions) containing hash_check[i] at index (i, r).

get_hash_indices

is_peelable

Test whether the string and count can be recovered from location (repetition, index).

Args
iblt The IBLT data structure.
repetition Repetition number ("hash table number").
index Position in table.

Returns
True if we can recover string and count from location (repetition, index), False otherwise.

remove_element

Remove the key data_string and its count from the IBLT.

Args
iblt The IBLT data structure.
data_string String to be removed from the IBLT.
hash_indices Must equal get_hash_indices(data_string); passed to avoid recomputation.
chunks Must satisfy data_string = decode_string_from_chunks(chunks); passed to avoid recomputation.
count Count of data_string in the IBLT.

Returns
The IBLT data structure with (data_string, count) removed at hash_indices.
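Because IBLT values live in a field of size field_size, removing an element plausibly amounts to subtracting count times each chunk, modulo the field size, at every hashed location. The sketch below shows that arithmetic for a single cell in plain Python. The helper names (add_to_cell, remove_from_cell) and the list-based cell layout are assumptions for illustration only; the library's remove_element works on the IBLT tensor and its exact update rule is not shown in this page.

```python
FIELD_SIZE = 2**31 - 1  # the documented default for field_size


def add_to_cell(cell_chunks, chunks, count, field_size=FIELD_SIZE):
    """Add `count` copies of `chunks` to one cell, reducing modulo the field size."""
    return [(c + count * v) % field_size for c, v in zip(cell_chunks, chunks)]


def remove_from_cell(cell_chunks, chunks, count, field_size=FIELD_SIZE):
    """Subtract `count` copies of `chunks` from one cell, modulo the field size."""
    return [(c - count * v) % field_size for c, v in zip(cell_chunks, chunks)]


# Two hypothetical strings, each already encoded as two chunks, share one cell.
first = [FIELD_SIZE - 1, 5]
second = [7, 9]

cell = [0, 0]
cell = add_to_cell(cell, first, count=3)
cell = add_to_cell(cell, second, count=2)

# Removing the first string leaves exactly the contribution of the second.
cell_after_removal = remove_from_cell(cell, first, count=3)
```

Modular subtraction undoes modular addition even when intermediate sums wrap around, so the order in which elements are added and removed does not matter.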