tff.analytics.heavy_hitters.iblt.IbltEncoder

Encodes the strings into an IBLT data structure.

The IBLT is a NumPy array of shape [repetitions, table_size, num_chunks+2]. Its value at index (r, h, c) is taken over the keys that hash to h in repetition r: the sum of chunk c of those keys if c < num_chunks; the sum of their counts if c = num_chunks; the sum of their checks if c = num_chunks + 1.
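The layout described above can be sketched in plain Python. This is only an illustration under stated assumptions: toy_hash, new_table, and insert_once are hypothetical helpers, the SHA-256 based hash is a stand-in for the library's hash families, TABLE_SIZE and NUM_CHUNKS are made-up values, and reducing every sum modulo the field size is assumed rather than taken from this page.

```python
import hashlib

REPETITIONS = 3          # documented default for repetitions
TABLE_SIZE = 8           # hypothetical; the library derives it from capacity
NUM_CHUNKS = 2           # hypothetical; depends on string_max_length
FIELD_SIZE = 2**31 - 1   # documented default for field_size


def toy_hash(key, salt, modulus):
    """Hypothetical stand-in hash, not the library's hash family."""
    digest = hashlib.sha256((salt + key).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % modulus


def new_table():
    """Returns a zeroed table of shape [REPETITIONS][TABLE_SIZE][NUM_CHUNKS + 2]."""
    return [[[0] * (NUM_CHUNKS + 2) for _ in range(TABLE_SIZE)]
            for _ in range(REPETITIONS)]


def insert_once(table, key, chunks):
    """Adds a single occurrence of key to one cell per repetition, in place."""
    check = toy_hash(key, "check", FIELD_SIZE)
    for r in range(REPETITIONS):
        h = toy_hash(key, "rep%d" % r, TABLE_SIZE)
        cell = table[r][h]
        for c in range(NUM_CHUNKS):
            # Columns 0..NUM_CHUNKS-1 accumulate the key chunks.
            cell[c] = (cell[c] + chunks[c]) % FIELD_SIZE
        # Column NUM_CHUNKS accumulates counts, column NUM_CHUNKS + 1 checks.
        cell[NUM_CHUNKS] = (cell[NUM_CHUNKS] + 1) % FIELD_SIZE
        cell[NUM_CHUNKS + 1] = (cell[NUM_CHUNKS + 1] + check) % FIELD_SIZE


table = new_table()
insert_once(table, "hello", [7, 9])   # the chunk values here are made up
insert_once(table, "world", [3, 4])
```

Each key touches exactly one cell per repetition, so the count column of every repetition sums to the number of inserted keys.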

Args
capacity Number of distinct strings that we expect to be inserted.
string_max_length Maximum length of a string that can be inserted.
drop_strings_above_max_length If True, strings above string_max_length will be dropped when constructing the IBLT. Defaults to False.
seed Integer seed for hash functions. Defaults to 0.
repetitions Number of repetitions in the IBLT data structure (must be >= 3). Defaults to 3.
hash_family String specifying the hash family used to construct the IBLT. Options include coupled and random; the default is chosen based on capacity.
hash_family_params A dict of parameters that the hash family hasher expects. Defaults are chosen based on capacity.
dtype A TensorFlow data type that determines the type of the IBLT values.
field_size The field size for all values in IBLT. Defaults to 2**31 - 1.

Methods

compute_checks

Returns SparseTensor with hash_check for each (input string, repetition).

Args
sparse_indices A tensor of shape (input_length, repetitions, 3).
hash_check A tensor of shape (input_length, repetitions).
input_length An integer.
input_counts A 1D tensor of self.dtype representing the count of each string.

Returns
A SparseTensor of dense_shape [input_length, repetitions, table_size, num_chunks+2] containing hash_check[i, r] for each index of the form (i, r, h, num_chunks+1) where 0 <= i < input_length, 0 <= r < repetitions, and h is the hash-position of the ith input string in repetition r.

compute_chunks

Returns Tensor containing integer chunks for input strings.

Args
input_strings A tensor of strings.

Returns
A 2D tensor in which row i holds the integer chunks of the ith input string, together with a trimmed input_strings that can fit in the IBLT.
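One way to picture the chunking step is the following plain-Python sketch. It is not the library's encoding: compute_chunks_sketch is a hypothetical helper, and the values of STRING_MAX_LENGTH and BYTES_PER_CHUNK, the zero padding, and the big-endian byte order are all assumptions chosen so that every chunk stays below the default field size of 2**31 - 1.

```python
STRING_MAX_LENGTH = 6   # hypothetical value for string_max_length, in bytes
BYTES_PER_CHUNK = 3     # assumption: a 3-byte integer is always below 2**31 - 1


def compute_chunks_sketch(input_strings):
    """Returns (rows, trimmed); rows[i] holds the integer chunks of string i."""
    rows, trimmed = [], []
    for text in input_strings:
        # Trim to the maximum length, dropping any partially cut character.
        raw = text.encode("utf-8")[:STRING_MAX_LENGTH]
        kept = raw.decode("utf-8", errors="ignore")
        # Pad with zero bytes so every string yields the same number of chunks.
        padded = kept.encode("utf-8").ljust(STRING_MAX_LENGTH, b"\x00")
        rows.append([
            int.from_bytes(padded[start:start + BYTES_PER_CHUNK], "big")
            for start in range(0, STRING_MAX_LENGTH, BYTES_PER_CHUNK)
        ])
        trimmed.append(kept)
    return rows, trimmed


rows, trimmed = compute_chunks_sketch(["abc", "abcdefgh"])
```

Here the second string is trimmed to "abcdef", and both rows contain the same number of chunks.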

compute_counts

Returns SparseTensor with value 1 for each (input string, repetition).

Args
sparse_indices A tensor of shape (input_length, repetitions, 3).
input_length An integer.
input_counts A 1D tensor of self.dtype representing the count of each string.

Returns
A SparseTensor of dense_shape [input_length, repetitions, table_size, num_chunks+2] containing a count of 1 for each index of the form (i, r, h, num_chunks) where 0 <= i < input_length, 0 <= r < repetitions, and h is the hash-position of the ith input string in repetition r.

compute_hash_check

Returns Tensor containing hash_check for each (input string, repetition).

Args
input_strings A tensor of strings.

Returns
A tensor of shape (input_length, repetitions) containing the hash check of the ith input string at index (i, r).
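The shape of this result can be illustrated with a small plain-Python sketch. It assumes that a single check value is computed per string and repeated across repetitions; hash_check_sketch is a hypothetical helper, and the SHA-256 based hash and the "check:" salt are stand-ins, not the hash the library uses.

```python
import hashlib

REPETITIONS = 3          # documented default for repetitions
FIELD_SIZE = 2**31 - 1   # documented default for field_size


def hash_check_sketch(input_strings):
    """Returns one row per string; row i repeats the check of string i."""
    rows = []
    for text in input_strings:
        digest = hashlib.sha256(("check:" + text).encode("utf-8")).digest()
        check = int.from_bytes(digest[:8], "big") % FIELD_SIZE
        rows.append([check] * REPETITIONS)
    return rows


hash_check = hash_check_sketch(["hello", "world"])
```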

compute_iblt

Returns Tensor containing the values of the IBLT data structure.

Args
input_strings A 1D tensor of strings.
input_counts A 1D tensor of self.dtype representing the count of each string.

Returns
A tensor of shape [repetitions, table_size, num_chunks+2] whose value at index (r, h, c) corresponds to chunk c of the keys if c < num_chunks, to the counts if c = num_chunks, and to the checks if c = num_chunks + 1.
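The relationship between the per-input sparse tensors returned by the other methods and this dense result can be sketched as a sum over the input axis. That reading is an assumption, not something this page states, and so is the reduction modulo the field size; reduce_over_inputs is a hypothetical helper that uses a dict from (i, r, h, c) to a value in place of a SparseTensor.

```python
def reduce_over_inputs(entries, repetitions, table_size, num_chunks, field_size):
    """Sums sparse (i, r, h, c) -> value entries over i into a dense table."""
    table = [[[0] * (num_chunks + 2) for _ in range(table_size)]
             for _ in range(repetitions)]
    for (_, r, h, c), value in entries.items():
        table[r][h][c] = (table[r][h][c] + value) % field_size
    return table


# Two strings that both land in hash position 1. A single repetition is
# below the documented minimum of 3 and is used only to keep this short.
entries = {
    (0, 0, 1, 0): 5,    # chunk 0 of string 0
    (0, 0, 1, 1): 1,    # count of string 0
    (0, 0, 1, 2): 11,   # check of string 0
    (1, 0, 1, 0): 6,    # chunk 0 of string 1
    (1, 0, 1, 1): 1,    # count of string 1
    (1, 0, 1, 2): 20,   # check of string 1
}
table = reduce_over_inputs(
    entries, repetitions=1, table_size=2, num_chunks=1, field_size=2**31 - 1)
```

The colliding strings end up summed in the same cell, which is what a decoder later has to peel apart.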

compute_keys

Returns SparseTensor with key for each (input string, repetition, chunk).

Args
sparse_indices A tensor of shape (input_length, repetitions, 3).
chunks A tensor of shape (input_length, num_chunks).
input_length An integer.
input_counts A 1D tensor of self.dtype representing the count of each string.

Returns
A SparseTensor of dense_shape [input_length, repetitions, table_size, num_chunks+2] containing chunks[i, c] for each index of the form (i, r, h, c) where 0 <= i < input_length, 0 <= r < repetitions, 0 <= c < num_chunks, and h is the hash-position of the ith input string in repetition r.
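The index pattern in this return value can be sketched with a dict standing in for the SparseTensor. The sketch assumes that each entry of sparse_indices is a triple (i, r, h), which this page does not confirm, and it ignores input_counts, treating every string as occurring once. keys_as_sparse is a hypothetical helper and the hash positions are made up.

```python
def keys_as_sparse(sparse_indices, chunks):
    """Maps (i, r, h, c) -> chunks[i][c] for every chunk column c."""
    entries = {}
    for per_repetition in sparse_indices:
        for i, r, h in per_repetition:
            for c, value in enumerate(chunks[i]):
                entries[(i, r, h, c)] = value
    return entries


# Two strings, three repetitions; shape (input_length, repetitions, 3).
sparse_indices = [
    [(0, 0, 4), (0, 1, 2), (0, 2, 7)],
    [(1, 0, 1), (1, 1, 2), (1, 2, 0)],
]
chunks = [[7, 9], [3, 4]]   # shape (input_length, num_chunks)
entries = keys_as_sparse(sparse_indices, chunks)
```

Every (string, repetition) pair contributes num_chunks entries, all at the same hash position h.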