tff.analytics.heavy_hitters.iblt.IbltEncoder

Encodes strings into an invertible Bloom lookup table (IBLT) data structure.

tff.analytics.heavy_hitters.iblt.IbltEncoder(
    capacity,
    string_max_bytes,
    *,
    encoding: tff.analytics.heavy_hitters.iblt.CharacterEncoding = tff.analytics.heavy_hitters.iblt.CharacterEncoding.UTF8,
    drop_strings_above_max_length=False,
    seed=0,
    repetitions=DEFAULT_REPETITIONS,
    hash_family=None,
    hash_family_params=None,
    field_size=DEFAULT_FIELD_SIZE
)

The IBLT is a numpy array of shape [repetitions, table_size, num_chunks+2]. Its value at index (r, h, c), where r is a repetition and h is a hash bucket, is: the sum of chunk c of the keys hashing to h in r, if c < num_chunks; the sum of the counts of the keys hashing to h in r, if c = num_chunks; and the sum of the checks of the keys hashing to h in r, if c = num_chunks + 1.
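The layout described above can be illustrated with a toy table in plain Python. This is only a sketch under stated assumptions, not the library's implementation: the SHA-256 based hashing, the 3-byte chunk width, the table size, the check function, and the helper names (`to_chunks`, `build_toy_iblt`) are all hypothetical, and the sketch assumes a string with count k contributes k times to each sum.

```python
import hashlib

# All constants below are illustrative assumptions, not the library's values.
REPETITIONS = 3
TABLE_SIZE = 8           # the real encoder derives its table size from `capacity`
CHUNK_BYTES = 3          # hypothetical chunk width; keeps each chunk below FIELD_SIZE
NUM_CHUNKS = 2           # enough for strings of up to 6 bytes
FIELD_SIZE = 2**31 - 1


def _hash(label, data):
    """Stand-in hash: part of a SHA-256 digest read as an integer (an assumption)."""
    return int.from_bytes(hashlib.sha256(label + b":" + data).digest()[:8], "big")


def to_chunks(key):
    """Pads the key with zero bytes and reads it as NUM_CHUNKS integers.

    Keys longer than NUM_CHUNKS * CHUNK_BYTES bytes would be cut off here.
    """
    padded = key.ljust(NUM_CHUNKS * CHUNK_BYTES, b"\0")
    return [
        int.from_bytes(padded[i * CHUNK_BYTES:(i + 1) * CHUNK_BYTES], "big")
        for i in range(NUM_CHUNKS)
    ]


def build_toy_iblt(counts):
    """Builds a nested list of shape [REPETITIONS][TABLE_SIZE][NUM_CHUNKS + 2]."""
    table = [[[0] * (NUM_CHUNKS + 2) for _ in range(TABLE_SIZE)]
             for _ in range(REPETITIONS)]
    for text, count in counts.items():
        key = text.encode("utf-8")
        chunks = to_chunks(key)
        check = _hash(b"check", key) % FIELD_SIZE
        for r in range(REPETITIONS):
            h = _hash(b"rep%d" % r, key) % TABLE_SIZE  # bucket of this key in repetition r
            cell = table[r][h]
            for c, chunk in enumerate(chunks):
                cell[c] = (cell[c] + count * chunk) % FIELD_SIZE          # c < num_chunks
            cell[NUM_CHUNKS] = (cell[NUM_CHUNKS] + count) % FIELD_SIZE    # c = num_chunks
            cell[NUM_CHUNKS + 1] = (cell[NUM_CHUNKS + 1] + count * check) % FIELD_SIZE
    return table


table = build_toy_iblt({"hello": 2, "hi": 1})
```

Because every key lands in exactly one bucket per repetition, the count column of each repetition sums to the total number of inserted occurrences (3 in this toy input).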

Args
`capacity`	Number of distinct strings that we expect to be inserted.
`string_max_bytes`	Maximum length, in bytes, of a string that can be inserted.
`encoding`	The character encoding of the string data to encode. For non-character binary data or strings with unknown encoding, specify `CharacterEncoding.UNKNOWN`. Defaults to `CharacterEncoding.UTF8`.
`drop_strings_above_max_length`	If True, strings longer than `string_max_bytes` bytes are dropped when constructing the IBLT. Defaults to False.
`seed`	Integer seed for hash functions. Defaults to 0.
`repetitions`	Number of repetitions in the IBLT data structure (must be >= 3). Defaults to 3.
`hash_family`	String specifying the hash family used to construct the IBLT. Options include `coupled` and `random`; the default is chosen based on `capacity`.
`hash_family_params`	A dict of parameters that the hash family's hasher expects. Defaults are chosen based on `capacity`.
`field_size`	The field size for all values in the IBLT. Defaults to 2**31 - 1.

Methods

`compute_chunks`

compute_chunks(
    input_strings
)

Returns a tensor containing the integer chunks of the input strings.

Args
`input_strings`	A tensor of strings.

Returns
A 2D tensor in which each row holds the integer chunks of the string at that row index, together with a trimmed `input_strings` that can fit in the IBLT.
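As a rough illustration of what "integer chunks" and a "trimmed" input can look like, the following plain-Python sketch packs each string's bytes into fixed-width integers. It is a sketch under assumptions rather than the library's algorithm: the helper name `chunk_strings`, the 3-byte chunk width, the big-endian packing, and the choice to truncate over-long strings at the byte level when they are not dropped are all hypothetical.

```python
def chunk_strings(strings, string_max_bytes, chunk_bytes=3, drop_above_max=False):
    """Returns (chunks, kept): integer chunks per string and the byte strings kept.

    `chunk_bytes` and the big-endian byte packing are assumptions made for
    illustration; the library's actual chunk layout may differ.
    """
    num_chunks = -(-string_max_bytes // chunk_bytes)  # ceiling division
    chunks, kept = [], []
    for text in strings:
        data = text.encode("utf-8")
        if len(data) > string_max_bytes:
            if drop_above_max:
                continue                      # drop the string entirely
            data = data[:string_max_bytes]    # otherwise trim at the byte level
        # Pad with zero bytes so every string yields exactly num_chunks integers.
        padded = data.ljust(num_chunks * chunk_bytes, b"\0")
        chunks.append([
            int.from_bytes(padded[i * chunk_bytes:(i + 1) * chunk_bytes], "big")
            for i in range(num_chunks)
        ])
        kept.append(data)
    return chunks, kept


chunks, kept = chunk_strings(["abc", "abcdefgh"], string_max_bytes=6)
```

Here `chunks` has one row per kept string and `kept` holds the (possibly truncated) byte strings; passing `drop_above_max=True` would instead omit `"abcdefgh"` from both outputs.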

`compute_iblt`

@tf.function
compute_iblt(
    input_strings, input_counts=None
)

Returns a tensor containing the values of the IBLT data structure.

Args
`input_strings`	A 1D tensor of strings.
`input_counts`	A 1D tensor of tf.int64 representing the count of each string.

Returns
A tensor of shape [repetitions, table_size, num_chunks+2] whose value at index (r, h, c) corresponds to chunk c of the keys if c < num_chunks, to the counts if c = num_chunks, and to the checks if c = num_chunks + 1.
