Encodes the strings into an IBLT data structure.
tff.analytics.heavy_hitters.iblt.IbltEncoder(
capacity,
string_max_bytes,
*,
encoding: tff.analytics.heavy_hitters.iblt.CharacterEncoding = tff.analytics.heavy_hitters.iblt.CharacterEncoding.UTF8,
drop_strings_above_max_length=False,
seed=0,
repetitions=DEFAULT_REPETITIONS,
hash_family=None,
hash_family_params=None,
field_size=DEFAULT_FIELD_SIZE
)
The IBLT is a numpy array of shape [repetitions, table_size, num_chunks+2].
Its value at index (r, h, c), where r is a repetition, corresponds to:
    the sum of chunk c of the keys hashing to h in r, if c < num_chunks,
    the sum of the counts of the keys hashing to h in r, if c = num_chunks,
    the sum of the checks of the keys hashing to h in r, if c = num_chunks + 1.
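The layout above can be illustrated with a small pure-Python sketch. It is not the library's implementation: the table size, the number of chunks, the SHA-256 based toy hash, and the check value are all assumptions chosen for illustration, and a count of n is treated as n insertions of the same key.

```python
import hashlib

REPETITIONS = 3          # matches the documented default
TABLE_SIZE = 8           # hypothetical; the library derives its own table size
NUM_CHUNKS = 2           # hypothetical number of integer chunks per key
FIELD_SIZE = 2**31 - 1   # matches the documented default field size


def _toy_hash(key_chunks, salt, modulus):
    """Toy hash for illustration only (not the library's hash family)."""
    data = f"{salt}:{key_chunks}".encode("utf-8")
    digest = hashlib.sha256(data).digest()
    return int.from_bytes(digest[:8], "big") % modulus


def empty_iblt():
    """Nested lists of shape [REPETITIONS][TABLE_SIZE][NUM_CHUNKS + 2]."""
    return [[[0] * (NUM_CHUNKS + 2) for _ in range(TABLE_SIZE)]
            for _ in range(REPETITIONS)]


def insert(iblt, key_chunks, count):
    """Adds one key (a tuple of NUM_CHUNKS integers) with a count."""
    check = _toy_hash(key_chunks, "check", FIELD_SIZE)
    for r in range(REPETITIONS):
        h = _toy_hash(key_chunks, f"rep{r}", TABLE_SIZE)
        cell = iblt[r][h]
        for c in range(NUM_CHUNKS):
            # c < num_chunks: running sum of chunk c.
            cell[c] = (cell[c] + key_chunks[c] * count) % FIELD_SIZE
        # c = num_chunks: running sum of counts.
        cell[NUM_CHUNKS] = (cell[NUM_CHUNKS] + count) % FIELD_SIZE
        # c = num_chunks + 1: running sum of checks.
        cell[NUM_CHUNKS + 1] = (cell[NUM_CHUNKS + 1] + check * count) % FIELD_SIZE
```

Each key touches exactly one cell per repetition, so a table holding a single key has one non-zero cell in every repetition.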
| Args | |
| --- | --- |
| capacity | Number of distinct strings that we expect to be inserted. |
| string_max_bytes | Maximum length, in bytes, of a string that can be inserted. |
| encoding | The character encoding of the string data to encode. For non-character binary data or strings with unknown encoding, specify CharacterEncoding.UNKNOWN. Defaults to CharacterEncoding.UTF8. |
| drop_strings_above_max_length | If True, strings longer than string_max_bytes are dropped when constructing the IBLT. Defaults to False. |
| seed | Integer seed for hash functions. Defaults to 0. |
| repetitions | Number of repetitions in the IBLT data structure (must be >= 3). Defaults to 3. |
| hash_family | String specifying the hash family used to construct the IBLT. Options include coupled and random; the default is chosen based on capacity. |
| hash_family_params | A dict of parameters that the hash family hasher expects. Defaults are chosen based on capacity. |
| field_size | The field size for all values in the IBLT. Defaults to 2**31 - 1. |
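How string_max_bytes and drop_strings_above_max_length interact can be sketched in plain Python. The helper name fit_strings is hypothetical. The sketch assumes that over-long strings are cut to a byte prefix when they are not dropped; the library's exact trimming rule (for example, whether multi-byte UTF-8 characters are kept intact) may differ.

```python
def fit_strings(strings, string_max_bytes, drop_strings_above_max_length=False):
    """Returns UTF-8 byte strings that fit within string_max_bytes."""
    fitted = []
    for s in strings:
        raw = s.encode("utf-8")
        if len(raw) <= string_max_bytes:
            fitted.append(raw)
        elif not drop_strings_above_max_length:
            # Assumption: keep a byte prefix. This may split a multi-byte
            # character, which a real implementation might handle differently.
            fitted.append(raw[:string_max_bytes])
        # Otherwise the over-long string is dropped.
    return fitted
```

With string_max_bytes=5, the input "hello world" is kept as the prefix b"hello" by default and is omitted entirely when dropping is enabled.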
Methods
compute_chunks
compute_chunks(
input_strings
)
Returns Tensor containing integer chunks for input strings.
| Args | |
| --- | --- |
| input_strings | A tensor of strings. |

| Returns |
| --- |
| A 2D tensor in which each row holds the integer chunks of the corresponding input string, and a trimmed input_strings that can fit in the IBLT. |
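The idea of turning a string into a fixed number of integer chunks can be sketched as follows. The helper name, the 3-byte chunk width, the zero padding, and the big-endian packing are assumptions made for illustration; the library chooses its own chunking based on the encoding and field_size.

```python
import math


def string_to_chunks(raw, string_max_bytes, chunk_bytes=3):
    """Packs a byte string into ceil(string_max_bytes / chunk_bytes) integers.

    With chunk_bytes=3 every chunk is below 2**24, so it fits in a field of
    size 2**31 - 1.
    """
    num_chunks = math.ceil(string_max_bytes / chunk_bytes)
    # Trim to the maximum length, then zero-pad to a whole number of chunks.
    padded = raw[:string_max_bytes].ljust(num_chunks * chunk_bytes, b"\x00")
    return [
        int.from_bytes(padded[i:i + chunk_bytes], "big")
        for i in range(0, len(padded), chunk_bytes)
    ]
```

Every string maps to the same number of chunks, which is what allows the results to be stacked into a 2D structure with one row per string.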
compute_iblt
@tf.function
compute_iblt(
input_strings, input_counts=None
)
Returns Tensor containing the values of the IBLT data structure.
| Args | |
| --- | --- |
| input_strings | A 1D tensor of strings. |
| input_counts | A 1D tensor of tf.int64 representing the count of each string. |

| Returns |
| --- |
| A tensor of shape [repetitions, table_size, num_chunks+2] whose value at index (r, h, c) corresponds to chunk c of the keys if c < num_chunks, to the counts if c = num_chunks, and to the checks if c = num_chunks + 1. |
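Since input_strings and input_counts are parallel 1D inputs, a raw list of observed strings can first be collapsed into distinct strings and their counts. The sketch below shows one way to do that in plain Python; the helper name is hypothetical, and converting the two lists to tensors (for example with tf.constant, using dtype=tf.int64 for the counts) is left out.

```python
from collections import Counter


def to_strings_and_counts(raw_strings):
    """Collapses repeated strings into parallel lists of strings and counts."""
    counts = Counter(raw_strings)
    strings = sorted(counts)  # fixed order keeps both lists aligned
    return strings, [counts[s] for s in strings]
```

For example, the list ["a", "b", "a"] becomes the strings ["a", "b"] with counts [2, 1].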