tff.analytics.heavy_hitters.iblt.IbltEncoder
Encodes the strings into an IBLT data structure.
tff.analytics.heavy_hitters.iblt.IbltEncoder(
capacity,
string_max_bytes,
*,
encoding: tff.analytics.heavy_hitters.iblt.CharacterEncoding = tff.analytics.heavy_hitters.iblt.CharacterEncoding.UTF8,
drop_strings_above_max_length=False,
seed=0,
repetitions=DEFAULT_REPETITIONS,
hash_family=None,
hash_family_params=None,
field_size=DEFAULT_FIELD_SIZE
)
The IBLT is a numpy array of shape [repetitions, table_size, num_chunks+2].
Its value at index (r, h, c) corresponds to (r is a repetition):
sum of chunk c of keys hashing to h in r if c < num_chunks,
sum of counts of keys hashing to h in r if c = num_chunks,
sum of checks of keys hashing to h in r if c = num_chunks + 1.
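The layout described above can be illustrated with a small NumPy sketch. This is not the library's implementation: the hash function, the table size, the number of chunks, and the helper names toy_hash and toy_insert are assumptions chosen for illustration. Only the array shape and the meaning of the last axis follow the description above.

```python
import hashlib

import numpy as np

FIELD_SIZE = 2**31 - 1  # the documented default field size
REPETITIONS = 3         # the documented default number of repetitions
TABLE_SIZE = 8          # hypothetical; the real value is derived by the library
NUM_CHUNKS = 2          # hypothetical number of integer chunks per key


def toy_hash(data: bytes, salt: str, modulus: int) -> int:
    """Illustrative hash only; not the hash family used by the library."""
    digest = hashlib.sha256(salt.encode() + data).digest()
    return int.from_bytes(digest[:8], "big") % modulus


def toy_insert(table: np.ndarray, chunks: list[int]) -> None:
    """Adds one key (given as integer chunks) to every repetition."""
    key_bytes = b"".join(c.to_bytes(4, "big") for c in chunks)
    check = toy_hash(key_bytes, "check", FIELD_SIZE)
    for r in range(REPETITIONS):
        h = toy_hash(key_bytes, f"rep{r}", TABLE_SIZE)
        # c < NUM_CHUNKS: running sum of chunk c of the keys hashing to h.
        for c, chunk in enumerate(chunks):
            table[r, h, c] = (table[r, h, c] + chunk) % FIELD_SIZE
        # c == NUM_CHUNKS: running sum of counts (one per insertion here).
        table[r, h, NUM_CHUNKS] = (table[r, h, NUM_CHUNKS] + 1) % FIELD_SIZE
        # c == NUM_CHUNKS + 1: running sum of check values.
        table[r, h, NUM_CHUNKS + 1] = (
            table[r, h, NUM_CHUNKS + 1] + check
        ) % FIELD_SIZE


table = np.zeros((REPETITIONS, TABLE_SIZE, NUM_CHUNKS + 2), dtype=np.int64)
toy_insert(table, [101, 202])
toy_insert(table, [7, 9])
```

Because each key lands in exactly one cell per repetition in this sketch, the count plane table[r, :, NUM_CHUNKS] sums to the number of inserted keys for every r.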
Args |
capacity
|
Number of distinct strings that we expect to be inserted.
|
string_max_bytes
|
Maximum length, in bytes, of a string that can be inserted.
|
encoding
|
The character encoding of the string data to encode. For
non-character binary data or strings with unknown encoding, specify
CharacterEncoding.UNKNOWN. Defaults to CharacterEncoding.UTF8.
|
drop_strings_above_max_length
|
If True, strings longer than string_max_bytes
are dropped when constructing the IBLT. Defaults to False.
|
seed
|
Integer seed for hash functions. Defaults to 0.
|
repetitions
|
Number of repetitions in IBLT data structure (must be >= 3).
Defaults to 3.
|
hash_family
|
String specifying the hash family used to construct the IBLT.
Options include coupled or random; the default is chosen based on capacity.
|
hash_family_params
|
A dict of parameters that the hash family hasher
expects. Defaults are chosen based on capacity.
|
field_size
|
The field size for all values in IBLT. Defaults to 2**31 - 1.
|
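The interaction between string_max_bytes and drop_strings_above_max_length can be sketched in plain Python. The helper toy_prepare is hypothetical and is not part of the library. It assumes that over-length strings are trimmed when they are not dropped, as the compute_chunks description suggests, and it trims at a raw byte boundary, which may differ from what the library does for multi-byte characters.

```python
def toy_prepare(strings, string_max_bytes, drop_strings_above_max_length=False):
    """Hypothetical helper: drops or trims strings by UTF-8 byte length."""
    kept = []
    for s in strings:
        data = s.encode("utf-8")
        if len(data) > string_max_bytes:
            if drop_strings_above_max_length:
                continue  # the string is left out entirely
            data = data[:string_max_bytes]  # assumption: raw byte trim
        kept.append(data)
    return kept


words = ["cat", "giraffe", "naïve"]  # "naïve" is 5 characters but 6 bytes
trimmed = toy_prepare(words, string_max_bytes=5)
dropped = toy_prepare(words, string_max_bytes=5,
                      drop_strings_above_max_length=True)
```

The limit applies to bytes rather than characters, so a string with non-ASCII characters can exceed string_max_bytes even when its character count does not.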
Methods
compute_chunks
compute_chunks(
input_strings
)
Returns a Tensor containing integer chunks for the input strings.
Args |
input_strings
|
A tensor of strings.
|
Returns |
A 2D tensor whose rows consist of the integer chunks for the string at
that row index, and a trimmed input_strings that can fit in
the IBLT.
|
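One way to picture the chunking is to pack a fixed number of bytes into each integer. The sketch below is an illustration only: the helper toy_compute_chunks, the choice of 3 bytes per chunk, the big-endian packing, and the zero padding are all assumptions, not the library's documented behavior.

```python
import math

import numpy as np

BYTES_PER_CHUNK = 3  # assumption: keeps each chunk below 2**24 < 2**31 - 1


def toy_compute_chunks(strings, string_max_bytes):
    """Hypothetical stand-in returning (chunks, trimmed byte strings)."""
    num_chunks = math.ceil(string_max_bytes / BYTES_PER_CHUNK)
    padded_len = num_chunks * BYTES_PER_CHUNK
    rows, trimmed = [], []
    for s in strings:
        data = s.encode("utf-8")[:string_max_bytes]  # trim to fit
        trimmed.append(data)
        padded = data.ljust(padded_len, b"\x00")  # zero-pad to a fixed width
        rows.append([
            int.from_bytes(padded[i:i + BYTES_PER_CHUNK], "big")
            for i in range(0, padded_len, BYTES_PER_CHUNK)
        ])
    return np.array(rows, dtype=np.int64), trimmed


chunks, trimmed_strings = toy_compute_chunks(["abc", "abcdefgh"],
                                             string_max_bytes=6)
```

Each row of chunks has the same width, so every string occupies the same number of entries along the last axis of the IBLT regardless of its length.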
compute_iblt
@tf.function
compute_iblt(
input_strings, input_counts=None
)
Returns a Tensor containing the values of the IBLT data structure.
Args |
input_strings
|
A 1D tensor of strings.
|
input_counts
|
A 1D tensor of tf.int64 giving the count of each
string. Defaults to None.
|
Returns |
A tensor of shape [repetitions, table_size, num_chunks+2] whose value at
index (r, h, c) corresponds to chunk c of the keys if c < num_chunks, to
the counts if c = num_chunks, and to the checks if c = num_chunks + 1.
|
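When the raw data contains repeated strings, one way to prepare the two arguments is to collapse duplicates into unique strings and 64-bit counts first. The sketch below uses only NumPy. The variable names are illustrative, and passing the results on as input_strings and input_counts is left to the caller.

```python
import numpy as np

raw_strings = ["hello", "world", "hello", "tff", "hello", "world"]

# np.unique returns the sorted distinct values and how often each occurs.
unique_strings, counts = np.unique(np.array(raw_strings), return_counts=True)

# Cast explicitly so the counts match the documented tf.int64 dtype.
counts = counts.astype(np.int64)
```

Both arrays are one-dimensional and aligned, so counts[i] is the number of occurrences of unique_strings[i].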
Last updated 2024-09-20 UTC.