tff.analytics.data_processing.get_top_elements_with_counts
Gets top unique elements from the input dataset.
@tf.function
tff.analytics.data_processing.get_top_elements_with_counts(
dataset: tf.data.Dataset,
max_user_contribution: int,
string_max_bytes: Optional[int] = None
) -> tuple[tf.Tensor, tf.Tensor]
This method returns a tuple of elements and counts, where elements are
the most common unique elements in the dataset, and counts is the number of
times each one appears. The input dataset must yield batched rank-1 tensors.
This function reads each coordinate of the tensor as an individual element and
caps the total number of elements to return. Note that the returned set of top
elements will not necessarily be sorted.
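The behavior described above can be approximated in plain Python. This is a minimal sketch and not the library's implementation: the helper name top_elements_with_counts is hypothetical, batches are modeled as lists of bytes, and it assumes strings are truncated before they are counted.

```python
from collections import Counter
from typing import Iterable, Optional


def top_elements_with_counts(
    batches: Iterable[list[bytes]],
    max_user_contribution: int,
    string_max_bytes: Optional[int] = None,
) -> tuple[list[bytes], list[int]]:
    """Hypothetical pure-Python stand-in for the documented behavior."""
    counter: Counter = Counter()
    for batch in batches:
        # Each coordinate of a rank-1 batch is treated as one element.
        for element in batch:
            if string_max_bytes is not None:
                # Assumption: truncation happens before counting.
                element = element[:string_max_bytes]
            counter[element] += 1
    # Keep at most max_user_contribution unique elements, most frequent first.
    top = counter.most_common(max_user_contribution)
    return [e for e, _ in top], [c for _, c in top]


batches = [[b"a", b"b", b"a"], [b"c", b"a", b"b"]]
elements, counts = top_elements_with_counts(batches, max_user_contribution=2)
print(dict(zip(elements, counts)))
```

Unlike the documented function, this sketch happens to return elements ordered by count; callers of the real API should not rely on any ordering.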
Args |
dataset
|
A tf.data.Dataset to extract top elements from. Element type must
be tf.string.
|
max_user_contribution
|
The maximum number of elements to keep.
|
string_max_bytes
|
The maximum length (in bytes) of strings in the dataset.
Strings longer than string_max_bytes will be truncated. Defaults to
None, which means there is no limit on the string length.
|
Returns |
elements
|
A rank-1 Tensor containing the top max_user_contribution unique
elements of the input dataset. If the total number of unique elements is
less than or equal to max_user_contribution, returns the list of all
unique elements.
|
counts
|
A rank-1 Tensor containing the counts for each of the elements in
elements.
|
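Because the returned elements are not necessarily sorted, a caller may want to order them by count. The sketch below uses plain Python lists as stand-ins for the two returned tensors (for eager tensors, something like elements.numpy().tolist() would produce such lists); the values are made up for illustration.

```python
# Stand-ins for the returned elements and counts tensors after conversion
# to Python lists; these values are illustrative only.
elements = [b"b", b"a", b"c"]
counts = [2, 5, 1]

# Pair each element with its count, then order by descending count.
# Ties are broken by element value so the result is deterministic.
ranked = sorted(zip(elements, counts), key=lambda pair: (-pair[1], pair[0]))

for element, count in ranked:
    print(element.decode("utf-8"), count)
```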
Raises |
ValueError
|
-- If the shape of elements in dataset is not rank 1.
-- If max_user_contribution is less than 1.
-- If string_max_bytes is not None and is less than 1.
|
TypeError
|
If dataset.element_spec.dtype is not tf.string.
|
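The error conditions listed above can be mirrored in a small pre-flight check. This is only an illustrative sketch: check_arguments is a hypothetical helper, the element rank and dtype are passed in as plain values rather than read from a dataset, and the order in which the library performs its checks is an assumption.

```python
from typing import Optional


def check_arguments(
    element_rank: int,
    element_dtype: str,
    max_user_contribution: int,
    string_max_bytes: Optional[int] = None,
) -> None:
    """Hypothetical check mirroring the documented error conditions."""
    if element_dtype != "string":
        raise TypeError(f"Expected string elements, found {element_dtype}.")
    if element_rank != 1:
        raise ValueError(f"Expected rank-1 elements, found rank {element_rank}.")
    if max_user_contribution < 1:
        raise ValueError("max_user_contribution must be at least 1.")
    if string_max_bytes is not None and string_max_bytes < 1:
        raise ValueError("string_max_bytes must be at least 1 when set.")


# Valid arguments pass silently.
check_arguments(1, "string", max_user_contribution=10)

# An invalid max_user_contribution raises ValueError.
try:
    check_arguments(1, "string", max_user_contribution=0)
except ValueError as err:
    print(err)
```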
Last updated 2024-09-20 UTC.