tft.bag_of_words
Computes a bag of "words" based on the specified ngram configuration.
tft.bag_of_words(
    tokens: tf.SparseTensor,
    ngram_range: Tuple[int, int],
    separator: str,
    name: Optional[str] = None
) -> tf.SparseTensor
A light wrapper around tft.ngrams. First computes ngrams, then transforms the
ngram representation (list semantics) into a Bag of Words (set semantics) per
row. Each row reflects the set of unique ngrams present in an input record.
See tft.ngrams for more information.
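The difference between list semantics and set semantics can be illustrated with a small pure-Python sketch. This is not the library's implementation: the helper names `ngrams_for_row` and `bag_of_words_for_row` are hypothetical, they operate on a single row given as a plain Python list rather than on a SparseTensor, and the ordering they produce is not claimed to match tft.ngrams.

```python
from typing import List, Set, Tuple


def ngrams_for_row(
    tokens: List[str], ngram_range: Tuple[int, int], separator: str
) -> List[str]:
    """List semantics: every ngram occurrence in the row, repeats included."""
    min_n, max_n = ngram_range  # both ends are inclusive
    grams = []
    for start in range(len(tokens)):
        for n in range(min_n, max_n + 1):
            # Only emit ngrams that fit entirely inside the row.
            if start + n <= len(tokens):
                grams.append(separator.join(tokens[start:start + n]))
    return grams


def bag_of_words_for_row(
    tokens: List[str], ngram_range: Tuple[int, int], separator: str
) -> Set[str]:
    """Set semantics: each distinct ngram appears once, order is dropped."""
    return set(ngrams_for_row(tokens, ngram_range, separator))


row = ["a", "b", "a", "b"]
print(ngrams_for_row(row, (1, 2), " "))        # 7 ngrams, with repeats
print(bag_of_words_for_row(row, (1, 2), " "))  # 4 unique ngrams
```

For the row above, the ngram list contains "a", "b" and "a b" more than once, while the bag keeps a single copy of each.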
Args
tokens: A two-dimensional SparseTensor of dtype tf.string containing the tokens used to construct the bag of words.
ngram_range: A pair giving the inclusive range (minimum, maximum) of ngram sizes to compute. For example, (1, 2) produces unigrams and bigrams.
separator: A string inserted between tokens when ngrams are constructed.
name: (Optional) A name for this operation.
Returns
A SparseTensor containing the unique set of ngrams from each row of the input. Note: the original order of the ngrams may not be preserved.
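Because each row can hold a different number of unique ngrams, the result is sparse. The sketch below shows, in plain Python, how per-row sets could be laid out as the (indices, values, dense_shape) triple that a two-dimensional tf.SparseTensor carries. The helper `rows_to_coo` is hypothetical, and sorting each row is an arbitrary choice made here only to get reproducible output; it is an assumption of this sketch, not a statement about the order the library returns.

```python
from typing import List, Set, Tuple


def rows_to_coo(
    bags: List[Set[str]],
) -> Tuple[List[Tuple[int, int]], List[str], Tuple[int, int]]:
    """Lay out per-row sets as (indices, values, dense_shape)."""
    indices = []
    values = []
    for row_id, bag in enumerate(bags):
        # Sets are unordered; sort only so the output is deterministic.
        for col_id, gram in enumerate(sorted(bag)):
            indices.append((row_id, col_id))
            values.append(gram)
    # The dense width is the size of the largest row.
    width = max((len(bag) for bag in bags), default=0)
    return indices, values, (len(bags), width)


indices, values, dense_shape = rows_to_coo([{"a", "b", "a b"}, {"c"}])
print(indices)      # one (row, column) pair per stored ngram
print(values)       # the ngram strings, in the same order as indices
print(dense_shape)  # (number of rows, widest row)
```

Rows with fewer unique ngrams than the widest row simply have no entries in the remaining columns.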
Except as otherwise noted, the content of this page is licensed under the Creative Commons Attribution 4.0 License, and code samples are licensed under the Apache 2.0 License. For details, see the Google Developers Site Policies. Java is a registered trademark of Oracle and/or its affiliates.
Last updated 2024-11-01 UTC.