Create a SparseTensor of n-grams.
tft.ngrams(
    tokens: tf.SparseTensor,
    ngram_range: Tuple[int, int],
    separator: str,
    name: Optional[str] = None
) -> tf.SparseTensor
Given a SparseTensor of tokens, returns a SparseTensor containing the ngrams that can be constructed from each row.
separator is inserted between each pair of tokens, so " " would be an appropriate choice if the tokens are words, while "" would be an appropriate choice if they are characters.
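The effect of the separator can be illustrated without TensorFlow. The following is a minimal pure-Python sketch; `join_ngrams` is a hypothetical helper written for this illustration, not part of tft, and it makes no claim about how tft.ngrams is implemented.

```python
def join_ngrams(tokens, n, separator):
    """Return every n-gram of size n from a list of tokens (hypothetical helper)."""
    return [separator.join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

# Word tokens: a space keeps the words readable.
word_bigrams = join_ngrams(["was", "a", "rat"], 2, " ")

# Character tokens: an empty separator glues the characters back together.
char_bigrams = join_ngrams(list("rat"), 2, "")

print(word_bigrams)
print(char_bigrams)
```

With a space the word bigrams stay human-readable, while the empty string turns character tokens into contiguous substrings.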
Example:
import tensorflow as tf
import tensorflow_transform as tft

tokens = tf.SparseTensor(
    indices=[[0, 0], [0, 1], [0, 2], [1, 0], [1, 1], [1, 2], [1, 3]],
    values=['One', 'was', 'Johnny', 'Two', 'was', 'a', 'rat'],
    dense_shape=[2, 4])
print(tft.ngrams(tokens, ngram_range=(1, 3), separator=' '))
SparseTensor(indices=tf.Tensor(
[[0 0] [0 1] [0 2] [0 3] [0 4] [0 5]
[1 0] [1 1] [1 2] [1 3] [1 4] [1 5] [1 6] [1 7] [1 8]],
shape=(15, 2), dtype=int64),
values=tf.Tensor(
[b'One' b'One was' b'One was Johnny' b'was' b'was Johnny' b'Johnny' b'Two'
b'Two was' b'Two was a' b'was' b'was a' b'was a rat' b'a' b'a rat'
b'rat'], shape=(15,), dtype=string),
dense_shape=tf.Tensor([2 9], shape=(2,), dtype=int64))
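To see where the indices, values, and dense_shape in the output above come from, the same result can be reproduced in plain Python. This is only a sketch that mirrors the ordering visible in the example (by start position, then by ngram size); `row_ngrams` is a hypothetical helper, and the actual tft implementation may work differently.

```python
def row_ngrams(tokens, ngram_range, separator):
    """All ngrams of one row, ordered by start position, then by size (hypothetical helper)."""
    smallest, largest = ngram_range
    grams = []
    for start in range(len(tokens)):
        for size in range(smallest, largest + 1):
            if start + size <= len(tokens):
                grams.append(separator.join(tokens[start:start + size]))
    return grams

rows = [["One", "was", "Johnny"], ["Two", "was", "a", "rat"]]
per_row = [row_ngrams(row, (1, 3), " ") for row in rows]

# Each ngram gets the index [row, position within that row's ngram list].
indices = [[i, j] for i, grams in enumerate(per_row) for j in range(len(grams))]
values = [gram for grams in per_row for gram in grams]
# The second dimension is the largest number of ngrams produced by any row.
dense_shape = [len(rows), max(len(grams) for grams in per_row)]

print(indices)
print(values)
print(dense_shape)
```

The first row has three tokens and yields 6 ngrams, the second has four tokens and yields 9, which accounts for the 15 values and the dense shape of [2, 9].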
Args
tokens | A two-dimensional SparseTensor of dtype tf.string containing tokens that will be used to construct ngrams.
ngram_range | A pair with the range (inclusive) of ngram sizes to return.
separator | A string that will be inserted between tokens when ngrams are constructed.
name | (Optional) A name for this operation.
Returns
A SparseTensor containing all ngrams from each row of the input. Note: if an ngram appears multiple times in the input row, it will be present the same number of times in the output. For unique ngrams, see tft.bag_of_words.
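The note about repeated ngrams can be made concrete with a small pure-Python sketch. The `set` step below only illustrates the idea of keeping unique ngrams; it is an assumption for illustration and does not reproduce the implementation or output ordering of tft.bag_of_words.

```python
tokens = ["was", "a", "was", "a"]

# Bigrams in order of appearance; repeated bigrams are kept.
bigrams = [" ".join(tokens[i:i + 2]) for i in range(len(tokens) - 1)]

# Deduplicated view, sorted only to make the result deterministic here.
unique_bigrams = sorted(set(bigrams))

print(bigrams)
print(unique_bigrams)
```

Here "was a" occurs twice in the row, so it appears twice in the list of bigrams and once in the deduplicated list.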
Raises
ValueError | if tokens is not 2D.
ValueError | if ngram_range[0] < 1 or ngram_range[1] < ngram_range[0].
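The ngram_range condition above can be checked before building a pipeline. The sketch below is a hypothetical pre-check, not tft code: the function name `check_ngram_range` and the error message are assumptions, and only the condition itself is taken from the documentation.

```python
def check_ngram_range(ngram_range):
    """Raise ValueError under the documented condition (hypothetical pre-check)."""
    smallest, largest = ngram_range
    if smallest < 1 or largest < smallest:
        # Message text is made up for this sketch.
        raise ValueError(f"Invalid ngram_range: {ngram_range}")
    return ngram_range

check_ngram_range((1, 3))      # valid: sizes 1 through 3
check_ngram_range((2, 2))      # valid: bigrams only

for bad in [(0, 2), (3, 1)]:
    try:
        check_ngram_range(bad)
    except ValueError as err:
        print(err)
```

A range such as (2, 2) is valid and requests a single ngram size, since both bounds are inclusive.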