tft.ngrams

Create a SparseTensor of n-grams.

tft.ngrams(
    tokens: tf.SparseTensor,
    ngram_range: Tuple[int, int],
    separator: str,
    name: Optional[str] = None
) -> tf.SparseTensor

Given a SparseTensor of tokens, returns a SparseTensor containing the ngrams that can be constructed from each row.

separator is inserted between each pair of tokens, so " " would be an appropriate choice if the tokens are words, while "" would be an appropriate choice if they are characters.
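The effect of `separator` can be sketched in plain Python. This is only an illustration of the documented semantics for a single row, not the library's implementation: `row_ngrams` is a hypothetical helper, and the ordering (by start position, then by length) is assumed from the example output on this page.

```python
def row_ngrams(tokens, ngram_range, separator):
    """Hypothetical helper: n-grams of one row, by start position, then length."""
    lo, hi = ngram_range  # both bounds are inclusive
    out = []
    for start in range(len(tokens)):
        for n in range(lo, hi + 1):
            # Only emit n-grams that fit entirely inside the row.
            if start + n <= len(tokens):
                out.append(separator.join(tokens[start:start + n]))
    return out


# Word tokens: a space keeps the words readable.
word_bigrams = row_ngrams(["was", "a", "rat"], (2, 2), " ")
# Character tokens: an empty separator glues the characters back together.
char_bigrams = row_ngrams(list("rat"), (2, 2), "")

print(word_bigrams)  # ['was a', 'a rat']
print(char_bigrams)  # ['ra', 'at']
```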

Example:

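import tensorflow as tf
import tensorflow_transform as tft

# Two rows of word tokens: row 0 has three tokens, row 1 has four.
tokens = tf.SparseTensor(
    indices=[[0, 0], [0, 1], [0, 2], [1, 0], [1, 1], [1, 2], [1, 3]],
    values=['One', 'was', 'Johnny', 'Two', 'was', 'a', 'rat'],
    dense_shape=[2, 4])
print(tft.ngrams(tokens, ngram_range=(1, 3), separator=' '))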
SparseTensor(indices=tf.Tensor(
    [[0 0] [0 1] [0 2] [0 3] [0 4] [0 5]
     [1 0] [1 1] [1 2] [1 3] [1 4] [1 5] [1 6] [1 7] [1 8]],
     shape=(15, 2), dtype=int64),
  values=tf.Tensor(
    [b'One' b'One was' b'One was Johnny' b'was' b'was Johnny' b'Johnny' b'Two'
     b'Two was' b'Two was a' b'was' b'was a' b'was a rat' b'a' b'a rat'
     b'rat'], shape=(15,), dtype=string),
  dense_shape=tf.Tensor([2 9], shape=(2,), dtype=int64))

Args
`tokens`	A two-dimensional `SparseTensor` of dtype `tf.string` containing the tokens that will be used to construct ngrams.
`ngram_range`	A pair of integers giving the inclusive range of ngram sizes to return, e.g. `(1, 3)` for unigrams through trigrams.
`separator`	A string that will be inserted between tokens when ngrams are constructed.
`name`	(Optional) A name for this operation.

Returns
A `SparseTensor` containing all ngrams from each row of the input. Note: if an ngram appears multiple times in an input row, it appears the same number of times in the corresponding output row. For unique ngrams, see `tft.bag_of_words`.
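The note about repeated ngrams can be illustrated with a small plain-Python sketch for one row. It does not call TensorFlow Transform; the de-duplication step only approximates the idea behind `tft.bag_of_words`, and nothing here should be read as a statement about that function's output ordering.

```python
from collections import Counter

row = ["to", "be", "or", "not", "to", "be"]

# All bigrams of the row, in order of start position. Duplicates are kept.
bigrams = [" ".join(row[i:i + 2]) for i in range(len(row) - 1)]
counts = Counter(bigrams)

# Assumption for illustration: a "unique ngrams" view keeps one copy of each.
unique = list(dict.fromkeys(bigrams))

print(bigrams)          # ['to be', 'be or', 'or not', 'not to', 'to be']
print(counts["to be"])  # 2
print(unique)           # ['to be', 'be or', 'or not', 'not to']
```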

Raises
`ValueError`	If `tokens` is not 2D.
`ValueError`	If `ngram_range[0] < 1` or `ngram_range[1] < ngram_range[0]`.
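To make the second condition concrete, the following sketch restates it as a standalone check. `check_ngram_range` is a hypothetical name used only for illustration, and the error message is invented; the library's own validation code and wording may differ.

```python
def check_ngram_range(ngram_range):
    """Hypothetical check mirroring the documented ngram_range conditions."""
    lo, hi = ngram_range
    if lo < 1 or hi < lo:
        # Illustrative message only; not the library's actual wording.
        raise ValueError(f"Invalid ngram_range: {ngram_range}")
    return lo, hi


check_ngram_range((1, 3))  # accepted: unigrams through trigrams
check_ngram_range((2, 2))  # accepted: bigrams only

for bad in [(0, 2), (3, 2)]:
    try:
        check_ngram_range(bad)
    except ValueError as err:
        print(err)
```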
