Generates skip-gram token and label paired Tensors from the input tensor.
```python
tfa.text.skip_gram_sample(
    input_tensor: tfa.types.TensorLike,
    min_skips: tfa.types.FloatTensorLike = 1,
    max_skips: tfa.types.FloatTensorLike = 5,
    start: tfa.types.FloatTensorLike = 0,
    limit: tfa.types.FloatTensorLike = -1,
    emit_self_as_target: bool = False,
    vocab_freq_table: tf.lookup.KeyValueTensorInitializer = None,
    vocab_min_count: Optional[FloatTensorLike] = None,
    vocab_subsampling: Optional[FloatTensorLike] = None,
    corpus_size: Optional[FloatTensorLike] = None,
    seed: Optional[FloatTensorLike] = None,
    name: Optional[str] = None
) -> tf.Tensor
```
Generates skip-gram ("token", "label") pairs using each element in the
rank-1 input_tensor as a token. The window size used for each token will
be randomly selected from the range specified by [min_skips, max_skips],
inclusive. See https://arxiv.org/abs/1301.3781 for more details about
skip-gram.
For example, given input_tensor = ["the", "quick", "brown", "fox",
"jumps"], min_skips = 1, max_skips = 2, emit_self_as_target = False,
the output (tokens, labels) pairs for the token "quick" will be randomly
selected from either (tokens=["quick", "quick"], labels=["the", "brown"])
for 1 skip, or (tokens=["quick", "quick", "quick"],
labels=["the", "brown", "fox"]) for 2 skips.
If emit_self_as_target = True, each token will also be emitted as a label
for itself. From the previous example, the output will be either
(tokens=["quick", "quick", "quick"], labels=["the", "quick", "brown"])
for 1 skip, or (tokens=["quick", "quick", "quick", "quick"],
labels=["the", "quick", "brown", "fox"]) for 2 skips.
The same process is repeated for each element of input_tensor and
concatenated together into the two output rank-1 Tensors (one for all the
tokens, another for all the labels).
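For example, a minimal sketch of the basic call (the sentence and seed below are illustrative; the seed is fixed only so that the randomly chosen per-token window sizes are repeatable):

```python
import tensorflow as tf
import tensorflow_addons as tfa

# Sample skip-gram pairs from a toy sentence.
input_tensor = tf.constant(["the", "quick", "brown", "fox", "jumps"])
tokens, labels = tfa.text.skip_gram_sample(
    input_tensor, min_skips=1, max_skips=2, emit_self_as_target=False, seed=42
)

# Both outputs are rank-1 string Tensors of the same length, e.g.
# tokens: [b"the" b"the" b"quick" ...]   labels: [b"quick" b"brown" b"the" ...]
print(tokens.numpy())
print(labels.numpy())
```

Passing emit_self_as_target=True to the same call would additionally pair each token with itself, as in the example above.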
If vocab_freq_table is specified, tokens in input_tensor that are not
present in the vocabulary are discarded. Tokens whose frequency counts are
below vocab_min_count are also discarded. Tokens whose frequency
proportions in the corpus exceed vocab_subsampling may be randomly
down-sampled. See Eq. 5 in http://arxiv.org/abs/1310.4546 for more details
about subsampling.
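Below is a hedged sketch of this filtering path; the vocabulary, counts, and threshold values are invented for illustration. Although the signature annotates vocab_freq_table as a tf.lookup.KeyValueTensorInitializer, what gets queried is a lookup table, so the sketch builds a tf.lookup.StaticHashTable from such an initializer and passes that:

```python
import tensorflow as tf
import tensorflow_addons as tfa

# Invented vocabulary and raw frequency counts, for illustration only.
vocab = tf.constant(["the", "quick", "brown", "fox", "jumps"])
counts = tf.constant([100, 10, 8, 12, 5], dtype=tf.int64)
corpus_size = tf.reduce_sum(counts)  # total token count across the toy corpus

# Out-of-vocabulary tokens look up to default_value=0, which falls below
# vocab_min_count and is therefore discarded along with rare tokens.
vocab_freq_table = tf.lookup.StaticHashTable(
    tf.lookup.KeyValueTensorInitializer(vocab, counts), default_value=0
)

tokens, labels = tfa.text.skip_gram_sample(
    tf.constant(["the", "quick", "brown", "fox", "jumps"]),
    min_skips=1,
    max_skips=2,
    vocab_freq_table=vocab_freq_table,
    vocab_min_count=6,        # drops "jumps" (count 5 < 6)
    vocab_subsampling=0.05,   # tokens above 5% corpus frequency may be down-sampled
    corpus_size=corpus_size,  # required whenever vocab_subsampling is given
)
```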
| Returns |
|---|
| A tuple containing (token, label) Tensors. Each output Tensor is of rank-1 and has the same type as input_tensor. |
| Raises | |
|---|---|
| ValueError | If vocab_freq_table is not provided, but vocab_min_count, vocab_subsampling, or corpus_size is specified. If vocab_subsampling and corpus_size are not both present or both absent. |