tf.keras.preprocessing.sequence.make_sampling_table


Generates a word rank-based probabilistic sampling table.


Compat aliases for migration

See Migration guide for more details.

tf.compat.v1.keras.preprocessing.sequence.make_sampling_table

tf.keras.preprocessing.sequence.make_sampling_table(
    size, sampling_factor=1e-05
)

Used to generate the `sampling_table` argument for `skipgrams`. `sampling_table[i]` is the probability of sampling the i-th most common word in a dataset (more common words should be sampled less frequently, for balance).
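To make the meaning of `sampling_table[i]` concrete, here is a minimal sketch of one plausible way a consumer such as `skipgrams` can apply the table: each word index is kept with probability `sampling_table[index]`. The helper `subsample_sequence` is hypothetical and not part of the Keras API. It assumes integer word indices ranked by frequency (smaller index means more frequent word) and treats index 0 as reserved, e.g. for padding.

```python
import random

def subsample_sequence(sequence, sampling_table, seed=None):
    """Hypothetical helper: keep each word index with probability sampling_table[index].

    Assumes indices are frequency ranks (1 = most common word) and that
    index 0 is reserved (e.g. for padding), so it is always dropped.
    """
    rng = random.Random(seed)
    kept = []
    for word_index in sequence:
        if word_index == 0:
            continue  # reserved index, never kept
        # Frequent words have small table entries, so they are dropped more often.
        if sampling_table[word_index] < rng.random():
            continue
        kept.append(word_index)
    return kept

# A table of all ones keeps every non-reserved word.
print(subsample_sequence([0, 1, 2, 3], [1.0] * 4))
```

Because `random()` returns values in [0, 1), an entry of 1.0 always keeps the word and an entry of 0.0 (practically) always drops it.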

The sampling probabilities are generated according to the sampling distribution used in word2vec:

p(word) = (min(1, sqrt(word_frequency / sampling_factor) /
    (word_frequency / sampling_factor)))
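The formula above simplifies: `sqrt(x) / x` equals `1 / sqrt(x)`, so `p(word) = min(1, sqrt(sampling_factor / word_frequency))`. Words rarer than `sampling_factor` are always kept, and a word 100 times more frequent than `sampling_factor` is kept about 10% of the time. A minimal sketch, assuming `word_frequency` is a relative frequency (the fraction of all tokens); the function name `word2vec_keep_prob` is hypothetical:

```python
import math

def word2vec_keep_prob(word_frequency, sampling_factor=1e-5):
    """Hypothetical helper evaluating the word2vec keep probability.

    min(1, sqrt(f / t) / (f / t)) is equivalent to min(1, sqrt(t / f)).
    """
    ratio = word_frequency / sampling_factor
    return min(1.0, math.sqrt(ratio) / ratio)

print(word2vec_keep_prob(1e-3))  # a word 100x more frequent than sampling_factor
print(word2vec_keep_prob(1e-6))  # a rare word is clipped to 1.0
```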

We assume that the word frequencies follow Zipf's law (s=1) to derive a numerical approximation of frequency(rank):

frequency(rank) ~ 1/(rank * (log(rank) + gamma) + 1/2 - 1/(12*rank))

where gamma is the Euler-Mascheroni constant.
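The denominator is the standard asymptotic expansion of `rank * H(rank)`, where `H(n) = 1 + 1/2 + ... + 1/n` is the n-th harmonic number and `H(n) ~ log(n) + gamma + 1/(2n) - 1/(12n^2)`. The sketch below, with hypothetical function names, compares the closed form against the exact sum to show how close the approximation is even for small ranks:

```python
import math

EULER_GAMMA = 0.5772156649015329  # Euler-Mascheroni constant

def approx_rank_times_harmonic(rank):
    """Closed form used in the formula: rank*(log(rank)+gamma) + 1/2 - 1/(12*rank)."""
    return rank * (math.log(rank) + EULER_GAMMA) + 0.5 - 1.0 / (12.0 * rank)

def exact_rank_times_harmonic(rank):
    """rank * H(rank), with H(rank) = 1 + 1/2 + ... + 1/rank."""
    return rank * sum(1.0 / k for k in range(1, rank + 1))

for rank in (1, 2, 10, 100):
    print(rank, approx_rank_times_harmonic(rank), exact_rank_times_harmonic(rank))
```

The relative error is below 1% at rank 1 and shrinks quickly as the rank grows, so the closed form avoids summing a harmonic series for every rank.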

Arguments
`size`	Int, number of possible words to sample.
`sampling_factor`	The sampling factor in the word2vec formula.

Returns
A 1D NumPy array of length `size` whose i-th entry is the probability that the word of rank i should be sampled.
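Combining the two formulas above gives a table that can be computed in a few vectorized NumPy operations. The following is a sketch under stated assumptions, not the library's source: `sampling_table_sketch` is a hypothetical name, it assumes `size >= 1`, and treating index 0 as rank 1 (to keep `log(rank)` defined) is a choice made here.

```python
import numpy as np

EULER_GAMMA = 0.5772156649015329  # Euler-Mascheroni constant

def sampling_table_sketch(size, sampling_factor=1e-5):
    """Hypothetical re-derivation of a rank-based sampling table (assumes size >= 1)."""
    rank = np.arange(size, dtype=np.float64)
    # Assumption: index 0 is treated as rank 1 so that log(rank) is defined.
    rank[0] = 1.0
    # Approximate 1 / frequency(rank) under Zipf's law (s=1).
    inv_freq = rank * (np.log(rank) + EULER_GAMMA) + 0.5 - 1.0 / (12.0 * rank)
    # word2vec keep probability: min(1, sqrt(sampling_factor / frequency)).
    return np.minimum(1.0, np.sqrt(sampling_factor * inv_freq))

table = sampling_table_sketch(10)
print(table)  # small, increasing probabilities: frequent words are rarely kept
```

With TensorFlow installed, the result can be compared against `tf.keras.preprocessing.sequence.make_sampling_table(size)`; small numerical differences are possible if the library rounds constants such as gamma differently.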

Except as otherwise noted, the content of this page is licensed under the Creative Commons Attribution 4.0 License, and code samples are licensed under the Apache 2.0 License. For details, see the Google Developers Site Policies. Java is a registered trademark of Oracle and/or its affiliates. Some content is licensed under the numpy license.

Last updated 2021-02-18 UTC.
