tf.signal.linear_to_mel_weight_matrix

Returns a matrix to warp linear scale spectrograms to the mel scale.

Returns a weight matrix that can be used to re-weight a Tensor containing num_spectrogram_bins linearly sampled frequency information from [0, sample_rate / 2] into num_mel_bins frequency information from [lower_edge_hertz, upper_edge_hertz] on the mel scale.

This function follows the Hidden Markov Model Toolkit (HTK) convention, defining the mel scale in terms of a frequency in hertz according to the following formula:

$$\textrm{mel}(f) = 2595 * \textrm{log}_{10}(1 + \frac{f}{700})$$

In the returned matrix, all the triangles (filterbanks) have a peak value of 1.0.

For example, the returned matrix A can be used to right-multiply a spectrogram S of shape [frames, num_spectrogram_bins] of linear scale spectrum values (e.g. STFT magnitudes) to generate a "mel spectrogram" M of shape [frames, num_mel_bins].

# `S` has shape [frames, num_spectrogram_bins]
# `M` has shape [frames, num_mel_bins]
M = tf.matmul(S, A)

The matrix can be used with tf.tensordot to convert an arbitrary rank Tensor of linear-scale spectral bins into the mel scale.

# S has shape [..., num_spectrogram_bins].
# M has shape [..., num_mel_bins].
M = tf.tensordot(S, A, 1)

num_mel_bins Python int. How many bands in the resulting mel spectrum.
num_spectrogram_bins An integer Tensor. How many bins there are in the source spectrogram data, which is understood to be fft_size // 2 + 1, i.e. the spectrogram only contains the nonredundant FFT bins.
sample_rate An integer or floa