tf.signal.mfccs_from_log_mel_spectrograms
Computes MFCCs of log_mel_spectrograms.
tf.signal.mfccs_from_log_mel_spectrograms(
log_mel_spectrograms, name=None
)
Implemented with GPU-compatible ops and supports gradients.
Mel-Frequency Cepstral Coefficient (MFCC) calculation consists of
taking the DCT-II of a log-magnitude mel-scale spectrogram. HTK's MFCCs
use a particular scaling of the DCT-II that is almost, but not exactly,
orthogonal normalization. We follow this convention.
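The scaling described above can be sketched in plain Python for a single frame. This is an illustration, not the TensorFlow implementation: the function names `htk_dct2` and `orthonormal_dct2` are hypothetical, and the sketch assumes the HTK-style convention is a DCT-II scaled by sqrt(2 / num_mel_bins), which differs from the orthonormal DCT-II only in the zeroth coefficient.

```python
import math


def htk_dct2(log_mel_frame):
    """DCT-II of one frame, scaled by sqrt(2 / N) (assumed HTK-style scaling)."""
    n_bins = len(log_mel_frame)
    if n_bins <= 0:
        raise ValueError("num_mel_bins must be positive")
    scale = math.sqrt(2.0 / n_bins)
    return [
        scale * sum(
            x * math.cos(math.pi * k * (2 * n + 1) / (2 * n_bins))
            for n, x in enumerate(log_mel_frame)
        )
        for k in range(n_bins)
    ]


def orthonormal_dct2(log_mel_frame):
    """Orthonormal DCT-II: same as above, except coefficient 0 is divided by sqrt(2)."""
    coeffs = htk_dct2(log_mel_frame)
    coeffs[0] /= math.sqrt(2.0)
    return coeffs


frame = [1.0, 1.0, 1.0, 1.0]
htk = htk_dct2(frame)
ortho = orthonormal_dct2(frame)
# Only the zeroth coefficient differs between the two scalings.
print(htk[0], ortho[0])
```

The orthonormal variant preserves the energy of the frame, while the HTK-style variant overweights the zeroth coefficient by a factor of sqrt(2); that single difference is why the scaling is only "almost" orthogonal.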
All num_mel_bins MFCCs are returned and it is up to the caller to select
a subset of the MFCCs based on their application. For example, it is typical
to only use the first few for speech recognition, as this results in
an approximately pitch-invariant representation of the signal.
For example:
import tensorflow as tf

batch_size, num_samples, sample_rate = 32, 32000, 16000.0
# A Tensor of [batch_size, num_samples] mono PCM samples. Real audio would lie
# in the range [-1, 1]; random noise stands in for it here.
pcm = tf.random.normal([batch_size, num_samples], dtype=tf.float32)
# A 1024-point STFT with frames of 64 ms and 75% overlap.
stfts = tf.signal.stft(pcm, frame_length=1024, frame_step=256,
                       fft_length=1024)
spectrograms = tf.abs(stfts)
# Warp the linear scale spectrograms into the mel-scale.
# In TensorFlow 2, static shape dimensions are plain ints (no `.value`).
num_spectrogram_bins = stfts.shape[-1]
lower_edge_hertz, upper_edge_hertz, num_mel_bins = 80.0, 7600.0, 80
linear_to_mel_weight_matrix = tf.signal.linear_to_mel_weight_matrix(
    num_mel_bins, num_spectrogram_bins, sample_rate, lower_edge_hertz,
    upper_edge_hertz)
mel_spectrograms = tf.tensordot(
    spectrograms, linear_to_mel_weight_matrix, 1)
mel_spectrograms.set_shape(spectrograms.shape[:-1].concatenate(
    linear_to_mel_weight_matrix.shape[-1:]))
# Compute a stabilized log to get log-magnitude mel-scale spectrograms.
log_mel_spectrograms = tf.math.log(mel_spectrograms + 1e-6)
# Compute MFCCs from log_mel_spectrograms and take the first 13.
mfccs = tf.signal.mfccs_from_log_mel_spectrograms(
    log_mel_spectrograms)[..., :13]
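To make the tensor shapes in the example easier to follow, here is a small plain-Python sketch of the shape arithmetic. It does not call TensorFlow; the helper name `mfcc_example_shapes` is hypothetical, and it assumes the STFT is computed without end padding (the default `pad_end=False`) and that a real-input FFT of length `fft_length` yields `fft_length // 2 + 1` bins.

```python
def mfcc_example_shapes(batch_size, num_samples, frame_length, frame_step,
                        fft_length, num_mel_bins, num_mfccs):
    """Return the expected shape at each stage of the MFCC pipeline."""
    # Only frames that fit entirely inside the signal are kept (no end padding).
    num_frames = max(0, 1 + (num_samples - frame_length) // frame_step)
    # A real-input FFT keeps only the non-redundant half of the spectrum.
    num_spectrogram_bins = fft_length // 2 + 1
    return {
        "pcm": (batch_size, num_samples),
        "stfts": (batch_size, num_frames, num_spectrogram_bins),
        "mel_spectrograms": (batch_size, num_frames, num_mel_bins),
        "mfccs": (batch_size, num_frames, num_mfccs),
    }


shapes = mfcc_example_shapes(
    batch_size=32, num_samples=32000, frame_length=1024, frame_step=256,
    fft_length=1024, num_mel_bins=80, num_mfccs=13)
for name, shape in shapes.items():
    print(name, shape)
```

Under these assumptions the example produces 122 frames of 513 spectrogram bins each, which the mel weight matrix reduces to 80 bins before the final slice keeps 13 coefficients per frame.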
Args
log_mel_spectrograms: A [..., num_mel_bins] float32/float64 Tensor of log-magnitude mel-scale spectrograms.
name: An optional name for the operation.
Returns
A [..., num_mel_bins] float32/float64 Tensor of the MFCCs of log_mel_spectrograms.
Raises
ValueError: If num_mel_bins is not positive.
Except as otherwise noted, the content of this page is licensed under the Creative Commons Attribution 4.0 License, and code samples are licensed under the Apache 2.0 License. For details, see the Google Developers Site Policies. Java is a registered trademark of Oracle and/or its affiliates. Some content is licensed under the numpy license.
Last updated 2021-05-14 UTC.