Transforms a spectrogram into a form that's useful for speech recognition.
tf.raw_ops.Mfcc(
spectrogram, sample_rate, upper_frequency_limit=4000, lower_frequency_limit=20,
filterbank_channel_count=40, dct_coefficient_count=13, name=None
)
Mel Frequency Cepstral Coefficients (MFCCs) are a way of representing audio data that has proven effective as an input feature for machine learning. They are created by taking the spectrum of a spectrogram (a 'cepstrum') and discarding some of the higher frequencies that are less significant to the human ear. They have a long history in the speech recognition world, and https://en.wikipedia.org/wiki/Mel-frequency_cepstrum is a good resource to learn more.
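To make the description above concrete, here is a minimal pure-Python sketch of the usual MFCC recipe for a single spectrogram frame: pool the power spectrum into mel-spaced triangular bands, take the log, then apply a DCT and keep only the first few coefficients. The helper names (`hz_to_mel`, `mel_filterbank_energies`, `dct_ii`, `mfcc_frame`), the particular mel formula, the unscaled DCT, and the `1e-12` log floor are illustrative assumptions, not a description of how this op is implemented internally, so the values will not necessarily match the op's output.

```python
import math


def hz_to_mel(freq_hz):
    # One common mel-scale formula (an assumption; other variants exist).
    return 1127.0 * math.log1p(freq_hz / 700.0)


def mel_filterbank_energies(power_frame, sample_rate, low_hz, high_hz, channels):
    """Pool one power-spectrum frame into triangular, mel-spaced bands."""
    num_bins = len(power_frame)
    hz_per_bin = 0.5 * sample_rate / (num_bins - 1)
    low_mel, high_mel = hz_to_mel(low_hz), hz_to_mel(high_hz)
    spacing = (high_mel - low_mel) / (channels + 1)
    # centers[0] and centers[-1] are the outer edges; the rest are band peaks.
    centers = [low_mel + spacing * i for i in range(channels + 2)]
    energies = [0.0] * channels
    for k, power in enumerate(power_frame):
        mel = hz_to_mel(k * hz_per_bin)
        for c in range(channels):
            left, peak, right = centers[c], centers[c + 1], centers[c + 2]
            if left < mel <= peak:
                energies[c] += power * (mel - left) / (peak - left)
            elif peak < mel < right:
                energies[c] += power * (right - mel) / (right - peak)
    return energies


def dct_ii(values, num_coefficients):
    """Unscaled DCT-II, truncated to the first num_coefficients outputs."""
    n = len(values)
    return [
        sum(v * math.cos(math.pi * i * (j + 0.5) / n) for j, v in enumerate(values))
        for i in range(num_coefficients)
    ]


def mfcc_frame(power_frame, sample_rate, low_hz=20.0, high_hz=4000.0,
               channels=40, num_coefficients=13):
    energies = mel_filterbank_energies(
        power_frame, sample_rate, low_hz, high_hz, channels)
    # Floor the energies so that empty bands do not produce log(0).
    log_energies = [math.log(max(e, 1e-12)) for e in energies]
    return dct_ii(log_energies, num_coefficients)
```

The default arguments of `mfcc_frame` mirror the defaults in the signature above: `channels` plays the role of `filterbank_channel_count`, and `num_coefficients` the role of `dct_coefficient_count`.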
| Args | |
|---|---|
| spectrogram | A Tensor of type float32. Typically produced by the Spectrogram op, with magnitude_squared set to true. |
| sample_rate | A Tensor of type int32. The number of samples per second in the source audio. |
| upper_frequency_limit | An optional float. Defaults to 4000. The highest frequency to use when calculating the cepstrum. |
| lower_frequency_limit | An optional float. Defaults to 20. The lowest frequency to use when calculating the cepstrum. |
| filterbank_channel_count | An optional int. Defaults to 40. Resolution of the Mel bank used internally. |
| dct_coefficient_count | An optional int. Defaults to 13. How many output channels to produce per time slice. |
| name | A name for the operation (optional). |
| Returns | |
|---|---|
| A Tensor of type float32. |
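
A minimal usage sketch follows, assuming the "Spectrogram op" mentioned above is `tf.raw_ops.AudioSpectrogram` and that the audio is supplied as a float32 tensor shaped `[samples, channels]`. The 16 kHz sample rate, the synthetic 440 Hz tone, and the `window_size` and `stride` values are arbitrary choices for illustration.

```python
import math

import tensorflow as tf

sample_rate = 16000  # assumed sample rate for this illustration

# One second of a 440 Hz tone, shaped [samples, channels].
t = tf.range(sample_rate, dtype=tf.float32) / sample_rate
audio = tf.reshape(tf.sin(2.0 * math.pi * 440.0 * t), [-1, 1])

# Power spectrogram; magnitude_squared=True matches the recommendation above.
spectrogram = tf.raw_ops.AudioSpectrogram(
    input=audio, window_size=1024, stride=512, magnitude_squared=True)

# Raw ops take keyword arguments only; sample_rate must be an int32 tensor.
mfcc = tf.raw_ops.Mfcc(
    spectrogram=spectrogram,
    sample_rate=tf.constant(sample_rate, dtype=tf.int32),
    upper_frequency_limit=4000.0,
    lower_frequency_limit=20.0,
    filterbank_channel_count=40,
    dct_coefficient_count=13)

print(spectrogram.shape, mfcc.shape)
```

The last dimension of `mfcc` equals `dct_coefficient_count`, while the leading dimensions follow those of the input spectrogram.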