AudioSpectrogram

public final class AudioSpectrogram

Produces a visualization of audio data over time.

Spectrograms are a standard way of representing audio information as a series of slices of frequency information, one slice for each window of time. By joining these together into a sequence, they form a distinctive fingerprint of the sound over time.

This op expects audio data as input, stored as floats in the range -1 to 1, together with a window width in samples and a stride specifying how far to move the window between slices. From this it generates a three-dimensional output. The first dimension is the channels in the input, so a stereo audio input would have a size of two here, for example. The second dimension is time, with successive frequency slices. The third dimension has an amplitude value for each frequency during that time slice.
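As a rough illustration of the three output dimensions described above, the following sketch estimates the output shape from the input size, window width and stride. The class and method names are hypothetical, and the two formulas (one slice per full window that fits, and one frequency bin per non-negative frequency of an FFT padded to the next power of two) are assumptions about the op's behavior that this page does not state.

```java
// Sketch only: the sizing formulas below are assumptions, not taken from this page.
final class SpectrogramShape {
    // Smallest power of two that is >= n, for n >= 1.
    static long nextPowerOfTwo(long n) {
        long p = 1;
        while (p < n) {
            p <<= 1;
        }
        return p;
    }

    // Returns {channels, timeSlices, frequencyBins}.
    static long[] outputShape(long channels, long samples, long windowSize, long stride) {
        // Assumed: only windows that fit entirely inside the input produce a slice.
        long slices = samples < windowSize ? 0 : 1 + (samples - windowSize) / stride;
        // Assumed: the FFT length is the window size rounded up to a power of two.
        long bins = 1 + nextPowerOfTwo(windowSize) / 2;
        return new long[] {channels, slices, bins};
    }

    public static void main(String[] args) {
        // Stereo input, 16000 samples per channel, window of 1024, stride of 512.
        long[] shape = outputShape(2, 16000, 1024, 512);
        System.out.println(java.util.Arrays.toString(shape)); // prints [2, 30, 513]
    }
}
```

Under these assumptions, an input shorter than one window yields zero time slices.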

This means the layout when converted and saved as an image is rotated 90 degrees clockwise from a typical spectrogram. Time is descending down the Y axis, and the frequency decreases from left to right.

Each value in the result represents the square root of the sum of the squares of the real and imaginary parts of an FFT on the current window of samples. In this way, the lowest dimension represents the magnitude of each frequency in the current window (or its power, when the squared magnitude is requested), and adjacent windows are concatenated in the next dimension.
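The per-window computation can be sketched in plain Java for a single channel. This is an illustration only, not the TensorFlow implementation: it uses a naive DFT instead of an FFT, applies no window function (a real implementation may apply one), and the class and method names are hypothetical.

```java
// Illustrative single-channel spectrogram using a naive DFT (not TensorFlow's code).
final class NaiveSpectrogram {
    // Magnitudes of DFT bins 0..window/2 for the window starting at `start`.
    static double[] windowMagnitudes(double[] samples, int start, int window) {
        int bins = window / 2 + 1;
        double[] out = new double[bins];
        for (int k = 0; k < bins; k++) {
            double re = 0.0;
            double im = 0.0;
            for (int n = 0; n < window; n++) {
                double angle = 2.0 * Math.PI * k * n / window;
                re += samples[start + n] * Math.cos(angle);
                im -= samples[start + n] * Math.sin(angle);
            }
            // Magnitude: square root of the sum of the squared real and imaginary parts.
            out[k] = Math.sqrt(re * re + im * im);
        }
        return out;
    }

    // One row per time slice; each row holds one magnitude per frequency bin.
    static double[][] spectrogram(double[] samples, int window, int stride) {
        int slices = samples.length < window ? 0 : 1 + (samples.length - window) / stride;
        double[][] out = new double[slices][];
        for (int s = 0; s < slices; s++) {
            out[s] = windowMagnitudes(samples, s * stride, window);
        }
        return out;
    }
}
```

For a pure cosine that completes exactly four cycles per 16-sample window, every row of the result peaks at bin 4 and is close to zero in the other bins.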

To get a more intuitive and visual look at what this operation does, you can run tensorflow/examples/wav_to_spectrogram to read in an audio file and save out the resulting spectrogram as a PNG image.

Nested Classes

class AudioSpectrogram.Options Optional attributes for AudioSpectrogram

Constants

String OP_NAME The name of this op, as known by the TensorFlow core engine

Public Methods

Output < TFloat32 >	asOutput () Returns the symbolic handle of the tensor.
static AudioSpectrogram	create ( Scope scope, Operand < TFloat32 > input, Long windowSize, Long stride, Options... options) Factory method to create a class wrapping a new AudioSpectrogram operation.
static AudioSpectrogram.Options	magnitudeSquared (Boolean magnitudeSquared)
Output < TFloat32 >	spectrogram () 3D representation of the audio frequencies as an image.

Inherited Methods

From class org.tensorflow.op.RawOp

final boolean	equals (Object obj)
final int	hashCode ()
Operation	op () Return this unit of computation as a single `Operation` .
final String	toString ()

From class java.lang.Object

boolean	equals (Object arg0)
final Class<?>	getClass ()
int	hashCode ()
final void	notify ()
final void	notifyAll ()
String	toString ()
final void	wait (long arg0, int arg1)
final void	wait (long arg0)
final void	wait ()

From interface org.tensorflow.op.Op

abstract ExecutionEnvironment	env () Return the execution environment this op was created in.
abstract Operation	op () Return this unit of computation as a single `Operation` .

From interface org.tensorflow.Operand

abstract Output < TFloat32 >	asOutput () Returns the symbolic handle of the tensor.
abstract TFloat32	asTensor () Returns the tensor at this operand.
abstract Shape	shape () Returns the (possibly partially known) shape of the tensor referred to by the `Output` of this operand.
abstract Class< TFloat32 >	type () Returns the tensor type of this operand

From interface org.tensorflow.ndarray.Shaped

abstract int	rank ()
abstract Shape	shape ()
abstract long	size () Computes and returns the total size of this container, in number of values.

Constants

public static final String OP_NAME

The name of this op, as known by the TensorFlow core engine

Constant Value: "AudioSpectrogram"

Public Methods

public Output < TFloat32 > asOutput ()

Returns the symbolic handle of the tensor.

Inputs to TensorFlow operations are outputs of another TensorFlow operation. This method is used to obtain a symbolic handle that represents the computation of the input.

public static AudioSpectrogram create ( Scope scope, Operand < TFloat32 > input, Long windowSize, Long stride, Options... options)

Factory method to create a class wrapping a new AudioSpectrogram operation.

Parameters

scope	current scope
input	Float representation of audio data.
windowSize	How wide the input window is in samples. For the highest efficiency this should be a power of two, but other values are accepted.
stride	How far apart the centers of adjacent sample windows should be, in samples.
options	carries optional attribute values

Returns

a new instance of AudioSpectrogram

public static AudioSpectrogram.Options magnitudeSquared (Boolean magnitudeSquared)

Parameters

magnitudeSquared	Whether to return the squared magnitude or just the magnitude. Using squared magnitude can avoid extra calculations.
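A minimal sketch of why the squared option can be cheaper: the squared magnitude is already available before the square root is taken, so requesting it skips one step per output value. The helper below is hypothetical and only mirrors the arithmetic; it is not part of the TensorFlow API.

```java
// Hypothetical helper showing the arithmetic behind the magnitudeSquared option.
final class BinValue {
    static double binValue(double re, double im, boolean magnitudeSquared) {
        double power = re * re + im * im; // squared magnitude
        // The square root is only needed when the plain magnitude is requested.
        return magnitudeSquared ? power : Math.sqrt(power);
    }
}
```

For a bin with real part 3 and imaginary part 4, this gives 5 for the magnitude and 25 for the squared magnitude.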

public Output < TFloat32 > spectrogram ()

3D representation of the audio frequencies as an image.