
tflite_model_maker.audio_classifier.DataLoader

DataLoader for audio tasks.

Args
dataset A tf.data.Dataset object that contains a potentially large set of elements, where each element is an (input_data, target) pair. input_data is the raw input data, such as an image or a piece of text, and target is the ground truth for that input, such as the image's classification label.
size The size of the dataset. tf.data.Dataset doesn't provide a way to get its length directly, because it is lazily loaded and may be infinite.
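To see why size is passed in separately, here is a plain-Python analogy (not TensorFlow code): a lazily evaluated stream such as a generator cannot report its length without being consumed. The hypothetical SizedStream class below pairs a re-creatable lazy stream with a size supplied up front, similar in spirit to keeping dataset and size side by side.

```python
from typing import Callable, Iterable, Iterator, Tuple

Example = Tuple[str, int]  # (input_data, target)


class SizedStream:
    """Hypothetical wrapper pairing a lazy stream with a size known in advance."""

    def __init__(self, make_stream: Callable[[], Iterable[Example]], size: int):
        self._make_stream = make_stream  # factory, so the stream can be re-created
        self.size = size                 # supplied by the caller, never computed

    def __len__(self) -> int:
        return self.size

    def __iter__(self) -> Iterator[Example]:
        return iter(self._make_stream())


def lazy_examples() -> Iterator[Example]:
    # Elements are produced only when requested.
    for i in range(3):
        yield (f"clip_{i}.wav", i % 2)


try:
    len(lazy_examples())  # generators do not support len()
except TypeError as err:
    print("TypeError:", err)

stream = SizedStream(lazy_examples, size=3)
print(len(stream))      # 3
print(list(stream)[0])  # ('clip_0.wav', 0)
```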

Attributes
num_classes The number of classes in the dataset.

Methods

from_esc50

Load ESC50 style audio samples.

The ESC50 file structure is explained in https://github.com/karolpiczak/ESC-50. Audio files should be put in ${data_path}/audio, and the metadata file should be put in ${data_path}/meta/esc50.csv.

Note that instead of relying on the target field in the CSV, a new index_to_label mapping is created based on the alphabetical order of the available categories.
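The label-mapping rule above can be sketched in plain Python. The hypothetical build_index_to_label helper below reads ESC50-style metadata (the column names follow the esc50.csv header; the rows are illustrative), ignores the numeric target column, and assigns indices by sorting the category names alphabetically. It illustrates the documented behavior and is not Model Maker's code.

```python
import csv
import io
from typing import Dict, List

# Illustrative rows in the ESC50 metadata layout. The `target` values do not
# follow the alphabetical order of `category`.
SAMPLE_CSV = """filename,fold,target,category,esc10,src_file,take
1-100032-A-0.wav,1,0,dog,True,100032,A
1-100038-A-14.wav,1,14,chirping_birds,False,100038,A
2-101296-A-19.wav,2,19,thunderstorm,False,101296,A
"""


def build_index_to_label(csv_text: str) -> List[str]:
    """Hypothetical helper: labels are the category names, sorted alphabetically."""
    rows = csv.DictReader(io.StringIO(csv_text))
    categories = {row["category"] for row in rows}  # the `target` column is ignored
    return sorted(categories)


index_to_label = build_index_to_label(SAMPLE_CSV)
label_to_index: Dict[str, int] = {name: i for i, name in enumerate(index_to_label)}
print(index_to_label)         # ['chirping_birds', 'dog', 'thunderstorm']
print(label_to_index["dog"])  # 1, although the CSV target for dog is 0
```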

Args
spec An instance of audio_spec.YAMNet
data_path A string, location of the ESC50 dataset. It should contain at
folds A list of integers specifying the selected folds. If empty, all folds will be selected.
categories A string list of selected categories. If empty, all categories will be selected.
shuffle boolean. If True, randomly shuffle the data.
cache str or boolean. When set to True, intermediate results are cached in RAM. When set to a file path string, intermediate results are cached in that file. Note that once a file-based cache is created, changes to the input data have no effect until the cache file is removed or the filename is changed. More details can be found at https://www.tensorflow.org/api_docs/python/tf/data/Dataset#cache
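The folds and categories filters, including the "empty means all" rule, can be sketched as a row filter over ESC50-style metadata. The select_rows helper and the sample rows are hypothetical; the sketch only mirrors the argument descriptions above.

```python
import csv
import io
from typing import List, Sequence

# Illustrative rows in the ESC50 metadata layout.
SAMPLE_CSV = """filename,fold,target,category,esc10,src_file,take
1-100032-A-0.wav,1,0,dog,True,100032,A
1-100038-A-14.wav,1,14,chirping_birds,False,100038,A
2-101296-A-19.wav,2,19,thunderstorm,False,101296,A
3-102857-A-0.wav,3,0,dog,True,102857,A
"""


def select_rows(
    csv_text: str, folds: Sequence[int] = (), categories: Sequence[str] = ()
) -> List[dict]:
    """Hypothetical helper: keep rows whose fold and category were selected.

    An empty `folds` or `categories` argument selects everything.
    """
    selected = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        if folds and int(row["fold"]) not in folds:
            continue
        if categories and row["category"] not in categories:
            continue
        selected.append(row)
    return selected


print(len(select_rows(SAMPLE_CSV)))  # 4, nothing filtered
print([r["filename"] for r in select_rows(SAMPLE_CSV, folds=[1])])
print([r["filename"] for r in select_rows(SAMPLE_CSV, folds=[1, 3], categories=["dog"])])
```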

Returns
An instance of AudioDataLoader containing audio samples and labels.

from_folder

Load audio files from a data_path.

  • The root data_path folder contains a number of folders. The name of each folder is the name of an audio class.

  • Within each folder, there are a number of .wav files. Each .wav file corresponds to one example. Each .wav file is mono (single-channel) and has the typical 16-bit pulse-code modulation (PCM) encoding.

  • .wav files are resampled to spec.target_sample_rate and then fed into spec.preprocess_ds for splitting and other operations. Long .wav files are normally framed into multiple clips, and .wav files shorter than a certain threshold are ignored.
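The folder layout above can be checked with the standard-library wave module. The sketch below is hypothetical: scan_audio_folder assigns label indices by sorting folder names (an assumption, since the ordering used by from_folder is not stated here), verifies that each file is mono 16-bit PCM, and does not perform the resampling or framing done by the spec.

```python
import os
import tempfile
import wave
from typing import List, Tuple


def write_silent_wav(path: str, num_samples: int, sample_rate: int = 16000) -> None:
    """Write a mono, 16-bit PCM .wav file filled with silence (test fixture)."""
    with wave.open(path, "wb") as f:
        f.setnchannels(1)  # mono
        f.setsampwidth(2)  # 2 bytes per sample = 16-bit PCM
        f.setframerate(sample_rate)
        f.writeframes(b"\x00\x00" * num_samples)


def scan_audio_folder(data_path: str) -> Tuple[List[str], List[Tuple[str, int]]]:
    """Hypothetical helper: one subfolder per class, one .wav file per example."""
    # Sorting folder names gives a deterministic label order (an assumption here).
    labels = sorted(
        d for d in os.listdir(data_path) if os.path.isdir(os.path.join(data_path, d))
    )
    examples = []
    for index, label in enumerate(labels):
        class_dir = os.path.join(data_path, label)
        for name in sorted(os.listdir(class_dir)):
            if not name.lower().endswith(".wav"):
                continue
            path = os.path.join(class_dir, name)
            with wave.open(path, "rb") as f:
                # The layout expects mono, 16-bit PCM files.
                if f.getnchannels() != 1 or f.getsampwidth() != 2:
                    raise ValueError(f"{path} is not mono 16-bit PCM")
            examples.append((path, index))
    return labels, examples


with tempfile.TemporaryDirectory() as root:
    for label in ("dog", "cat"):
        os.makedirs(os.path.join(root, label))
        write_silent_wav(os.path.join(root, label, "a.wav"), num_samples=16000)
    labels, examples = scan_audio_folder(root)
    print(labels)  # ['cat', 'dog']
    print([(os.path.basename(os.path.dirname(p)), i) for p, i in examples])
    # [('cat', 0), ('dog', 1)]
```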

Args
spec An instance of audio_spec.BaseSpec.
data_path A string, location of the audio files.
categories A string list of selected categories. If empty, all categories will be selected.
shuffle boolean. If True, randomly shuffle the data.
cache str or boolean. When set to True, intermediate results are cached in RAM. When set to a file path string, intermediate results are cached in that file. Note that once a file-based cache is created, changes to the input data have no effect until the cache file is removed or the filename is changed. More details can be found at https://www.tensorflow.org/api_docs/python/tf/data/Dataset#cache
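The caveat about file-based caches is easy to trip over. The sketch below is a plain-Python analogy, not tf.data's implementation (tf.data writes its own cache format): once the hypothetical cached_compute helper has written its cache file, later calls return the cached results even after the input changes, until the file is removed.

```python
import json
import os
import tempfile
from typing import Callable, List


def cached_compute(compute: Callable[[], List[int]], cache_path: str) -> List[int]:
    """Plain-Python analogy of a file-based cache: reuse the file whenever it exists."""
    if os.path.exists(cache_path):
        with open(cache_path) as f:
            return json.load(f)  # the input is never consulted, so results may be stale
    result = compute()
    with open(cache_path, "w") as f:
        json.dump(result, f)
    return result


data = [1, 2, 3]


def compute_features() -> List[int]:
    # Stand-in for expensive preprocessing of the current input data.
    return [x * 10 for x in data]


with tempfile.TemporaryDirectory() as tmp:
    cache_file = os.path.join(tmp, "features.json")
    first = cached_compute(compute_features, cache_file)   # computed and written
    data.append(4)                                         # the input changes
    second = cached_compute(compute_features, cache_file)  # still the old results
    os.remove(cache_file)                                  # invalidate the cache
    third = cached_compute(compute_features, cache_file)   # recomputed

print(first, second, third)  # [10, 20, 30] [10, 20, 30] [10, 20, 30, 40]
```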

Returns
AudioDataLoader containing audio spectrogram (or any data type generated by spec.preprocess_ds) and labels.

gen_dataset

Generate a sharded and batched tf.data.Dataset for training/evaluation.

Args
batch_size An integer; the returned dataset will be batched by this size.
is_training A boolean. When True, the returned dataset will be optionally shuffled, and data augmentation, if any, will also be applied to it.
shuffle A boolean. When True, the returned dataset will be shuffled to create randomness during model training. Only applies when is_training is set to True.
input_pipeline_context An InputContext instance, used to shard the dataset among multiple workers when a distribution strategy is used.
preprocess Not in use.
drop_remainder A boolean, whether to drop the final batch when it has fewer than batch_size elements.

Returns
A TF dataset ready to be consumed by a Keras model.
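Sharding and batching can be illustrated without TensorFlow. The sketch below mimics the documented semantics of tf.data.Dataset.shard (worker index keeps every element whose position modulo num_shards equals index) and of Dataset.batch with drop_remainder. The helper names are hypothetical, and gen_dataset may order its shuffling, sharding and batching steps differently.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def shard(elements: Sequence[T], num_shards: int, index: int) -> List[T]:
    """Keep every element whose position i satisfies i % num_shards == index."""
    return [x for i, x in enumerate(elements) if i % num_shards == index]


def batch(elements: Sequence[T], batch_size: int, drop_remainder: bool) -> List[List[T]]:
    """Group consecutive elements; optionally drop a final, smaller batch."""
    batches = [list(elements[i:i + batch_size]) for i in range(0, len(elements), batch_size)]
    if drop_remainder and batches and len(batches[-1]) < batch_size:
        batches.pop()
    return batches


examples = list(range(10))
worker0 = shard(examples, num_shards=2, index=0)  # [0, 2, 4, 6, 8]
worker1 = shard(examples, num_shards=2, index=1)  # [1, 3, 5, 7, 9]
print(batch(worker0, batch_size=2, drop_remainder=False))  # [[0, 2], [4, 6], [8]]
print(batch(worker0, batch_size=2, drop_remainder=True))   # [[0, 2], [4, 6]]
```

Dropping the remainder keeps every batch the same shape, which can matter when a model or distribution strategy expects a fixed batch dimension.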

split

Splits dataset into two sub-datasets with the given fraction.

Primarily used for splitting the dataset into training and testing sets.

Args
fraction A float, the fraction of the original data that goes into the first returned sub-dataset.

Returns
The two split sub-datasets.
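A minimal sketch of a fraction-based split, assuming the first sub-dataset takes the first int(size * fraction) elements and the second takes the rest (in the spirit of tf.data take/skip). The split helper here is hypothetical; whether the real method rounds, shuffles, or balances classes is not stated above.

```python
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def split(elements: Sequence[T], fraction: float) -> Tuple[List[T], List[T]]:
    """Hypothetical split: the first int(len * fraction) elements, then the remainder."""
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be between 0 and 1")
    cut = int(len(elements) * fraction)  # truncates toward zero
    return list(elements[:cut]), list(elements[cut:])


train, test = split(list(range(10)), 0.8)
print(len(train), len(test))  # 8 2
```

With the DataLoader itself, the analogous call takes only the fraction and returns two loaders, for example train_data, test_data = data.split(0.8).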

__len__

Returns the number of audio files in the DataLoader.

Note that one audio file could be framed (mostly via a sliding window of fixed size) into zero or more audio clips during training and evaluation.
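The zero-or-more framing can be made concrete with a sliding-window count. Assuming a window of window samples advanced by hop samples, with no padding of a partial final window (both assumptions; the actual lengths come from the spec), a file of n samples yields 0 clips when n < window and 1 + (n - window) // hop clips otherwise. The num_frames and frame helpers are hypothetical.

```python
from typing import List, Sequence


def num_frames(n: int, window: int, hop: int) -> int:
    """Number of full windows of length `window` taken every `hop` samples."""
    if n < window:
        return 0
    return 1 + (n - window) // hop


def frame(samples: Sequence[float], window: int, hop: int) -> List[List[float]]:
    """Split samples into overlapping full-length windows; drop the partial tail."""
    return [
        list(samples[start:start + window])
        for start in range(0, len(samples) - window + 1, hop)
    ]


print(num_frames(3, window=4, hop=2))   # 0, a too-short file contributes no clips
print(num_frames(10, window=4, hop=2))  # 4, windows start at 0, 2, 4 and 6
print(frame(list(range(10)), window=4, hop=2)[-1])  # [6, 7, 8, 9]
```

This is why len() on the DataLoader counts files, while the number of clips actually seen during training can be larger or smaller.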