Attention: TensorFlow Lite is now part of Google AI Edge. The latest documentation is now at ai.google.dev/edge/lite.

tflite_model_maker.audio_classifier.DataLoader

DataLoader for audio tasks.

tflite_model_maker.audio_classifier.DataLoader(
    dataset, size, index_to_label, spec, cache=False
)

Used in the tutorials

Retrain a speech recognition model with TensorFlow Lite Model Maker
Transfer Learning for the Audio Domain with TensorFlow Lite Model Maker

Args
`dataset`	A tf.data.Dataset object that contains a potentially large set of elements, where each element is a pair of (input_data, target). `input_data` is the raw input data, such as an image or text, while `target` is the ground truth for that input, such as the image's classification label.
`size`	The size of the dataset. tf.data.Dataset doesn't provide a way to get its length directly, because it is lazily loaded and may be infinite.

Attributes
`num_classes`
`size`	Returns the size of the dataset. This may return None, because the exact size isn't required to create an instance of this class, and tf.data.Dataset doesn't provide a way to get its length directly (it is lazily loaded and may be infinite). In most cases, however, when the instance is created by a helper function such as `from_folder`, the size is computed during preprocessing and this returns an int.

Methods

`from_esc50`

@classmethod
from_esc50(
    spec, data_path, folds=None, categories=None, shuffle=True, cache=False
)

Load ESC50 style audio samples.

The ESC-50 file structure is explained at https://github.com/karolpiczak/ESC-50. Audio files should be placed in ${data_path}/audio, and the metadata file should be placed at ${data_path}/meta/esc50.csv.

Note that instead of relying on the `target` field in the CSV, a new index_to_label mapping is created based on the alphabetical order of the available categories.

Args
`spec`	An instance of audio_spec.YAMNet.
`data_path`	A string, the location of the ESC-50 dataset. It should contain the `audio` folder and the `meta/esc50.csv` metadata file described above.
`folds`	A list of integers selecting folds. If empty, all folds are selected.
`categories`	A list of strings selecting categories. If empty, all categories are selected.
`shuffle`	A boolean. If True, the data is randomly shuffled.
`cache`	A str or boolean. When True, intermediate results are cached in RAM. When set to a file path string, intermediate results are cached in that file. Note that once a file-based cache is created, changes to the input data have no effect until the cache file is removed or the filename is changed. See https://www.tensorflow.org/api_docs/python/tf/data/Dataset#cache for details.

Returns
An instance of AudioDataLoader containing audio samples and labels.
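The label-ordering rule described for `from_esc50` can be sketched in plain Python. This is a minimal sketch, not the library's implementation: `build_esc50_index` is a hypothetical helper, the column names (`filename`, `fold`, `category`) are assumed from the ESC-50 `meta/esc50.csv` layout, and "available categories" is assumed to mean the categories left after fold and category filtering.

```python
import csv
import io


def build_esc50_index(csv_text, folds=None, categories=None):
    """Hypothetical helper: filter ESC-50 metadata rows and build labels.

    The CSV `target` column is ignored; label indices come from the
    alphabetical order of the categories that remain after filtering.
    """
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    # Empty or None selectors mean "keep everything".
    if folds:
        rows = [r for r in rows if int(r["fold"]) in folds]
    if categories:
        rows = [r for r in rows if r["category"] in categories]
    # Alphabetical order of the surviving categories defines the indices.
    index_to_label = sorted({r["category"] for r in rows})
    label_to_index = {name: i for i, name in enumerate(index_to_label)}
    # Filenames are relative to ${data_path}/audio.
    samples = [(r["filename"], label_to_index[r["category"]]) for r in rows]
    return samples, index_to_label
```

For example, a row whose CSV `target` is 0 can end up with a different index here, because only the sorted category names matter.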

`from_folder`

@classmethod
from_folder(
    spec, data_path, categories=None, shuffle=True, cache=False
)

Load audio files from a data_path.

The root `data_path` folder contains a number of folders, each named after an audio class.
Each folder contains a number of .wav files, and each .wav file is one example. Each .wav file should be mono (single-channel) with the typical 16-bit pulse-code modulation (PCM) encoding.
The .wav files are resampled to `spec.target_sample_rate` and then passed to `spec.preprocess_ds` for splitting and other operations. Typically, long .wav files are framed into multiple clips, and .wav files shorter than a certain threshold are ignored.

Args
`spec`	An instance of `audio_spec.BaseSpec`.
`data_path`	A string, the location of the audio files.
`categories`	A list of strings selecting categories. If empty, all categories are selected.
`shuffle`	A boolean. If True, the data is randomly shuffled.
`cache`	A str or boolean. When True, intermediate results are cached in RAM. When set to a file path string, intermediate results are cached in that file. Note that once a file-based cache is created, changes to the input data have no effect until the cache file is removed or the filename is changed. See https://www.tensorflow.org/api_docs/python/tf/data/Dataset#cache for details.

Returns
`AudioDataLoader` containing audio spectrogram (or any data type generated by `spec.preprocess_ds`) and labels.
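The folder layout and file format that `from_folder` expects can be checked ahead of time with the Python standard library. This is a minimal sketch: `list_audio_examples` and `is_mono_16bit_pcm` are hypothetical helpers, sorting the class folders alphabetically is an assumption, and resampling and framing are left to `spec`.

```python
import os
import wave


def is_mono_16bit_pcm(path):
    """Return True if `path` is a single-channel WAV file with 16-bit samples."""
    # The `wave` module only reads uncompressed PCM, so a 2-byte sample
    # width here means 16-bit PCM.
    with wave.open(path, "rb") as w:
        return w.getnchannels() == 1 and w.getsampwidth() == 2


def list_audio_examples(data_path, categories=None):
    """Hypothetical helper: map <data_path>/<class_name>/*.wav to (path, label)."""
    # Each subfolder of data_path is one audio class; plain files are skipped.
    class_names = sorted(
        d for d in os.listdir(data_path)
        if os.path.isdir(os.path.join(data_path, d))
    )
    if categories:
        class_names = [c for c in class_names if c in categories]
    examples = []
    for name in class_names:
        folder = os.path.join(data_path, name)
        for fname in sorted(os.listdir(folder)):
            # Only .wav files count as examples.
            if fname.lower().endswith(".wav"):
                examples.append((os.path.join(folder, fname), name))
    return examples
```

Running `is_mono_16bit_pcm` over the listed files before training is a cheap way to catch stereo or non-16-bit recordings early.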

`gen_dataset`

gen_dataset(
    batch_size=1,
    is_training=False,
    shuffle=False,
    input_pipeline_context=None,
    preprocess=None,
    drop_remainder=False
)

Generate a sharded and batched tf.data.Dataset for training/evaluation.

Args
`batch_size`	An integer; the returned dataset is batched by this size.
`is_training`	A boolean. When True, the returned dataset may be shuffled, and data augmentation, if any, is also applied.
`shuffle`	A boolean. When True, the returned dataset is shuffled to add randomness during model training. Only applies when `is_training` is True.
`input_pipeline_context`	An InputContext instance, used to shard the dataset among multiple workers when a distribution strategy is used.
`preprocess`	Not in use.
`drop_remainder`	A boolean, whether to drop the final batch if it has fewer than `batch_size` elements.

Returns
A TF dataset ready to be consumed by a Keras model.
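How these `gen_dataset` arguments interact can be sketched with plain Python lists standing in for tf.data. This is a minimal sketch under stated assumptions, not the library code: `gen_batches` is hypothetical, `num_input_pipelines` and `input_pipeline_id` stand in for the fields of the same names on `tf.distribute.InputContext`, sharding is assumed to behave like `tf.data.Dataset.shard` (element i goes to worker i % num_input_pipelines), and data augmentation is not modeled.

```python
import random


def gen_batches(examples, batch_size=1, is_training=False, shuffle=False,
                num_input_pipelines=1, input_pipeline_id=0,
                drop_remainder=False, seed=None):
    """Hypothetical stand-in for gen_dataset over a plain Python list."""
    # Sharding: this worker keeps every num_input_pipelines-th element.
    shard = list(examples[input_pipeline_id::num_input_pipelines])
    # Shuffling only takes effect when both is_training and shuffle are True.
    if is_training and shuffle:
        random.Random(seed).shuffle(shard)
    batches = [shard[i:i + batch_size]
               for i in range(0, len(shard), batch_size)]
    # drop_remainder discards a trailing batch smaller than batch_size.
    if drop_remainder and batches and len(batches[-1]) < batch_size:
        batches.pop()
    return batches
```

With 10 examples and `batch_size=4`, the last batch holds 2 elements; passing `drop_remainder=True` removes it, which keeps every batch the same shape.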

`split`

split(
    fraction
)

Splits dataset into two sub-datasets with the given fraction.

Primarily used for splitting the data set into training and testing sets.

Args
`fraction`	A float, the fraction of the original data that goes into the first returned sub-dataset.

Returns
The two split sub-datasets.
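A minimal sketch of `split` semantics on a plain list, assuming the first sub-dataset takes the first `int(size * fraction)` elements and the second takes the rest (as a take/skip pair would). `split_examples` is a hypothetical helper, not part of Model Maker.

```python
def split_examples(examples, fraction):
    """Split a list in two; the first part holds int(len * fraction) items."""
    cut = int(len(examples) * fraction)
    return examples[:cut], examples[cut:]
```

Because `split` returns two sub-datasets, it can be chained to carve out a third set: split 0.8 for training, then split the remainder 0.5 into validation and test data.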

`__len__`

__len__()

Returns the number of audio files in the DataLoader.

Note that during training and evaluation, one audio file may be framed (typically with a fixed-size sliding window) into zero or more audio clips, so this count can differ from the number of examples the model actually sees.
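To see why the file count and the clip count can differ, here is a minimal sketch of fixed-size sliding-window framing without end padding, which follows the same counting rule as `tf.signal.frame` with `pad_end=False`. `num_clips` is a hypothetical helper, and the frame length and step are illustrative values; the real ones come from the `spec`.

```python
def num_clips(num_samples, frame_length, frame_step):
    """Count the full sliding-window clips that fit in a signal.

    Signals shorter than one frame yield zero clips, i.e. they are dropped.
    """
    if num_samples < frame_length:
        return 0
    return 1 + (num_samples - frame_length) // frame_step
```

For three files of 8000, 16000, and 48000 samples with `frame_length=16000` and `frame_step=8000`, `__len__` would report 3 files, while framing yields 0 + 1 + 5 = 6 clips.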
