DataLoader for audio tasks.
tflite_model_maker.audio_classifier.DataLoader(
dataset, size, index_to_label, spec, cache=False
)
Methods
from_esc50
@classmethod
from_esc50( spec, data_path, folds=None, categories=None, shuffle=True, cache=False )
Load ESC50 style audio samples.
ESC50 file structure is explained in https://github.com/karolpiczak/ESC-50
Audio files should be put in ${data_path}/audio
Metadata file should be put in ${data_path}/meta/esc50.csv
Note that instead of relying on the target field in the CSV, a new index_to_label mapping is created based on the alphabetical order of the available categories.
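The remapping described above can be sketched in plain Python. This is only an illustration of the idea, not the library's implementation: the helper `build_index_to_label` and the sample rows are hypothetical, and the CSV excerpt keeps only a subset of columns.

```python
import csv
import io

# Hypothetical excerpt in the style of meta/esc50.csv (illustrative rows only).
SAMPLE_CSV = """filename,fold,target,category
1-100032-A-0.wav,1,0,dog
1-100038-A-14.wav,1,14,chirping_birds
2-100648-A-43.wav,2,43,car_horn
3-100018-A-18.wav,3,18,toilet_flush
"""


def build_index_to_label(csv_text, categories=None):
    """Return category names sorted alphabetically; list position is the label index."""
    rows = csv.DictReader(io.StringIO(csv_text))
    available = {row["category"] for row in rows}
    if categories:  # None or an empty list keeps every category
        available &= set(categories)
    return sorted(available)


index_to_label = build_index_to_label(SAMPLE_CSV)
label_to_index = {name: i for i, name in enumerate(index_to_label)}

# The CSV "target" column is ignored: "dog" has target 0 in the excerpt,
# but its new index depends only on alphabetical position.
print(index_to_label)
print(label_to_index["dog"])
```

In this sketch, selecting a subset of categories shrinks the mapping, so the same category can receive a different index depending on which categories are selected.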
Args | |
---|---|
spec
|
An instance of audio_spec.YAMNet |
data_path
|
A string, location of the ESC50 dataset. It should contain at |
folds
|
An integer list of selected folds. If empty, all folds will be selected. |
categories
|
A string list of selected categories. If empty, all categories will be selected. |
shuffle
|
Boolean. If True, the data is randomly shuffled. |
cache
|
str or boolean. When set to True, intermediate results will be cached in RAM. When set to a file path string, intermediate results will be cached in that file. Note that once a file-based cache is created, changes to the input data will have no effect until the cache file is removed or the filename is changed. More details can be found at https://www.tensorflow.org/api_docs/python/tf/data/Dataset#cache |
Returns | |
---|---|
An instance of AudioDataLoader containing audio samples and labels. |
from_folder
@classmethod
from_folder( spec, data_path, categories=None, shuffle=True, cache=False )
Load audio files from a data_path.
The root data_path folder contains a number of folders. The name of each folder is the name of the audio class. Within each folder, there are a number of .wav files. Each .wav file corresponds to an example. Each .wav file is mono (single-channel) and has the typical 16 bit pulse-code modulation (PCM) encoding.
.wav files will be resampled to spec.target_sample_rate and then fed into spec.preprocess_ds for splitting and other operations. Normally, long .wav files will be framed into multiple clips, and .wav files shorter than a certain threshold will be ignored.
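To make the expected folder layout concrete, here is a minimal standard-library sketch that builds a tiny dataset of mono, 16 bit PCM .wav files and lists it as (file, label index) pairs. The helpers `write_silent_wav` and `list_examples`, the class names, and the alphabetical ordering of class folders are assumptions made for illustration; the sketch does not reproduce the resampling or framing done by the spec.

```python
import tempfile
import wave
from pathlib import Path


def write_silent_wav(path, seconds=1.0, sample_rate=16000):
    """Write a mono, 16 bit PCM .wav file containing silence."""
    with wave.open(str(path), "wb") as f:
        f.setnchannels(1)   # mono (single-channel)
        f.setsampwidth(2)   # 2 bytes per sample = 16 bit PCM
        f.setframerate(sample_rate)
        f.writeframes(b"\x00\x00" * int(seconds * sample_rate))


def list_examples(data_path, categories=None):
    """Return (wav_path, label_index) pairs and the sorted class names."""
    root = Path(data_path)
    names = sorted(p.name for p in root.iterdir() if p.is_dir())
    if categories:  # None or an empty list keeps every class folder
        names = [n for n in names if n in categories]
    examples = [
        (wav, idx)
        for idx, name in enumerate(names)
        for wav in sorted((root / name).glob("*.wav"))
    ]
    return examples, names


with tempfile.TemporaryDirectory() as tmp:
    # Layout: <data_path>/<class name>/<example>.wav
    for name, count in [("dog", 2), ("rain", 1)]:
        (Path(tmp) / name).mkdir()
        for i in range(count):
            write_silent_wav(Path(tmp) / name / f"clip{i}.wav")
    examples, names = list_examples(tmp)
    labels = [idx for _, idx in examples]

print(names, labels)
```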
Args | |
---|---|
spec
|
An instance of audio_spec.BaseSpec. |
data_path
|
A string, the location of the audio files. |
categories
|
A string list of selected categories. If empty, all categories will be selected. |
shuffle
|
Boolean. If True, the data is randomly shuffled. |
cache
|
str or boolean. When set to True, intermediate results will be cached in RAM. When set to a file path string, intermediate results will be cached in that file. Note that once a file-based cache is created, changes to the input data will have no effect until the cache file is removed or the filename is changed. More details can be found at https://www.tensorflow.org/api_docs/python/tf/data/Dataset#cache |
Returns | |
---|---|
An AudioDataLoader containing audio spectrograms (or any data type generated by spec.preprocess_ds) and labels. |
gen_dataset
gen_dataset(
batch_size=1,
is_training=False,
shuffle=False,
input_pipeline_context=None,
preprocess=None,
drop_remainder=False
)
Generate a shared and batched tf.data.Dataset for training/evaluation.
Args | |
---|---|
batch_size
|
An integer, the returned dataset will be batched by this size. |
is_training
|
A boolean. When True, the returned dataset is shuffled if shuffle is also True. Data augmentation, if any, is also applied to the returned dataset. |
shuffle
|
A boolean. When True, the returned dataset will be shuffled to create randomness during model training. Only applies when is_training is set to True. |
input_pipeline_context
|
An InputContext instance, used to share the dataset among multiple workers when a distribution strategy is used. |
preprocess
|
Not in use. |
drop_remainder
|
Boolean, whether to drop the final batch when it contains fewer than batch_size elements. |
Returns | |
---|---|
A TF dataset ready to be consumed by Keras model. |
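How batch_size, is_training, shuffle, and drop_remainder interact can be illustrated with a plain-Python sketch over a list. This is not how the library builds its tf.data.Dataset; the `batches` helper and its `seed` parameter are hypothetical and only mirror the documented behavior of the arguments.

```python
import random


def batches(items, batch_size=1, is_training=False, shuffle=False,
            drop_remainder=False, seed=0):
    """Group items into lists of batch_size, mimicking the documented flags."""
    items = list(items)
    # Shuffling only takes effect when is_training is also True.
    if is_training and shuffle:
        random.Random(seed).shuffle(items)
    out = [items[i:i + batch_size] for i in range(0, len(items), batch_size)]
    # Optionally discard a final batch that is smaller than batch_size.
    if drop_remainder and out and len(out[-1]) < batch_size:
        out.pop()
    return out


eval_batches = batches(range(10), batch_size=4)
full_batches = batches(range(10), batch_size=4, drop_remainder=True)
unshuffled = batches(range(10), batch_size=4, shuffle=True)  # is_training=False
train_batches = batches(range(10), batch_size=4, is_training=True, shuffle=True)

print(eval_batches)
print(full_batches)
```

With 10 items and a batch size of 4, the last batch holds only 2 items unless drop_remainder is set, in which case it is discarded.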
split
split(
fraction
)
Splits dataset into two sub-datasets with the given fraction.
Primarily used for splitting the data set into training and testing sets.
Args | |
---|---|
fraction
|
A float, the fraction of the original data that goes into the first returned sub-dataset. |
Returns | |
---|---|
The two resulting sub-datasets. |
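The effect of fraction can be sketched on an ordinary list. The function `split_by_fraction` is a hypothetical stand-in, and the truncating rounding it uses is an assumption; the library may round differently at the boundary.

```python
def split_by_fraction(items, fraction):
    """Return (first, second), where first holds about `fraction` of the items."""
    cut = int(len(items) * fraction)  # assumption: truncate toward zero
    return items[:cut], items[cut:]


files = [f"clip{i}.wav" for i in range(10)]
train_files, test_files = split_by_fraction(files, 0.8)

print(len(train_files), len(test_files))
```

A fraction of 0.8 is a common choice for a training and testing split: the first sub-dataset is used for training and the second for evaluation.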
__len__
__len__()
Returns the number of audio files in the DataLoader.
Note that one audio file could be framed (mostly via a sliding window of fixed size) into zero or more audio clips during training and evaluation.
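The difference between the number of files and the number of clips can be seen with a small sliding-window calculation. The window and hop sizes below are hypothetical values chosen for readability, not the settings of any particular spec.

```python
def count_clips(num_samples, window, hop):
    """Number of full windows that fit when sliding by `hop` samples."""
    if num_samples < window:
        return 0  # file too short: it contributes no clips
    return 1 + (num_samples - window) // hop


WINDOW = 16000  # hypothetical: 1 second at 16 kHz
HOP = 8000      # hypothetical: 50% overlap between consecutive clips

file_lengths = [8000, 16000, 48000]  # samples per file
clips_per_file = [count_clips(n, WINDOW, HOP) for n in file_lengths]

print(len(file_lengths))    # number of files, analogous to what __len__ reports
print(clips_per_file)       # clips each file would yield under these settings
print(sum(clips_per_file))  # total clips seen during training or evaluation
```

Under these assumed settings, three files yield six clips in total, and the shortest file yields none.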