In this colab notebook, you'll learn how to use the TensorFlow Lite Model Maker to train a custom audio classification model.
The Model Maker library uses transfer learning to simplify the process of training a TensorFlow Lite model using a custom dataset. Retraining a TensorFlow Lite model with your own custom dataset reduces the amount of training data and time required.
It is part of the Codelab to Customize an Audio model and deploy on Android.
You'll use a custom birds dataset and export a TFLite model that can be used on a phone, a TensorFlow.js model that can be used for inference in the browser, and a SavedModel version that you can use for serving.
Installing dependencies
sudo apt -y install libportaudio2
pip install tflite-model-maker
Import TensorFlow, Model Maker and other libraries
Among the required dependencies, you'll use TensorFlow and Model Maker. The others are for audio manipulation, playback, and visualization.
import tensorflow as tf
import tflite_model_maker as mm
from tflite_model_maker import audio_classifier
import os
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import itertools
import glob
import random
from IPython.display import Audio, Image
from scipy.io import wavfile
print(f"TensorFlow Version: {tf.__version__}")
print(f"Model Maker Version: {mm.__version__}")
TensorFlow Version: 2.8.0 Model Maker Version: 0.4.0
The Birds dataset
The Birds dataset is an educational collection of songs from 5 types of birds:
- White-breasted Wood-Wren
- House Sparrow
- Red Crossbill
- Chestnut-crowned Antpitta
- Azara's Spinetail
The original audio came from Xeno-canto, a website dedicated to sharing bird sounds from all over the world.
Let's start by downloading the data.
birds_dataset_folder = tf.keras.utils.get_file('birds_dataset.zip',
'https://storage.googleapis.com/laurencemoroney-blog.appspot.com/birds_dataset.zip',
cache_dir='./',
cache_subdir='dataset',
extract=True)
Downloading data from https://storage.googleapis.com/laurencemoroney-blog.appspot.com/birds_dataset.zip 343687168/343680986 [==============================] - 2s 0us/step 343695360/343680986 [==============================] - 2s 0us/step
Explore the data
The audio files are already split into train and test folders. Inside each split folder, there's one folder for each bird, named with its bird_code.
The audio files are all mono with a 16kHz sample rate.
For more information about each file, you can read the metadata.csv file. It contains each file's author, license, and some more information. You won't need to read it yourself in this tutorial.
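If you want to confirm the format yourself, a minimal sketch like the one below reads one training file with scipy and prints its sample rate and channel count. It assumes the dataset has already been extracted to ./dataset/small_birds_dataset, the same path used in the next cell.
import glob
from scipy.io import wavfile

# Hypothetical quick check: inspect one training file from the extracted dataset.
sample_path = glob.glob('./dataset/small_birds_dataset/train/*/*.wav')[0]
sample_rate, audio_data = wavfile.read(sample_path)

# Mono files load as a 1-D array; stereo files would have shape (num_samples, 2).
num_channels = 1 if audio_data.ndim == 1 else audio_data.shape[1]
print(f'{sample_path}: {sample_rate} Hz, {num_channels} channel(s)')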
# @title [Run this] Util functions and data structures.
data_dir = './dataset/small_birds_dataset'
bird_code_to_name = {
'wbwwre1': 'White-breasted Wood-Wren',
'houspa': 'House Sparrow',
'redcro': 'Red Crossbill',
'chcant2': 'Chestnut-crowned Antpitta',
'azaspi1': "Azara's Spinetail",
}
birds_images = {
'wbwwre1': 'https://upload.wikimedia.org/wikipedia/commons/thumb/2/22/Henicorhina_leucosticta_%28Cucarachero_pechiblanco%29_-_Juvenil_%2814037225664%29.jpg/640px-Henicorhina_leucosticta_%28Cucarachero_pechiblanco%29_-_Juvenil_%2814037225664%29.jpg', # Alejandro Bayer Tamayo from Armenia, Colombia
'houspa': 'https://upload.wikimedia.org/wikipedia/commons/thumb/5/52/House_Sparrow%2C_England_-_May_09.jpg/571px-House_Sparrow%2C_England_-_May_09.jpg', # Diliff
'redcro': 'https://upload.wikimedia.org/wikipedia/commons/thumb/4/49/Red_Crossbills_%28Male%29.jpg/640px-Red_Crossbills_%28Male%29.jpg', # Elaine R. Wilson, www.naturespicsonline.com
'chcant2': 'https://upload.wikimedia.org/wikipedia/commons/thumb/6/67/Chestnut-crowned_antpitta_%2846933264335%29.jpg/640px-Chestnut-crowned_antpitta_%2846933264335%29.jpg', # Mike's Birds from Riverside, CA, US
'azaspi1': 'https://upload.wikimedia.org/wikipedia/commons/thumb/b/b2/Synallaxis_azarae_76608368.jpg/640px-Synallaxis_azarae_76608368.jpg', # https://www.inaturalist.org/photos/76608368
}
test_files = os.path.abspath(os.path.join(data_dir, 'test/*/*.wav'))
def get_random_audio_file():
  test_list = glob.glob(test_files)
  random_audio_path = random.choice(test_list)
  return random_audio_path

def show_bird_data(audio_path):
  sample_rate, audio_data = wavfile.read(audio_path, 'rb')
  bird_code = audio_path.split('/')[-2]
  print(f'Bird name: {bird_code_to_name[bird_code]}')
  print(f'Bird code: {bird_code}')
  display(Image(birds_images[bird_code]))
  plttitle = f'{bird_code_to_name[bird_code]} ({bird_code})'
  plt.title(plttitle)
  plt.plot(audio_data)
  display(Audio(audio_data, rate=sample_rate))

print('functions and data structures created')
functions and data structures created
Playing some audio
To get a better understanding of the data, let's listen to a random audio file from the test split.
random_audio = get_random_audio_file()
show_bird_data(random_audio)
Bird name: Azara's Spinetail Bird code: azaspi1
Training the Model
When using Model Maker for audio, you have to start with a model spec. This is the base model that your new model will extract information from to learn about the new classes. It also affects how the dataset will be transformed to respect the model spec's parameters, like sample rate and number of channels.
YAMNet is an audio event classifier trained on the AudioSet dataset to predict audio events from the AudioSet ontology.
Its input is expected to be 16kHz audio with 1 channel.
You don't need to do any resampling yourself. Model Maker takes care of that for you.
- frame_length decides how long each training sample is. In this case, each sample is 6 * EXPECTED_WAVEFORM_LENGTH long.
- frame_step decides how far apart the training samples are. In this case, the ith sample starts 3 * EXPECTED_WAVEFORM_LENGTH after the (i-1)th sample.
The reason to set these values is to work around some limitations of real-world datasets.
For example, in the bird dataset, birds don't sing all the time. They sing, rest, and sing again, with noises in between. Having a long frame helps capture the singing, but setting it too long reduces the number of samples for training. The sketch after the spec definition below converts these values into seconds.
spec = audio_classifier.YamNetSpec(
keep_yamnet_and_custom_heads=True,
frame_step=3 * audio_classifier.YamNetSpec.EXPECTED_WAVEFORM_LENGTH,
frame_length=6 * audio_classifier.YamNetSpec.EXPECTED_WAVEFORM_LENGTH)
INFO:tensorflow:Checkpoints are stored in /tmpfs/tmp/tmp07sxctj6
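To make those numbers concrete, here is a minimal sketch that converts the frame settings into seconds. It assumes EXPECTED_WAVEFORM_LENGTH is YAMNet's input window in samples (15600, which matches the serving model's input shape you'll see later) and the 16kHz sample rate mentioned above.
# Rough conversion of the frame settings into seconds (assumes 16kHz audio and
# that EXPECTED_WAVEFORM_LENGTH is YAMNet's 15600-sample input window).
sample_rate_hz = 16000
window = audio_classifier.YamNetSpec.EXPECTED_WAVEFORM_LENGTH

frame_length_s = 6 * window / sample_rate_hz  # ~5.85s per training sample
frame_step_s = 3 * window / sample_rate_hz    # consecutive samples start ~2.93s apart
print(f'frame_length: {frame_length_s:.2f}s, frame_step: {frame_step_s:.2f}s')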
Loading the data
Model Maker has an API to load the data from a folder and put it in the format expected by the model spec.
The train and test splits are based on the folders. The validation dataset will be created as 20% of the train split.
train_data = audio_classifier.DataLoader.from_folder(
spec, os.path.join(data_dir, 'train'), cache=True)
train_data, validation_data = train_data.split(0.8)
test_data = audio_classifier.DataLoader.from_folder(
spec, os.path.join(data_dir, 'test'), cache=True)
Training the model
The audio_classifier has a create method that creates a model and starts training it.
You can customize many parameters; for more details, you can read the documentation.
On this first try, you'll use all the default configurations and train for 100 epochs.
batch_size = 128
epochs = 100
print('Training the model')
model = audio_classifier.create(
train_data,
spec,
validation_data,
batch_size=batch_size,
epochs=epochs)
Training the model Model: "sequential" _________________________________________________________________ Layer (type) Output Shape Param # ================================================================= classification_head (Dense) (None, 5) 5125 ================================================================= Total params: 5,125 Trainable params: 5,125 Non-trainable params: 0 _________________________________________________________________ Epoch 1/100 23/23 [==============================] - 22s 797ms/step - loss: 1.5075 - acc: 0.3383 - val_loss: 1.5218 - val_acc: 0.3356 Epoch 2/100 23/23 [==============================] - 1s 22ms/step - loss: 1.2447 - acc: 0.5141 - val_loss: 1.3324 - val_acc: 0.4360 Epoch 3/100 23/23 [==============================] - 1s 22ms/step - loss: 1.0961 - acc: 0.6145 - val_loss: 1.2318 - val_acc: 0.5104 Epoch 4/100 23/23 [==============================] - 1s 22ms/step - loss: 0.9847 - acc: 0.6869 - val_loss: 1.1583 - val_acc: 0.5761 Epoch 5/100 23/23 [==============================] - 1s 21ms/step - loss: 0.9175 - acc: 0.7117 - val_loss: 1.1104 - val_acc: 0.6194 Epoch 6/100 23/23 [==============================] - 1s 21ms/step - loss: 0.8517 - acc: 0.7317 - val_loss: 1.0764 - val_acc: 0.6453 Epoch 7/100 23/23 [==============================] - 1s 21ms/step - loss: 0.8021 - acc: 0.7466 - val_loss: 1.0415 - val_acc: 0.6626 Epoch 8/100 23/23 [==============================] - 1s 21ms/step - loss: 0.7686 - acc: 0.7628 - val_loss: 1.0294 - val_acc: 0.6730 Epoch 9/100 23/23 [==============================] - 1s 22ms/step - loss: 0.7355 - acc: 0.7693 - val_loss: 1.0106 - val_acc: 0.6799 Epoch 10/100 23/23 [==============================] - 1s 21ms/step - loss: 0.6943 - acc: 0.7866 - val_loss: 0.9904 - val_acc: 0.6851 Epoch 11/100 23/23 [==============================] - 1s 23ms/step - loss: 0.6671 - acc: 0.7972 - val_loss: 0.9827 - val_acc: 0.6886 Epoch 12/100 23/23 [==============================] - 1s 21ms/step - loss: 0.6377 - acc: 0.8093 - val_loss: 0.9751 - val_acc: 0.6972 Epoch 13/100 23/23 [==============================] - 1s 21ms/step - loss: 0.6307 - acc: 0.8045 - val_loss: 0.9715 - val_acc: 0.6990 Epoch 14/100 23/23 [==============================] - 1s 23ms/step - loss: 0.6120 - acc: 0.8090 - val_loss: 0.9654 - val_acc: 0.7024 Epoch 15/100 23/23 [==============================] - 1s 21ms/step - loss: 0.5950 - acc: 0.8210 - val_loss: 0.9635 - val_acc: 0.7042 Epoch 16/100 23/23 [==============================] - 1s 21ms/step - loss: 0.5728 - acc: 0.8259 - val_loss: 0.9505 - val_acc: 0.7076 Epoch 17/100 23/23 [==============================] - 1s 22ms/step - loss: 0.5555 - acc: 0.8338 - val_loss: 0.9516 - val_acc: 0.7093 Epoch 18/100 23/23 [==============================] - 1s 23ms/step - loss: 0.5486 - acc: 0.8266 - val_loss: 0.9449 - val_acc: 0.7111 Epoch 19/100 23/23 [==============================] - 1s 22ms/step - loss: 0.5307 - acc: 0.8348 - val_loss: 0.9406 - val_acc: 0.7145 Epoch 20/100 23/23 [==============================] - 1s 23ms/step - loss: 0.5278 - acc: 0.8421 - val_loss: 0.9340 - val_acc: 0.7163 Epoch 21/100 23/23 [==============================] - 1s 22ms/step - loss: 0.5145 - acc: 0.8366 - val_loss: 0.9296 - val_acc: 0.7215 Epoch 22/100 23/23 [==============================] - 1s 21ms/step - loss: 0.5136 - acc: 0.8400 - val_loss: 0.9334 - val_acc: 0.7215 Epoch 23/100 23/23 [==============================] - 1s 22ms/step - loss: 0.4998 - acc: 0.8438 - val_loss: 0.9303 - val_acc: 0.7180 Epoch 24/100 23/23 [==============================] - 
1s 22ms/step - loss: 0.4987 - acc: 0.8459 - val_loss: 0.9279 - val_acc: 0.7215 Epoch 25/100 23/23 [==============================] - 1s 23ms/step - loss: 0.4791 - acc: 0.8528 - val_loss: 0.9248 - val_acc: 0.7266 Epoch 26/100 23/23 [==============================] - 1s 23ms/step - loss: 0.4755 - acc: 0.8490 - val_loss: 0.9281 - val_acc: 0.7266 Epoch 27/100 23/23 [==============================] - 1s 22ms/step - loss: 0.4699 - acc: 0.8538 - val_loss: 0.9197 - val_acc: 0.7353 Epoch 28/100 23/23 [==============================] - 1s 21ms/step - loss: 0.4608 - acc: 0.8531 - val_loss: 0.9165 - val_acc: 0.7353 Epoch 29/100 23/23 [==============================] - 1s 22ms/step - loss: 0.4540 - acc: 0.8555 - val_loss: 0.9193 - val_acc: 0.7388 Epoch 30/100 23/23 [==============================] - 1s 21ms/step - loss: 0.4512 - acc: 0.8628 - val_loss: 0.9207 - val_acc: 0.7388 Epoch 31/100 23/23 [==============================] - 1s 21ms/step - loss: 0.4489 - acc: 0.8593 - val_loss: 0.9167 - val_acc: 0.7405 Epoch 32/100 23/23 [==============================] - 1s 21ms/step - loss: 0.4386 - acc: 0.8593 - val_loss: 0.9179 - val_acc: 0.7405 Epoch 33/100 23/23 [==============================] - 1s 23ms/step - loss: 0.4400 - acc: 0.8593 - val_loss: 0.9080 - val_acc: 0.7405 Epoch 34/100 23/23 [==============================] - 1s 23ms/step - loss: 0.4248 - acc: 0.8621 - val_loss: 0.9152 - val_acc: 0.7422 Epoch 35/100 23/23 [==============================] - 1s 22ms/step - loss: 0.4186 - acc: 0.8724 - val_loss: 0.9104 - val_acc: 0.7388 Epoch 36/100 23/23 [==============================] - 1s 22ms/step - loss: 0.4200 - acc: 0.8641 - val_loss: 0.9055 - val_acc: 0.7388 Epoch 37/100 23/23 [==============================] - 1s 21ms/step - loss: 0.4136 - acc: 0.8662 - val_loss: 0.9101 - val_acc: 0.7336 Epoch 38/100 23/23 [==============================] - 1s 23ms/step - loss: 0.4149 - acc: 0.8686 - val_loss: 0.9142 - val_acc: 0.7388 Epoch 39/100 23/23 [==============================] - 1s 21ms/step - loss: 0.4018 - acc: 0.8690 - val_loss: 0.9079 - val_acc: 0.7370 Epoch 40/100 23/23 [==============================] - 1s 22ms/step - loss: 0.4010 - acc: 0.8721 - val_loss: 0.9101 - val_acc: 0.7353 Epoch 41/100 23/23 [==============================] - 1s 23ms/step - loss: 0.4049 - acc: 0.8721 - val_loss: 0.9096 - val_acc: 0.7336 Epoch 42/100 23/23 [==============================] - 1s 21ms/step - loss: 0.4005 - acc: 0.8734 - val_loss: 0.9095 - val_acc: 0.7353 Epoch 43/100 23/23 [==============================] - 1s 22ms/step - loss: 0.3905 - acc: 0.8728 - val_loss: 0.9074 - val_acc: 0.7388 Epoch 44/100 23/23 [==============================] - 1s 22ms/step - loss: 0.3867 - acc: 0.8803 - val_loss: 0.9107 - val_acc: 0.7370 Epoch 45/100 23/23 [==============================] - 0s 20ms/step - loss: 0.3899 - acc: 0.8766 - val_loss: 0.9104 - val_acc: 0.7388 Epoch 46/100 23/23 [==============================] - 1s 21ms/step - loss: 0.3808 - acc: 0.8766 - val_loss: 0.9022 - val_acc: 0.7405 Epoch 47/100 23/23 [==============================] - 1s 20ms/step - loss: 0.3818 - acc: 0.8772 - val_loss: 0.9044 - val_acc: 0.7422 Epoch 48/100 23/23 [==============================] - 1s 21ms/step - loss: 0.3698 - acc: 0.8838 - val_loss: 0.8999 - val_acc: 0.7422 Epoch 49/100 23/23 [==============================] - 1s 21ms/step - loss: 0.3722 - acc: 0.8841 - val_loss: 0.9025 - val_acc: 0.7422 Epoch 50/100 23/23 [==============================] - 1s 21ms/step - loss: 0.3710 - acc: 0.8814 - val_loss: 0.9079 - val_acc: 0.7422 Epoch 51/100 
23/23 [==============================] - 1s 21ms/step - loss: 0.3685 - acc: 0.8841 - val_loss: 0.9033 - val_acc: 0.7405 Epoch 52/100 23/23 [==============================] - 1s 21ms/step - loss: 0.3676 - acc: 0.8793 - val_loss: 0.8987 - val_acc: 0.7439 Epoch 53/100 23/23 [==============================] - 1s 22ms/step - loss: 0.3632 - acc: 0.8814 - val_loss: 0.8928 - val_acc: 0.7439 Epoch 54/100 23/23 [==============================] - 1s 23ms/step - loss: 0.3590 - acc: 0.8824 - val_loss: 0.9011 - val_acc: 0.7457 Epoch 55/100 23/23 [==============================] - 1s 21ms/step - loss: 0.3532 - acc: 0.8866 - val_loss: 0.9094 - val_acc: 0.7439 Epoch 56/100 23/23 [==============================] - 1s 21ms/step - loss: 0.3528 - acc: 0.8903 - val_loss: 0.8965 - val_acc: 0.7457 Epoch 57/100 23/23 [==============================] - 1s 22ms/step - loss: 0.3606 - acc: 0.8845 - val_loss: 0.9003 - val_acc: 0.7422 Epoch 58/100 23/23 [==============================] - 1s 22ms/step - loss: 0.3550 - acc: 0.8838 - val_loss: 0.8973 - val_acc: 0.7422 Epoch 59/100 23/23 [==============================] - 1s 23ms/step - loss: 0.3470 - acc: 0.8934 - val_loss: 0.8956 - val_acc: 0.7422 Epoch 60/100 23/23 [==============================] - 1s 21ms/step - loss: 0.3366 - acc: 0.8962 - val_loss: 0.9065 - val_acc: 0.7422 Epoch 61/100 23/23 [==============================] - 1s 21ms/step - loss: 0.3438 - acc: 0.8866 - val_loss: 0.8954 - val_acc: 0.7405 Epoch 62/100 23/23 [==============================] - 1s 21ms/step - loss: 0.3410 - acc: 0.8921 - val_loss: 0.8983 - val_acc: 0.7439 Epoch 63/100 23/23 [==============================] - 1s 22ms/step - loss: 0.3396 - acc: 0.8879 - val_loss: 0.8945 - val_acc: 0.7422 Epoch 64/100 23/23 [==============================] - 1s 22ms/step - loss: 0.3360 - acc: 0.8886 - val_loss: 0.8960 - val_acc: 0.7405 Epoch 65/100 23/23 [==============================] - 1s 23ms/step - loss: 0.3380 - acc: 0.8883 - val_loss: 0.9105 - val_acc: 0.7405 Epoch 66/100 23/23 [==============================] - 1s 22ms/step - loss: 0.3347 - acc: 0.8959 - val_loss: 0.8945 - val_acc: 0.7422 Epoch 67/100 23/23 [==============================] - 1s 22ms/step - loss: 0.3318 - acc: 0.8934 - val_loss: 0.8923 - val_acc: 0.7439 Epoch 68/100 23/23 [==============================] - 1s 21ms/step - loss: 0.3428 - acc: 0.8845 - val_loss: 0.9050 - val_acc: 0.7422 Epoch 69/100 23/23 [==============================] - 1s 22ms/step - loss: 0.3242 - acc: 0.9003 - val_loss: 0.8917 - val_acc: 0.7457 Epoch 70/100 23/23 [==============================] - 1s 20ms/step - loss: 0.3278 - acc: 0.8924 - val_loss: 0.8958 - val_acc: 0.7457 Epoch 71/100 23/23 [==============================] - 1s 22ms/step - loss: 0.3268 - acc: 0.8945 - val_loss: 0.8980 - val_acc: 0.7439 Epoch 72/100 23/23 [==============================] - 1s 21ms/step - loss: 0.3211 - acc: 0.8928 - val_loss: 0.8998 - val_acc: 0.7474 Epoch 73/100 23/23 [==============================] - 0s 20ms/step - loss: 0.3285 - acc: 0.8938 - val_loss: 0.9089 - val_acc: 0.7439 Epoch 74/100 23/23 [==============================] - 0s 20ms/step - loss: 0.3165 - acc: 0.8983 - val_loss: 0.9051 - val_acc: 0.7439 Epoch 75/100 23/23 [==============================] - 1s 21ms/step - loss: 0.3126 - acc: 0.8990 - val_loss: 0.9030 - val_acc: 0.7457 Epoch 76/100 23/23 [==============================] - 1s 21ms/step - loss: 0.3213 - acc: 0.8928 - val_loss: 0.9032 - val_acc: 0.7422 Epoch 77/100 23/23 [==============================] - 1s 21ms/step - loss: 0.3150 - acc: 0.8979 - val_loss: 
0.9157 - val_acc: 0.7474 Epoch 78/100 23/23 [==============================] - 1s 21ms/step - loss: 0.3214 - acc: 0.8917 - val_loss: 0.9122 - val_acc: 0.7474 Epoch 79/100 23/23 [==============================] - 1s 22ms/step - loss: 0.3158 - acc: 0.8952 - val_loss: 0.9029 - val_acc: 0.7422 Epoch 80/100 23/23 [==============================] - 1s 20ms/step - loss: 0.3083 - acc: 0.8979 - val_loss: 0.9099 - val_acc: 0.7457 Epoch 81/100 23/23 [==============================] - 1s 21ms/step - loss: 0.3090 - acc: 0.8993 - val_loss: 0.9008 - val_acc: 0.7422 Epoch 82/100 23/23 [==============================] - 1s 23ms/step - loss: 0.3078 - acc: 0.8983 - val_loss: 0.9053 - val_acc: 0.7509 Epoch 83/100 23/23 [==============================] - 1s 22ms/step - loss: 0.3043 - acc: 0.9045 - val_loss: 0.8997 - val_acc: 0.7491 Epoch 84/100 23/23 [==============================] - 1s 22ms/step - loss: 0.3094 - acc: 0.8959 - val_loss: 0.9109 - val_acc: 0.7474 Epoch 85/100 23/23 [==============================] - 1s 22ms/step - loss: 0.3084 - acc: 0.8955 - val_loss: 0.9090 - val_acc: 0.7491 Epoch 86/100 23/23 [==============================] - 1s 21ms/step - loss: 0.3097 - acc: 0.8941 - val_loss: 0.9135 - val_acc: 0.7474 Epoch 87/100 23/23 [==============================] - 1s 21ms/step - loss: 0.3012 - acc: 0.9034 - val_loss: 0.9084 - val_acc: 0.7491 Epoch 88/100 23/23 [==============================] - 1s 22ms/step - loss: 0.3043 - acc: 0.8976 - val_loss: 0.9100 - val_acc: 0.7491 Epoch 89/100 23/23 [==============================] - 1s 23ms/step - loss: 0.3046 - acc: 0.8931 - val_loss: 0.9172 - val_acc: 0.7491 Epoch 90/100 23/23 [==============================] - 1s 22ms/step - loss: 0.3082 - acc: 0.9000 - val_loss: 0.8952 - val_acc: 0.7509 Epoch 91/100 23/23 [==============================] - 1s 21ms/step - loss: 0.3016 - acc: 0.9034 - val_loss: 0.9103 - val_acc: 0.7491 Epoch 92/100 23/23 [==============================] - 1s 21ms/step - loss: 0.2935 - acc: 0.9069 - val_loss: 0.8960 - val_acc: 0.7526 Epoch 93/100 23/23 [==============================] - 1s 22ms/step - loss: 0.3029 - acc: 0.8997 - val_loss: 0.9113 - val_acc: 0.7509 Epoch 94/100 23/23 [==============================] - 1s 21ms/step - loss: 0.2934 - acc: 0.9055 - val_loss: 0.9062 - val_acc: 0.7491 Epoch 95/100 23/23 [==============================] - 1s 23ms/step - loss: 0.3002 - acc: 0.8962 - val_loss: 0.9024 - val_acc: 0.7491 Epoch 96/100 23/23 [==============================] - 1s 21ms/step - loss: 0.2861 - acc: 0.9086 - val_loss: 0.8953 - val_acc: 0.7509 Epoch 97/100 23/23 [==============================] - 1s 21ms/step - loss: 0.2974 - acc: 0.9003 - val_loss: 0.9081 - val_acc: 0.7509 Epoch 98/100 23/23 [==============================] - 1s 22ms/step - loss: 0.3051 - acc: 0.9017 - val_loss: 0.8986 - val_acc: 0.7526 Epoch 99/100 23/23 [==============================] - 1s 21ms/step - loss: 0.2872 - acc: 0.9055 - val_loss: 0.9030 - val_acc: 0.7509 Epoch 100/100 23/23 [==============================] - 1s 21ms/step - loss: 0.2902 - acc: 0.9066 - val_loss: 0.9053 - val_acc: 0.7509
The accuracy looks good, but it's important to run the evaluation step on the test data and verify that your model achieved good results on unseen data.
print('Evaluating the model')
model.evaluate(test_data)
Evaluating the model 28/28 [==============================] - 5s 156ms/step - loss: 0.7979 - acc: 0.7887 [0.797942042350769, 0.788748562335968]
Understanding your model
When training a classifier, it's useful to see the confusion matrix. The confusion matrix gives you detailed knowledge of how your classifier is performing on test data.
Model Maker already creates the confusion matrix for you.
def show_confusion_matrix(confusion, test_labels):
  """Compute confusion matrix and normalize."""
  # Normalize each row so that every true label's counts sum to 1.
  confusion_normalized = confusion.astype("float") / confusion.sum(axis=1)[:, np.newaxis]
  axis_labels = test_labels
  ax = sns.heatmap(
      confusion_normalized, xticklabels=axis_labels, yticklabels=axis_labels,
      cmap='Blues', annot=True, fmt='.2f', square=True)
  plt.title("Confusion matrix")
  plt.ylabel("True label")
  plt.xlabel("Predicted label")
confusion_matrix = model.confusion_matrix(test_data)
show_confusion_matrix(confusion_matrix.numpy(), test_data.index_to_label)
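If you also want per-class numbers, the sketch below (reusing confusion_matrix and test_data from above) divides the diagonal of the confusion matrix by each row's total.
# Per-class accuracy: correct predictions (diagonal) divided by the number of
# true examples of each class (row sums).
cm = confusion_matrix.numpy().astype(float)
per_class_acc = cm.diagonal() / cm.sum(axis=1)
for label, acc in zip(test_data.index_to_label, per_class_acc):
  print(f'{label}: {acc:.2%}')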
Testing the model [Optional]
You can try the model on a sample audio from the test dataset just to see the results.
First you get the serving model.
serving_model = model.create_serving_model()
print(f'Model\'s input shape and type: {serving_model.inputs}')
print(f'Model\'s output shape and type: {serving_model.outputs}')
Model's input shape and type: [<KerasTensor: shape=(None, 15600) dtype=float32 (created by layer 'audio')>] Model's output shape and type: [<KerasTensor: shape=(None, 521) dtype=float32 (created by layer 'keras_layer')>, <KerasTensor: shape=(None, 5) dtype=float32 (created by layer 'sequential')>]
Coming back to the random audio you loaded earlier:
# if you want to try another file just uncomment the line below
# random_audio = get_random_audio_file()
show_bird_data(random_audio)
Bird name: Azara's Spinetail Bird code: azaspi1
The created model has a fixed input window.
For a given audio file, you'll have to split it into windows of the expected size. The last window might need to be padded with zeros.
sample_rate, audio_data = wavfile.read(random_audio, 'rb')
audio_data = np.array(audio_data) / tf.int16.max
input_size = serving_model.input_shape[1]
splitted_audio_data = tf.signal.frame(audio_data, input_size, input_size, pad_end=True, pad_value=0)
print(f'Test audio path: {random_audio}')
print(f'Original size of the audio data: {len(audio_data)}')
print(f'Number of windows for inference: {len(splitted_audio_data)}')
Test audio path: /tmpfs/src/temp/tensorflow/lite/g3doc/tutorials/dataset/small_birds_dataset/test/azaspi1/XC7993.wav Original size of the audio data: 467520 Number of windows for inference: 30
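As a quick sanity check, the number of windows follows directly from the values printed above: the audio is zero-padded and split into windows of input_size samples, so the count is the ceiling of the division.
import math

# ceil(467520 / 15600) == 30, matching the tf.signal.frame result above.
print(math.ceil(len(audio_data) / input_size))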
You'll loop over all the split audio windows and apply the model to each one of them.
The model you've just trained has 2 outputs: the original YAMNet output and the one from the head you've just trained. This is important because a real-world environment is more complicated than just bird sounds. You can use YAMNet's output to filter out irrelevant audio. For example, in the birds use case, if YAMNet is not classifying Birds or Animals, the output from your own model is probably irrelevant.
Below, both outputs are printed to make their relationship easier to understand. Most of the mistakes your model makes happen when YAMNet's prediction is not related to your domain (e.g. birds). A sketch of one way to apply this filtering appears after the results below.
print(random_audio)
results = []
print('Result of the window ith: your model class -> score, (spec class -> score)')
for i, data in enumerate(splitted_audio_data):
  yamnet_output, inference = serving_model(data)
  results.append(inference[0].numpy())
  result_index = tf.argmax(inference[0])
  spec_result_index = tf.argmax(yamnet_output[0])
  t = spec._yamnet_labels()[spec_result_index]
  result_str = f'Result of the window {i}: ' \
               f'\t{test_data.index_to_label[result_index]} -> {inference[0][result_index].numpy():.3f}, ' \
               f'\t({spec._yamnet_labels()[spec_result_index]} -> {yamnet_output[0][spec_result_index]:.3f})'
  print(result_str)
results_np = np.array(results)
mean_results = results_np.mean(axis=0)
result_index = mean_results.argmax()
print(f'Mean result: {test_data.index_to_label[result_index]} -> {mean_results[result_index]}')
/tmpfs/src/temp/tensorflow/lite/g3doc/tutorials/dataset/small_birds_dataset/test/azaspi1/XC7993.wav Result of the window ith: your model class -> score, (spec class -> score) Result of the window 0: azaspi1 -> 0.860, (Animal -> 0.862) Result of the window 1: chcant2 -> 0.702, (Outside, rural or natural -> 0.346) Result of the window 2: azaspi1 -> 0.998, (Animal -> 0.870) Result of the window 3: chcant2 -> 0.944, (Pink noise -> 0.634) Result of the window 4: chcant2 -> 0.931, (Pink noise -> 0.744) Result of the window 5: azaspi1 -> 0.827, (Animal -> 0.798) Result of the window 6: chcant2 -> 0.958, (Pink noise -> 0.505) Result of the window 7: azaspi1 -> 0.972, (Animal -> 0.782) Result of the window 8: chcant2 -> 0.889, (Pink noise -> 0.451) Result of the window 9: chcant2 -> 0.961, (Pink noise -> 0.598) Result of the window 10: houspa -> 0.472, (Animal -> 0.836) Result of the window 11: azaspi1 -> 0.936, (Animal -> 0.807) Result of the window 12: azaspi1 -> 0.641, (Animal -> 0.727) Result of the window 13: azaspi1 -> 1.000, (Animal -> 0.835) Result of the window 14: chcant2 -> 0.868, (Pink noise -> 0.748) Result of the window 15: azaspi1 -> 0.986, (Animal -> 0.922) Result of the window 16: chcant2 -> 0.820, (Outside, rural or natural -> 0.480) Result of the window 17: chcant2 -> 0.934, (Pink noise -> 0.717) Result of the window 18: chcant2 -> 0.787, (Pink noise -> 0.533) Result of the window 19: azaspi1 -> 0.969, (Animal -> 0.807) Result of the window 20: chcant2 -> 0.970, (Pink noise -> 0.629) Result of the window 21: azaspi1 -> 0.913, (Animal -> 0.676) Result of the window 22: redcro -> 0.469, (Bird -> 0.928) Result of the window 23: azaspi1 -> 0.860, (Animal -> 0.874) Result of the window 24: redcro -> 0.455, (Animal -> 0.727) Result of the window 25: houspa -> 0.506, (Animal -> 0.912) Result of the window 26: azaspi1 -> 0.987, (Animal -> 0.775) Result of the window 27: houspa -> 0.567, (Animal -> 0.632) Result of the window 28: azaspi1 -> 0.973, (Animal -> 0.623) Result of the window 29: chcant2 -> 0.931, (Stream -> 0.587) Mean result: azaspi1 -> 0.44704481959342957
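As an illustration of the filtering idea described above, here is a minimal sketch that reuses splitted_audio_data, serving_model, spec, and test_data, and only averages the windows whose YAMNet top class is in a small, hypothetical allow-list of bird-related labels.
# Hypothetical allow-list of YAMNet classes considered relevant for this use case.
RELEVANT_YAMNET_CLASSES = {'Animal', 'Bird', 'Bird vocalization, bird call, bird song'}

filtered_results = []
for data in splitted_audio_data:
  yamnet_output, inference = serving_model(data)
  yamnet_top_class = spec._yamnet_labels()[tf.argmax(yamnet_output[0])]
  if yamnet_top_class in RELEVANT_YAMNET_CLASSES:
    filtered_results.append(inference[0].numpy())

if filtered_results:
  filtered_mean = np.mean(filtered_results, axis=0)
  top = filtered_mean.argmax()
  print(f'Filtered result: {test_data.index_to_label[top]} -> {filtered_mean[top]:.3f}')
else:
  print('No window looked bird-related to YAMNet.')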
Exporting the model
The last step is exporting your model to be used on embedded devices or in the browser.
The export method exports both formats for you.
models_path = './birds_models'
print(f'Exporting the TFLite model to {models_path}')
model.export(models_path, tflite_filename='my_birds_model.tflite')
Exporting the TFLite model to ./birds_models 2022-05-10 23:53:00.408047: W tensorflow/python/util/util.cc:368] Sets are not currently considered sequences, but this may change in the future, so consider avoiding using them. INFO:tensorflow:Assets written to: /tmpfs/tmp/tmpcl003gh3/assets INFO:tensorflow:Assets written to: /tmpfs/tmp/tmpcl003gh3/assets 2022-05-10 23:53:08.839805: W tensorflow/compiler/mlir/lite/python/tf_tfl_flatbuffer_helpers.cc:357] Ignored output_format. 2022-05-10 23:53:08.839856: W tensorflow/compiler/mlir/lite/python/tf_tfl_flatbuffer_helpers.cc:360] Ignored drop_control_dependency. INFO:tensorflow:TensorFlow Lite model exported successfully: ./birds_models/my_birds_model.tflite INFO:tensorflow:TensorFlow Lite model exported successfully: ./birds_models/my_birds_model.tflite
You can also export the SavedModel version for serving or for use in a Python environment.
model.export(models_path, export_format=[mm.ExportFormat.SAVED_MODEL, mm.ExportFormat.LABEL])
WARNING:tensorflow:Compiled the loaded model, but the compiled metrics have yet to be built. `model.compile_metrics` will be empty until you train or evaluate the model. WARNING:tensorflow:Compiled the loaded model, but the compiled metrics have yet to be built. `model.compile_metrics` will be empty until you train or evaluate the model. INFO:tensorflow:Assets written to: ./birds_models/saved_model/assets INFO:tensorflow:Assets written to: ./birds_models/saved_model/assets INFO:tensorflow:Saving labels in ./birds_models/labels.txt INFO:tensorflow:Saving labels in ./birds_models/labels.txt
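Before deploying, you can sanity-check the exported TFLite file from Python with the TensorFlow Lite interpreter. The sketch below assumes the ./birds_models/my_birds_model.tflite path used above; since the model was created with keep_yamnet_and_custom_heads=True, it has two outputs, one with the 521 YAMNet scores and one with the 5 bird scores.
# Load the exported TFLite model and run one dummy window through it to
# confirm that input and output shapes line up.
interpreter = tf.lite.Interpreter(model_path='./birds_models/my_birds_model.tflite')
interpreter.allocate_tensors()

input_details = interpreter.get_input_details()[0]
dummy_window = np.zeros(input_details['shape'], dtype=np.float32)
interpreter.set_tensor(input_details['index'], dummy_window)
interpreter.invoke()

for output in interpreter.get_output_details():
  scores = interpreter.get_tensor(output['index'])
  print(output['name'], scores.shape)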
Next Steps
You did it.
Now your new model can be deployed on mobile devices using the TFLite AudioClassifier Task API.
You can also try the same process with your own data and different classes; here is the documentation for Model Maker for Audio Classification.