In this colab notebook, you'll learn how to use the TensorFlow Lite Model Maker to train a custom audio classification model.
The Model Maker library uses transfer learning to simplify the process of training a TensorFlow Lite model using a custom dataset. Retraining a TensorFlow Lite model with your own custom dataset reduces the amount of training data and time required.
It is part of the Codelab to Customize an Audio model and deploy on Android.
You'll use a custom birds dataset and export a TFLite model that can be used on a phone, a TensorFlow.JS model that can be used for inference in the browser and also a SavedModel version that you can use for serving.
Installing dependencies
sudo apt -y install libportaudio2
pip install tflite-model-maker
Import TensorFlow, Model Maker and other libraries
Among the dependencies that are needed, you'll use TensorFlow and Model Maker. Aside from those, the others are for audio manipulation, playback, and visualization.
import tensorflow as tf
import tflite_model_maker as mm
from tflite_model_maker import audio_classifier
import os
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import itertools
import glob
import random
from IPython.display import Audio, Image
from scipy.io import wavfile
print(f"TensorFlow Version: {tf.__version__}")
print(f"Model Maker Version: {mm.__version__}")
TensorFlow Version: 2.9.1 Model Maker Version: 0.4.0
The Birds dataset
The Birds dataset is an educational collection of songs from 5 types of birds:
- White-breasted Wood-Wren
- House Sparrow
- Red Crossbill
- Chestnut-crowned Antpitta
- Azara's Spinetail
The original audio came from Xeno-canto which is a website dedicated to sharing bird sounds from all over the world.
Let's start by downloading the data.
birds_dataset_folder = tf.keras.utils.get_file('birds_dataset.zip',
                                                'https://storage.googleapis.com/laurencemoroney-blog.appspot.com/birds_dataset.zip',
                                                cache_dir='./',
                                                cache_subdir='dataset',
                                                extract=True)
Downloading data from https://storage.googleapis.com/laurencemoroney-blog.appspot.com/birds_dataset.zip 343680986/343680986 [==============================] - 3s 0us/step
Explore the data
The audio files are already split into train and test folders. Inside each split folder, there's one folder for each bird, named with its bird_code.
The audio files are all mono with a 16kHz sample rate.
For more information about each file, you can read the metadata.csv file. It contains each file's author, license, and some more information. You won't need to read it yourself in this tutorial.
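If you want to double-check the audio format yourself, here is a minimal sketch (assuming the dataset extracted to ./dataset/small_birds_dataset, the same folder used in the next cell) that reads a few training files with scipy and prints their sample rate, channel count, and duration.
import glob
import os

from scipy.io import wavfile

# Quick check: read a few training files and confirm they are
# mono, 16 kHz WAVs, as stated above.
for path in sorted(glob.glob('./dataset/small_birds_dataset/train/*/*.wav'))[:3]:
  sample_rate, audio_data = wavfile.read(path)
  channels = 1 if audio_data.ndim == 1 else audio_data.shape[1]
  print(f'{os.path.basename(path)}: {sample_rate} Hz, '
        f'{channels} channel(s), {len(audio_data) / sample_rate:.1f} s')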
# @title [Run this] Util functions and data structures.
data_dir = './dataset/small_birds_dataset'
bird_code_to_name = {
'wbwwre1': 'White-breasted Wood-Wren',
'houspa': 'House Sparrow',
'redcro': 'Red Crossbill',
'chcant2': 'Chestnut-crowned Antpitta',
'azaspi1': "Azara's Spinetail",
}
birds_images = {
'wbwwre1': 'https://upload.wikimedia.org/wikipedia/commons/thumb/2/22/Henicorhina_leucosticta_%28Cucarachero_pechiblanco%29_-_Juvenil_%2814037225664%29.jpg/640px-Henicorhina_leucosticta_%28Cucarachero_pechiblanco%29_-_Juvenil_%2814037225664%29.jpg', # Alejandro Bayer Tamayo from Armenia, Colombia
'houspa': 'https://upload.wikimedia.org/wikipedia/commons/thumb/5/52/House_Sparrow%2C_England_-_May_09.jpg/571px-House_Sparrow%2C_England_-_May_09.jpg', # Diliff
'redcro': 'https://upload.wikimedia.org/wikipedia/commons/thumb/4/49/Red_Crossbills_%28Male%29.jpg/640px-Red_Crossbills_%28Male%29.jpg', # Elaine R. Wilson, www.naturespicsonline.com
'chcant2': 'https://upload.wikimedia.org/wikipedia/commons/thumb/6/67/Chestnut-crowned_antpitta_%2846933264335%29.jpg/640px-Chestnut-crowned_antpitta_%2846933264335%29.jpg', # Mike's Birds from Riverside, CA, US
'azaspi1': 'https://upload.wikimedia.org/wikipedia/commons/thumb/b/b2/Synallaxis_azarae_76608368.jpg/640px-Synallaxis_azarae_76608368.jpg', # https://www.inaturalist.org/photos/76608368
}
test_files = os.path.abspath(os.path.join(data_dir, 'test/*/*.wav'))
def get_random_audio_file():
  test_list = glob.glob(test_files)
  random_audio_path = random.choice(test_list)
  return random_audio_path

def show_bird_data(audio_path):
  sample_rate, audio_data = wavfile.read(audio_path, 'rb')
  bird_code = audio_path.split('/')[-2]
  print(f'Bird name: {bird_code_to_name[bird_code]}')
  print(f'Bird code: {bird_code}')
  display(Image(birds_images[bird_code]))
  plttitle = f'{bird_code_to_name[bird_code]} ({bird_code})'
  plt.title(plttitle)
  plt.plot(audio_data)
  display(Audio(audio_data, rate=sample_rate))
print('functions and data structures created')
functions and data structures created
Playing some audio
To get a better understanding of the data, let's listen to a random audio file from the test split.
random_audio = get_random_audio_file()
show_bird_data(random_audio)
Bird name: Azara's Spinetail Bird code: azaspi1
Training the Model
When using Model Maker for audio, you have to start with a model spec. This is the base model that your new model will extract information from to learn about the new classes. It also affects how the dataset will be transformed to respect the model spec's parameters, like sample rate and number of channels.
YAMNet is an audio event classifier trained on the AudioSet dataset to predict audio events from the AudioSet ontology.
Its input is expected to be 16kHz audio with 1 channel.
You don't need to do any resampling yourself. Model Maker takes care of that for you.
- frame_length decides how long each training sample is. In this case, 6 * EXPECTED_WAVEFORM_LENGTH, or roughly 6 seconds of audio.
- frame_step decides how far apart the training samples are. In this case, the i-th sample will start 3 * EXPECTED_WAVEFORM_LENGTH (roughly 3 seconds) after the (i-1)-th sample.
The reason to set these values is to work around a limitation of real-world datasets.
For example, in the bird dataset, birds don't sing all the time. They sing, rest and sing again, with noises in between. Having a long frame would help capture the singing, but setting it too long will reduce the number of samples for training.
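To make those numbers concrete, here is a small sketch (just arithmetic, not part of the training pipeline) that prints the window sizes used below. EXPECTED_WAVEFORM_LENGTH is YAMNet's input size in samples (15600 samples, which also matches the serving model's input shape shown later in this notebook).
# YAMNet works on 16 kHz audio, so a window of EXPECTED_WAVEFORM_LENGTH
# samples is just under one second long.
sample_rate = 16000
window = audio_classifier.YamNetSpec.EXPECTED_WAVEFORM_LENGTH

frame_length = 6 * window  # samples per training example
frame_step = 3 * window    # samples between the starts of consecutive examples

print(f'frame_length: {frame_length} samples (~{frame_length / sample_rate:.2f} s)')
print(f'frame_step:   {frame_step} samples (~{frame_step / sample_rate:.2f} s)')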
spec = audio_classifier.YamNetSpec(
    keep_yamnet_and_custom_heads=True,
    frame_step=3 * audio_classifier.YamNetSpec.EXPECTED_WAVEFORM_LENGTH,
    frame_length=6 * audio_classifier.YamNetSpec.EXPECTED_WAVEFORM_LENGTH)
INFO:tensorflow:Checkpoints are stored in /tmpfs/tmp/tmpoi1i1xc7
Loading the data
Model Maker has an API to load the data from a folder and put it in the format expected by the model spec.
The train and test splits are based on the folders. The validation dataset will be created as 20% of the train split.
train_data = audio_classifier.DataLoader.from_folder(
    spec, os.path.join(data_dir, 'train'), cache=True)
train_data, validation_data = train_data.split(0.8)
test_data = audio_classifier.DataLoader.from_folder(
    spec, os.path.join(data_dir, 'test'), cache=True)
Training the model
The audio_classifier module has a create method that builds a model and starts training it.
You can customize many parameters; for more details, you can read the documentation.
On this first try, you'll use all the default configurations and train for 100 epochs.
batch_size = 128
epochs = 100
print('Training the model')
model = audio_classifier.create(
    train_data,
    spec,
    validation_data,
    batch_size=batch_size,
    epochs=epochs)
Training the model Model: "sequential" _________________________________________________________________ Layer (type) Output Shape Param # ================================================================= classification_head (Dense) (None, 5) 5125 ================================================================= Total params: 5,125 Trainable params: 5,125 Non-trainable params: 0 _________________________________________________________________ Epoch 1/100 22/22 [==============================] - 19s 766ms/step - loss: 1.6695 - acc: 0.2116 - val_loss: 1.4794 - val_acc: 0.3480 Epoch 2/100 22/22 [==============================] - 0s 10ms/step - loss: 1.4015 - acc: 0.3965 - val_loss: 1.2727 - val_acc: 0.6364 Epoch 3/100 22/22 [==============================] - 0s 11ms/step - loss: 1.2427 - acc: 0.5195 - val_loss: 1.1172 - val_acc: 0.7131 Epoch 4/100 22/22 [==============================] - 0s 11ms/step - loss: 1.1184 - acc: 0.6103 - val_loss: 1.0111 - val_acc: 0.7543 Epoch 5/100 22/22 [==============================] - 0s 11ms/step - loss: 1.0331 - acc: 0.6629 - val_loss: 0.9327 - val_acc: 0.7699 Epoch 6/100 22/22 [==============================] - 0s 12ms/step - loss: 0.9552 - acc: 0.6911 - val_loss: 0.8741 - val_acc: 0.7770 Epoch 7/100 22/22 [==============================] - 0s 11ms/step - loss: 0.9038 - acc: 0.7163 - val_loss: 0.8244 - val_acc: 0.7827 Epoch 8/100 22/22 [==============================] - 0s 13ms/step - loss: 0.8673 - acc: 0.7199 - val_loss: 0.7859 - val_acc: 0.7940 Epoch 9/100 22/22 [==============================] - 0s 15ms/step - loss: 0.8205 - acc: 0.7354 - val_loss: 0.7507 - val_acc: 0.8026 Epoch 10/100 22/22 [==============================] - 0s 15ms/step - loss: 0.7849 - acc: 0.7523 - val_loss: 0.7199 - val_acc: 0.8111 Epoch 11/100 22/22 [==============================] - 0s 11ms/step - loss: 0.7608 - acc: 0.7545 - val_loss: 0.6991 - val_acc: 0.8168 Epoch 12/100 22/22 [==============================] - 0s 11ms/step - loss: 0.7405 - acc: 0.7556 - val_loss: 0.6718 - val_acc: 0.8253 Epoch 13/100 22/22 [==============================] - 0s 11ms/step - loss: 0.7047 - acc: 0.7826 - val_loss: 0.6537 - val_acc: 0.8281 Epoch 14/100 22/22 [==============================] - 0s 11ms/step - loss: 0.6927 - acc: 0.7725 - val_loss: 0.6376 - val_acc: 0.8352 Epoch 15/100 22/22 [==============================] - 0s 11ms/step - loss: 0.6727 - acc: 0.7891 - val_loss: 0.6181 - val_acc: 0.8438 Epoch 16/100 22/22 [==============================] - 0s 12ms/step - loss: 0.6572 - acc: 0.7891 - val_loss: 0.6035 - val_acc: 0.8494 Epoch 17/100 22/22 [==============================] - 0s 13ms/step - loss: 0.6375 - acc: 0.7963 - val_loss: 0.5917 - val_acc: 0.8594 Epoch 18/100 22/22 [==============================] - 0s 13ms/step - loss: 0.6294 - acc: 0.7996 - val_loss: 0.5801 - val_acc: 0.8636 Epoch 19/100 22/22 [==============================] - 0s 11ms/step - loss: 0.6112 - acc: 0.8014 - val_loss: 0.5646 - val_acc: 0.8665 Epoch 20/100 22/22 [==============================] - 0s 13ms/step - loss: 0.5948 - acc: 0.8143 - val_loss: 0.5581 - val_acc: 0.8636 Epoch 21/100 22/22 [==============================] - 0s 11ms/step - loss: 0.5948 - acc: 0.8089 - val_loss: 0.5465 - val_acc: 0.8707 Epoch 22/100 22/22 [==============================] - 0s 12ms/step - loss: 0.5743 - acc: 0.8147 - val_loss: 0.5408 - val_acc: 0.8750 Epoch 23/100 22/22 [==============================] - 0s 11ms/step - loss: 0.5750 - acc: 0.8046 - val_loss: 0.5306 - val_acc: 0.8750 Epoch 24/100 22/22 [==============================] - 
0s 11ms/step - loss: 0.5711 - acc: 0.8194 - val_loss: 0.5222 - val_acc: 0.8821 Epoch 25/100 22/22 [==============================] - 0s 12ms/step - loss: 0.5501 - acc: 0.8176 - val_loss: 0.5131 - val_acc: 0.8878 Epoch 26/100 22/22 [==============================] - 0s 11ms/step - loss: 0.5495 - acc: 0.8230 - val_loss: 0.5071 - val_acc: 0.8892 Epoch 27/100 22/22 [==============================] - 0s 11ms/step - loss: 0.5368 - acc: 0.8241 - val_loss: 0.5030 - val_acc: 0.8935 Epoch 28/100 22/22 [==============================] - 0s 12ms/step - loss: 0.5300 - acc: 0.8320 - val_loss: 0.4943 - val_acc: 0.8935 Epoch 29/100 22/22 [==============================] - 0s 11ms/step - loss: 0.5201 - acc: 0.8284 - val_loss: 0.4883 - val_acc: 0.8977 Epoch 30/100 22/22 [==============================] - 0s 11ms/step - loss: 0.5212 - acc: 0.8320 - val_loss: 0.4809 - val_acc: 0.8991 Epoch 31/100 22/22 [==============================] - 0s 11ms/step - loss: 0.5143 - acc: 0.8353 - val_loss: 0.4771 - val_acc: 0.8977 Epoch 32/100 22/22 [==============================] - 0s 13ms/step - loss: 0.5020 - acc: 0.8356 - val_loss: 0.4752 - val_acc: 0.9006 Epoch 33/100 22/22 [==============================] - 0s 12ms/step - loss: 0.5018 - acc: 0.8291 - val_loss: 0.4671 - val_acc: 0.9006 Epoch 34/100 22/22 [==============================] - 0s 12ms/step - loss: 0.4881 - acc: 0.8381 - val_loss: 0.4632 - val_acc: 0.9020 Epoch 35/100 22/22 [==============================] - 0s 13ms/step - loss: 0.4970 - acc: 0.8324 - val_loss: 0.4606 - val_acc: 0.9020 Epoch 36/100 22/22 [==============================] - 0s 11ms/step - loss: 0.4840 - acc: 0.8396 - val_loss: 0.4540 - val_acc: 0.9077 Epoch 37/100 22/22 [==============================] - 0s 11ms/step - loss: 0.4797 - acc: 0.8468 - val_loss: 0.4514 - val_acc: 0.9062 Epoch 38/100 22/22 [==============================] - 0s 13ms/step - loss: 0.4690 - acc: 0.8417 - val_loss: 0.4490 - val_acc: 0.9091 Epoch 39/100 22/22 [==============================] - 0s 11ms/step - loss: 0.4653 - acc: 0.8493 - val_loss: 0.4456 - val_acc: 0.9077 Epoch 40/100 22/22 [==============================] - 0s 12ms/step - loss: 0.4649 - acc: 0.8472 - val_loss: 0.4381 - val_acc: 0.9105 Epoch 41/100 22/22 [==============================] - 0s 11ms/step - loss: 0.4626 - acc: 0.8464 - val_loss: 0.4377 - val_acc: 0.9119 Epoch 42/100 22/22 [==============================] - 0s 11ms/step - loss: 0.4578 - acc: 0.8461 - val_loss: 0.4350 - val_acc: 0.9134 Epoch 43/100 22/22 [==============================] - 0s 12ms/step - loss: 0.4490 - acc: 0.8551 - val_loss: 0.4310 - val_acc: 0.9134 Epoch 44/100 22/22 [==============================] - 0s 13ms/step - loss: 0.4489 - acc: 0.8500 - val_loss: 0.4288 - val_acc: 0.9105 Epoch 45/100 22/22 [==============================] - 0s 12ms/step - loss: 0.4499 - acc: 0.8529 - val_loss: 0.4273 - val_acc: 0.9134 Epoch 46/100 22/22 [==============================] - 0s 11ms/step - loss: 0.4456 - acc: 0.8508 - val_loss: 0.4222 - val_acc: 0.9119 Epoch 47/100 22/22 [==============================] - 0s 12ms/step - loss: 0.4388 - acc: 0.8522 - val_loss: 0.4194 - val_acc: 0.9148 Epoch 48/100 22/22 [==============================] - 0s 11ms/step - loss: 0.4332 - acc: 0.8601 - val_loss: 0.4164 - val_acc: 0.9176 Epoch 49/100 22/22 [==============================] - 0s 12ms/step - loss: 0.4245 - acc: 0.8619 - val_loss: 0.4142 - val_acc: 0.9148 Epoch 50/100 22/22 [==============================] - 0s 13ms/step - loss: 0.4307 - acc: 0.8569 - val_loss: 0.4135 - val_acc: 0.9119 Epoch 51/100 
22/22 [==============================] - 0s 12ms/step - loss: 0.4236 - acc: 0.8576 - val_loss: 0.4118 - val_acc: 0.9148 Epoch 52/100 22/22 [==============================] - 0s 11ms/step - loss: 0.4194 - acc: 0.8659 - val_loss: 0.4088 - val_acc: 0.9148 Epoch 53/100 22/22 [==============================] - 0s 12ms/step - loss: 0.4220 - acc: 0.8619 - val_loss: 0.4058 - val_acc: 0.9148 Epoch 54/100 22/22 [==============================] - 0s 11ms/step - loss: 0.4242 - acc: 0.8594 - val_loss: 0.4038 - val_acc: 0.9148 Epoch 55/100 22/22 [==============================] - 0s 12ms/step - loss: 0.4156 - acc: 0.8630 - val_loss: 0.4047 - val_acc: 0.9134 Epoch 56/100 22/22 [==============================] - 0s 12ms/step - loss: 0.4113 - acc: 0.8731 - val_loss: 0.3991 - val_acc: 0.9134 Epoch 57/100 22/22 [==============================] - 0s 11ms/step - loss: 0.4056 - acc: 0.8641 - val_loss: 0.3974 - val_acc: 0.9134 Epoch 58/100 22/22 [==============================] - 0s 11ms/step - loss: 0.3994 - acc: 0.8684 - val_loss: 0.3969 - val_acc: 0.9119 Epoch 59/100 22/22 [==============================] - 0s 12ms/step - loss: 0.4112 - acc: 0.8637 - val_loss: 0.3944 - val_acc: 0.9105 Epoch 60/100 22/22 [==============================] - 0s 12ms/step - loss: 0.4077 - acc: 0.8616 - val_loss: 0.3954 - val_acc: 0.9134 Epoch 61/100 22/22 [==============================] - 0s 10ms/step - loss: 0.3998 - acc: 0.8612 - val_loss: 0.3927 - val_acc: 0.9119 Epoch 62/100 22/22 [==============================] - 0s 11ms/step - loss: 0.4066 - acc: 0.8580 - val_loss: 0.3913 - val_acc: 0.9105 Epoch 63/100 22/22 [==============================] - 0s 10ms/step - loss: 0.3976 - acc: 0.8634 - val_loss: 0.3891 - val_acc: 0.9091 Epoch 64/100 22/22 [==============================] - 0s 12ms/step - loss: 0.3954 - acc: 0.8695 - val_loss: 0.3877 - val_acc: 0.9119 Epoch 65/100 22/22 [==============================] - 0s 10ms/step - loss: 0.3987 - acc: 0.8688 - val_loss: 0.3838 - val_acc: 0.9119 Epoch 66/100 22/22 [==============================] - 0s 12ms/step - loss: 0.3954 - acc: 0.8695 - val_loss: 0.3846 - val_acc: 0.9119 Epoch 67/100 22/22 [==============================] - 0s 11ms/step - loss: 0.3840 - acc: 0.8738 - val_loss: 0.3811 - val_acc: 0.9091 Epoch 68/100 22/22 [==============================] - 0s 11ms/step - loss: 0.3800 - acc: 0.8745 - val_loss: 0.3844 - val_acc: 0.9091 Epoch 69/100 22/22 [==============================] - 0s 11ms/step - loss: 0.3878 - acc: 0.8767 - val_loss: 0.3812 - val_acc: 0.9105 Epoch 70/100 22/22 [==============================] - 0s 12ms/step - loss: 0.3946 - acc: 0.8663 - val_loss: 0.3779 - val_acc: 0.9162 Epoch 71/100 22/22 [==============================] - 0s 11ms/step - loss: 0.3811 - acc: 0.8738 - val_loss: 0.3786 - val_acc: 0.9148 Epoch 72/100 22/22 [==============================] - 0s 11ms/step - loss: 0.3788 - acc: 0.8782 - val_loss: 0.3783 - val_acc: 0.9134 Epoch 73/100 22/22 [==============================] - 0s 11ms/step - loss: 0.3841 - acc: 0.8688 - val_loss: 0.3754 - val_acc: 0.9176 Epoch 74/100 22/22 [==============================] - 0s 12ms/step - loss: 0.3762 - acc: 0.8742 - val_loss: 0.3741 - val_acc: 0.9162 Epoch 75/100 22/22 [==============================] - 0s 11ms/step - loss: 0.3696 - acc: 0.8749 - val_loss: 0.3768 - val_acc: 0.9134 Epoch 76/100 22/22 [==============================] - 0s 11ms/step - loss: 0.3773 - acc: 0.8727 - val_loss: 0.3751 - val_acc: 0.9148 Epoch 77/100 22/22 [==============================] - 0s 11ms/step - loss: 0.3674 - acc: 0.8767 - val_loss: 
0.3702 - val_acc: 0.9176 Epoch 78/100 22/22 [==============================] - 0s 11ms/step - loss: 0.3723 - acc: 0.8745 - val_loss: 0.3735 - val_acc: 0.9119 Epoch 79/100 22/22 [==============================] - 0s 13ms/step - loss: 0.3613 - acc: 0.8778 - val_loss: 0.3705 - val_acc: 0.9134 Epoch 80/100 22/22 [==============================] - 0s 11ms/step - loss: 0.3682 - acc: 0.8724 - val_loss: 0.3705 - val_acc: 0.9148 Epoch 81/100 22/22 [==============================] - 0s 12ms/step - loss: 0.3709 - acc: 0.8742 - val_loss: 0.3670 - val_acc: 0.9134 Epoch 82/100 22/22 [==============================] - 0s 11ms/step - loss: 0.3691 - acc: 0.8706 - val_loss: 0.3676 - val_acc: 0.9134 Epoch 83/100 22/22 [==============================] - 0s 12ms/step - loss: 0.3658 - acc: 0.8760 - val_loss: 0.3657 - val_acc: 0.9148 Epoch 84/100 22/22 [==============================] - 0s 12ms/step - loss: 0.3602 - acc: 0.8832 - val_loss: 0.3665 - val_acc: 0.9148 Epoch 85/100 22/22 [==============================] - 0s 11ms/step - loss: 0.3636 - acc: 0.8709 - val_loss: 0.3621 - val_acc: 0.9162 Epoch 86/100 22/22 [==============================] - 0s 11ms/step - loss: 0.3572 - acc: 0.8767 - val_loss: 0.3612 - val_acc: 0.9205 Epoch 87/100 22/22 [==============================] - 0s 11ms/step - loss: 0.3549 - acc: 0.8818 - val_loss: 0.3624 - val_acc: 0.9162 Epoch 88/100 22/22 [==============================] - 0s 11ms/step - loss: 0.3518 - acc: 0.8850 - val_loss: 0.3616 - val_acc: 0.9162 Epoch 89/100 22/22 [==============================] - 0s 11ms/step - loss: 0.3533 - acc: 0.8810 - val_loss: 0.3629 - val_acc: 0.9176 Epoch 90/100 22/22 [==============================] - 0s 11ms/step - loss: 0.3437 - acc: 0.8846 - val_loss: 0.3625 - val_acc: 0.9119 Epoch 91/100 22/22 [==============================] - 0s 12ms/step - loss: 0.3585 - acc: 0.8800 - val_loss: 0.3588 - val_acc: 0.9190 Epoch 92/100 22/22 [==============================] - 0s 11ms/step - loss: 0.3515 - acc: 0.8792 - val_loss: 0.3621 - val_acc: 0.9148 Epoch 93/100 22/22 [==============================] - 0s 11ms/step - loss: 0.3524 - acc: 0.8864 - val_loss: 0.3629 - val_acc: 0.9119 Epoch 94/100 22/22 [==============================] - 0s 11ms/step - loss: 0.3466 - acc: 0.8760 - val_loss: 0.3570 - val_acc: 0.9176 Epoch 95/100 22/22 [==============================] - 0s 11ms/step - loss: 0.3451 - acc: 0.8843 - val_loss: 0.3581 - val_acc: 0.9148 Epoch 96/100 22/22 [==============================] - 0s 11ms/step - loss: 0.3458 - acc: 0.8836 - val_loss: 0.3583 - val_acc: 0.9134 Epoch 97/100 22/22 [==============================] - 0s 12ms/step - loss: 0.3481 - acc: 0.8803 - val_loss: 0.3593 - val_acc: 0.9134 Epoch 98/100 22/22 [==============================] - 0s 12ms/step - loss: 0.3455 - acc: 0.8818 - val_loss: 0.3560 - val_acc: 0.9134 Epoch 99/100 22/22 [==============================] - 0s 10ms/step - loss: 0.3467 - acc: 0.8836 - val_loss: 0.3572 - val_acc: 0.9119 Epoch 100/100 22/22 [==============================] - 0s 12ms/step - loss: 0.3418 - acc: 0.8893 - val_loss: 0.3577 - val_acc: 0.9134
The accuracy looks good, but it's important to run the evaluation step on the test data and verify that your model achieved good results on unseen data.
print('Evaluating the model')
model.evaluate(test_data)
Evaluating the model 28/28 [==============================] - 5s 143ms/step - loss: 0.8298 - acc: 0.7646 [0.8297802209854126, 0.764638364315033]
Understanding your model
When training a classifier, it's useful to see the confusion matrix. The confusion matrix gives you detailed knowledge of how your classifier is performing on test data.
Model Maker already creates the confusion matrix for you.
def show_confusion_matrix(confusion, test_labels):
  """Compute confusion matrix and normalize."""
  # Normalize each row by the total number of true examples of that class.
  confusion_normalized = confusion.astype("float") / confusion.sum(axis=1)[:, np.newaxis]
  axis_labels = test_labels
  ax = sns.heatmap(
      confusion_normalized, xticklabels=axis_labels, yticklabels=axis_labels,
      cmap='Blues', annot=True, fmt='.2f', square=True)
  plt.title("Confusion matrix")
  plt.ylabel("True label")
  plt.xlabel("Predicted label")
confusion_matrix = model.confusion_matrix(test_data)
show_confusion_matrix(confusion_matrix.numpy(), test_data.index_to_label)
Testing the model [Optional]
You can try the model on a sample audio from the test dataset just to see the results.
First you get the serving model.
serving_model = model.create_serving_model()
print(f'Model\'s input shape and type: {serving_model.inputs}')
print(f'Model\'s output shape and type: {serving_model.outputs}')
Model's input shape and type: [<KerasTensor: shape=(None, 15600) dtype=float32 (created by layer 'audio')>] Model's output shape and type: [<KerasTensor: shape=(None, 521) dtype=float32 (created by layer 'keras_layer')>, <KerasTensor: shape=(None, 5) dtype=float32 (created by layer 'sequential')>]
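As a quick sanity check, you can run a dummy, zero-filled window through the serving model and confirm the two output shapes (a sketch only; the all-zero input carries no real audio):
# Push one silent 15600-sample window through the serving model.
dummy_waveform = tf.zeros([1, 15600], dtype=tf.float32)
yamnet_scores, bird_scores = serving_model(dummy_waveform)
print('YAMNet head output shape:', yamnet_scores.shape)  # (1, 521)
print('Custom head output shape:', bird_scores.shape)    # (1, 5)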
Coming back to the random audio you loaded earlier:
# if you want to try another file just uncomment the line below
random_audio = get_random_audio_file()
show_bird_data(random_audio)
Bird name: White-breasted Wood-Wren Bird code: wbwwre1
The model you created has a fixed input window.
For a given audio file, you'll have to split it into windows of data of the expected size. The last window might need to be filled with zeros.
sample_rate, audio_data = wavfile.read(random_audio, 'rb')
audio_data = np.array(audio_data) / tf.int16.max
input_size = serving_model.input_shape[1]
splitted_audio_data = tf.signal.frame(audio_data, input_size, input_size, pad_end=True, pad_value=0)
print(f'Test audio path: {random_audio}')
print(f'Original size of the audio data: {len(audio_data)}')
print(f'Number of windows for inference: {len(splitted_audio_data)}')
Test audio path: /tmpfs/src/temp/tensorflow/lite/g3doc/models/modify/model_maker/dataset/small_birds_dataset/test/wbwwre1/XC525148.wav Original size of the audio data: 291936 Number of windows for inference: 19
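The number of windows is just the ceiling of the audio length divided by the window size, since pad_end=True zero-pads the final partial window. A quick check:
import math

# For the file shown above: ceil(291936 / 15600) = 19 windows.
expected_windows = math.ceil(len(audio_data) / input_size)
print(f'Expected number of windows: {expected_windows}')
assert expected_windows == len(splitted_audio_data)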
You'll loop over all the split audio windows and apply the model to each one of them.
The model you've just trained has 2 outputs: the original YAMNet output and the output of the classification head you've just trained. This is important because a real-world environment is more complicated than just bird sounds. You can use YAMNet's output to filter out non-relevant audio. For example, in the birds use case, if YAMNet is not classifying Birds or Animals, the output from your model might be an irrelevant classification.
Below, both outputs are printed to make their relationship easier to understand. Most of the mistakes your model makes happen when YAMNet's prediction is not related to your domain (e.g. birds).
print(random_audio)
results = []
print('Result of the window ith: your model class -> score, (spec class -> score)')
for i, data in enumerate(splitted_audio_data):
  yamnet_output, inference = serving_model(data)
  results.append(inference[0].numpy())
  result_index = tf.argmax(inference[0])
  spec_result_index = tf.argmax(yamnet_output[0])
  result_str = f'Result of the window {i}: ' \
               f'\t{test_data.index_to_label[result_index]} -> {inference[0][result_index].numpy():.3f}, ' \
               f'\t({spec._yamnet_labels()[spec_result_index]} -> {yamnet_output[0][spec_result_index]:.3f})'
  print(result_str)
results_np = np.array(results)
mean_results = results_np.mean(axis=0)
result_index = mean_results.argmax()
print(f'Mean result: {test_data.index_to_label[result_index]} -> {mean_results[result_index]}')
/tmpfs/src/temp/tensorflow/lite/g3doc/models/modify/model_maker/dataset/small_birds_dataset/test/wbwwre1/XC525148.wav Result of the window ith: your model class -> score, (spec class -> score) Result of the window 0: wbwwre1 -> 0.854, (Cricket -> 0.777) Result of the window 1: wbwwre1 -> 0.903, (Cricket -> 0.931) Result of the window 2: wbwwre1 -> 0.889, (Bird -> 0.903) Result of the window 3: wbwwre1 -> 0.983, (Animal -> 0.822) Result of the window 4: houspa -> 0.402, (Animal -> 0.446) Result of the window 5: wbwwre1 -> 0.859, (Animal -> 0.963) Result of the window 6: wbwwre1 -> 0.990, (Insect -> 0.824) Result of the window 7: wbwwre1 -> 0.786, (Cricket -> 0.817) Result of the window 8: wbwwre1 -> 0.966, (Cricket -> 0.556) Result of the window 9: wbwwre1 -> 0.978, (Animal -> 0.988) Result of the window 10: wbwwre1 -> 0.741, (Animal -> 0.606) Result of the window 11: wbwwre1 -> 0.972, (Insect -> 0.575) Result of the window 12: wbwwre1 -> 0.848, (Insect -> 0.358) Result of the window 13: wbwwre1 -> 0.888, (Cricket -> 0.639) Result of the window 14: azaspi1 -> 0.460, (Silence -> 1.000) Result of the window 15: redcro -> 0.466, (Silence -> 1.000) Result of the window 16: wbwwre1 -> 0.344, (Speech -> 0.231) Result of the window 17: chcant2 -> 0.945, (Silence -> 1.000) Result of the window 18: chcant2 -> 0.709, (Silence -> 0.950) Mean result: wbwwre1 -> 0.6498250365257263
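The loop above only prints both heads side by side. A natural next step, sketched below with a hypothetical keyword list and no tuning, is to drop windows whose top YAMNet class doesn't look bird- or animal-related before averaging; treat it as an illustration of the idea rather than a recommended filter.
# Hypothetical filter: keep only windows whose top YAMNet class mentions one
# of these keywords, then average the custom head's scores over what is kept.
RELEVANT_KEYWORDS = ('bird', 'animal', 'chirp', 'tweet')  # example list, tune for your data

filtered_results = []
for data in splitted_audio_data:
  yamnet_output, inference = serving_model(data)
  top_yamnet_class = spec._yamnet_labels()[tf.argmax(yamnet_output[0])].lower()
  if any(keyword in top_yamnet_class for keyword in RELEVANT_KEYWORDS):
    filtered_results.append(inference[0].numpy())

if filtered_results:
  mean_filtered = np.array(filtered_results).mean(axis=0)
  best_index = mean_filtered.argmax()
  print(f'Filtered mean result: {test_data.index_to_label[best_index]} -> {mean_filtered[best_index]:.3f}')
else:
  print('No window looked bird-related to YAMNet.')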
Exporting the model
The last step is exporting your model to be used on embedded devices or in the browser.
The export method exports both formats for you.
models_path = './birds_models'
print(f'Exporting the TFLite model to {models_path}')
model.export(models_path, tflite_filename='my_birds_model.tflite')
Exporting the TFLite model to ./birds_models INFO:tensorflow:Assets written to: /tmpfs/tmp/tmpxywpi1ab/assets INFO:tensorflow:Assets written to: /tmpfs/tmp/tmpxywpi1ab/assets 2022-08-09 17:10:24.810732: W tensorflow/compiler/mlir/lite/python/tf_tfl_flatbuffer_helpers.cc:362] Ignored output_format. 2022-08-09 17:10:24.810789: W tensorflow/compiler/mlir/lite/python/tf_tfl_flatbuffer_helpers.cc:365] Ignored drop_control_dependency. INFO:tensorflow:TensorFlow Lite model exported successfully: ./birds_models/my_birds_model.tflite INFO:tensorflow:TensorFlow Lite model exported successfully: ./birds_models/my_birds_model.tflite
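If you want to sanity-check the exported TFLite file in Python before moving it to a device, a minimal sketch with tf.lite.Interpreter looks like this. The order of the two outputs (YAMNet head vs. custom head) is an assumption here; confirm it from the output details.
# Sketch: load the exported TFLite model and run one zero-filled window through it.
interpreter = tf.lite.Interpreter(model_path='./birds_models/my_birds_model.tflite')
interpreter.allocate_tensors()

input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()

# Use the shape and dtype reported by the model itself.
silent_input = np.zeros(input_details[0]['shape'], dtype=input_details[0]['dtype'])
interpreter.set_tensor(input_details[0]['index'], silent_input)
interpreter.invoke()

for detail in output_details:
  scores = interpreter.get_tensor(detail['index'])
  print(detail['name'], scores.shape)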
You can also export the SavedModel version for serving or for use in a Python environment.
model.export(models_path, export_format=[mm.ExportFormat.SAVED_MODEL, mm.ExportFormat.LABEL])
WARNING:tensorflow:Compiled the loaded model, but the compiled metrics have yet to be built. `model.compile_metrics` will be empty until you train or evaluate the model. WARNING:tensorflow:Compiled the loaded model, but the compiled metrics have yet to be built. `model.compile_metrics` will be empty until you train or evaluate the model. INFO:tensorflow:Assets written to: ./birds_models/saved_model/assets INFO:tensorflow:Assets written to: ./birds_models/saved_model/assets INFO:tensorflow:Saving labels in ./birds_models/labels.txt INFO:tensorflow:Saving labels in ./birds_models/labels.txt
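To reload that SavedModel in Python, a short sketch like the one below works; the 'serving_default' signature name is the usual Keras default and is an assumption worth confirming (for example with the saved_model_cli tool).
# Sketch: reload the exported SavedModel and inspect its serving signature.
reloaded = tf.saved_model.load('./birds_models/saved_model')
serving_fn = reloaded.signatures['serving_default']
print(serving_fn.structured_input_signature)
print(serving_fn.structured_outputs)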
Next Steps
You did it.
Now your new model can be deployed on mobile devices using the TFLite AudioClassifier Task API.
You can also try the same process with your own data and different classes; here is the documentation for Model Maker for Audio Classification.