Using a TensorFlow Lite model in your mobile app requires multiple considerations: you must choose a pre-trained or custom model, convert the model to a TensorFlow Lite format, and finally, integrate the model in your app.
1. Choose a model
Depending on the use case, you can choose one of the popular open-sourced models, such as InceptionV3 or MobileNets, and re-train these models with a custom data set or even build your own custom model.
Use a pre-trained model
MobileNets is a family of mobile-first computer vision models for TensorFlow designed to effectively maximize accuracy, while taking into consideration the restricted resources for on-device or embedded applications. MobileNets are small, low-latency, low-power models parameterized to meet the resource constraints for a variety of uses. They can be used for classification, detection, embeddings, and segmentation—similar to other popular large scale models, such as Inception. Google provides 16 pre-trained ImageNet classification checkpoints for MobileNets that can be used in mobile projects of all sizes.
Inception-v3 is an image recognition model that achieves fairly high accuracy recognizing general objects with 1000 classes, for example, "Zebra", "Dalmatian", and "Dishwasher". The model extracts general features from input images using a convolutional neural network and classifies them based on those features with fully-connected and softmax layers.
On Device Smart Reply is an on-device model that provides one-touch replies for incoming text messages by suggesting contextually relevant messages. The model is built specifically for memory constrained devices, such as watches and phones, and has been successfully used in Smart Replies on Android Wear. Currently, this model is Android-specific.
These pre-trained models are available for download.
Re-train Inception-V3 or MobileNet for a custom data set
These pre-trained models were trained on the ImageNet data set, which contains 1000 predefined classes. If these classes are not sufficient for your use case, the model will need to be re-trained. This technique is called transfer learning: it starts with a model that has already been trained on one problem, then retrains the model on a similar problem. Deep learning from scratch can take days, but transfer learning is fairly quick. In order to do this, you need to generate a custom data set labeled with the relevant classes.
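As a rough illustration of the idea, the sketch below keeps a pre-trained MobileNet fixed and trains only a new classification head. It uses the tf.keras API instead of a dedicated retraining script, a swapped-in choice made for brevity. The names `label_index` and `build_transfer_model` are hypothetical helpers, not part of TensorFlow.

```python
def label_index(class_names):
    """Map each custom class name to an integer label (sorted for stability)."""
    return {name: i for i, name in enumerate(sorted(set(class_names)))}


def build_transfer_model(num_classes, input_shape=(224, 224, 3)):
    """Sketch: reuse ImageNet features from MobileNet and train a new head."""
    # Imported lazily so that label_index() works without TensorFlow installed.
    import tensorflow as tf

    base = tf.keras.applications.MobileNet(
        input_shape=input_shape,
        include_top=False,   # drop the 1000-class ImageNet classifier
        weights="imagenet",  # downloads pre-trained weights on first use
        pooling="avg",
    )
    base.trainable = False  # keep the pre-trained feature extractor fixed

    # Only this new dense layer is trained on the custom data set.
    model = tf.keras.Sequential([
        base,
        tf.keras.layers.Dense(num_classes, activation="softmax"),
    ])
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model
```

With a labeled data set at hand, one would typically call `build_transfer_model(len(labels))` and then `model.fit(...)` on images paired with the integer labels produced by `label_index`.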
The TensorFlow for Poets codelab walks through the re-training process step-by-step. The code supports both floating point and quantized inference.
Train a custom model
A developer may choose to train a custom model using TensorFlow (see the TensorFlow tutorials for examples of building and training models). If you have already written a model, the first step is to export it to a tf.GraphDef file. This is required because some formats do not store the model structure outside the code, and we must communicate with other parts of the framework. See Exporting the Inference Graph to create the file for the custom model.
TensorFlow Lite currently supports a subset of TensorFlow operators. Refer to the TensorFlow Lite & TensorFlow Compatibility Guide for supported operators and their usage. This set of operators will continue to grow in future TensorFlow Lite releases.
2. Convert the model format
The TensorFlow Lite Converter accepts the following file formats:
- SavedModel: a GraphDef and checkpoint with a signature that labels input and output arguments to a model. See the documentation for converting SavedModels using Python or using the command line.
- tf.keras: an HDF5 file containing a model with weights and input and output arguments generated by tf.keras. See the documentation for converting HDF5 models using Python or using the command line.
- frozen tf.GraphDef: a subclass of tf.GraphDef that does not contain variables. A GraphDef can be converted to a frozen GraphDef by taking a checkpoint and a GraphDef, and converting each variable into a constant using the value retrieved from the checkpoint. Instructions on converting a tf.GraphDef to a TensorFlow Lite model are described in the next subsection.
Converting a tf.GraphDef
TensorFlow models may be saved as a .pb or .pbtxt tf.GraphDef file. In order to convert the tf.GraphDef file to TensorFlow Lite, the model must first be frozen. This process involves several file formats:
- tf.GraphDef (.pb or .pbtxt): a protobuf that represents the TensorFlow training or computation graph. It contains operators, tensors, and variable definitions.
- checkpoint (.ckpt): serialized variables from a TensorFlow graph. Since this does not contain a graph structure, it cannot be interpreted by itself.
- TensorFlow Lite model (.tflite): a serialized FlatBuffer that contains TensorFlow Lite operators and tensors for the TensorFlow Lite interpreter.
You must have checkpoints that contain trained weights. The tf.GraphDef file only contains the structure of the graph. The process of merging the checkpoint values with the graph structure is called freezing the graph.
tf.GraphDef and checkpoint files for MobileNet models are available for download.
To freeze the graph, use the following command (changing the arguments):
freeze_graph --input_graph=/tmp/mobilenet_v1_224.pb \
  --input_checkpoint=/tmp/checkpoints/mobilenet-10202.ckpt \
  --input_binary=true \
  --output_graph=/tmp/frozen_mobilenet_v1_224.pb \
  --output_node_names=MobilenetV1/Predictions/Reshape_1
Set the input_binary flag to True when reading a binary protobuf, a .pb file. Set it to False for a .pbtxt file.
Set input_graph and input_checkpoint to the respective filenames. The output_node_names may not be obvious outside of the code that built the model. The easiest way to find them is to visualize the graph, for example with TensorBoard.
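When the freezing step is scripted, the flag choices above can be derived from the file names. The following is a minimal sketch that only assembles the argument list for the freeze_graph command shown earlier. The helper name `freeze_graph_args` is hypothetical, and picking input_binary from the file extension is an assumption about how the files are named.

```python
import os


def freeze_graph_args(input_graph, input_checkpoint, output_graph, output_node_names):
    """Build the freeze_graph command line as a list of arguments."""
    ext = os.path.splitext(input_graph)[1].lower()
    if ext not in (".pb", ".pbtxt"):
        raise ValueError("expected a .pb or .pbtxt GraphDef, got %r" % input_graph)

    # Binary protobuf (.pb) -> true, text protobuf (.pbtxt) -> false.
    input_binary = "true" if ext == ".pb" else "false"

    return [
        "freeze_graph",
        "--input_graph=" + input_graph,
        "--input_checkpoint=" + input_checkpoint,
        "--input_binary=" + input_binary,
        "--output_graph=" + output_graph,
        # Multiple output nodes are passed as one comma-separated value.
        "--output_node_names=" + ",".join(output_node_names),
    ]
```

The resulting list can be handed to `subprocess.run(args, check=True)` once the freeze_graph tool is on the path.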
The frozen GraphDef is now ready for conversion to the FlatBuffer format (.tflite) for use on Android or iOS devices. For Android, the TensorFlow Lite Converter tool supports both float and quantized models. To convert the frozen GraphDef to the .tflite format, use a command similar to the following:
tflite_convert \
  --output_file=/tmp/mobilenet_v1_1.0_224.tflite \
  --graph_def_file=/tmp/mobilenet_v1_0.50_128/frozen_graph.pb \
  --input_arrays=input \
  --output_arrays=MobilenetV1/Predictions/Reshape_1
The frozen_graph.pb file used here is available for download. Setting the input_arrays and output_arrays arguments is not straightforward. The easiest way to find these values is to explore the graph with a visualization tool such as TensorBoard. Reuse the arguments for specifying the output nodes for inference in the freeze_graph step.
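Candidate array names can also be listed programmatically. The sketch below applies a simple heuristic, which is an assumption and not a rule of the converter: Placeholder nodes are treated as likely inputs, and nodes whose output no other node consumes are treated as likely outputs. `Node`, `candidate_arrays`, and `load_nodes` are hypothetical names; `load_nodes` reads a binary GraphDef through `tf.compat.v1.GraphDef`.

```python
from collections import namedtuple

# Minimal view of a GraphDef node: its name, op type, and input names.
Node = namedtuple("Node", ["name", "op", "inputs"])


def candidate_arrays(nodes):
    """Return (likely input names, likely output names) for a list of nodes."""
    consumed = set()
    for node in nodes:
        for inp in node.inputs:
            # Strip the control-dependency marker "^" and any ":N" output index.
            consumed.add(inp.lstrip("^").split(":")[0])

    inputs = [n.name for n in nodes if n.op == "Placeholder"]
    outputs = [n.name for n in nodes
               if n.name not in consumed and n.op != "Placeholder"]
    return inputs, outputs


def load_nodes(graph_def_path):
    """Read a binary (.pb) GraphDef and convert its nodes to Node tuples."""
    # Imported lazily so candidate_arrays() can be used without TensorFlow.
    import tensorflow as tf

    graph_def = tf.compat.v1.GraphDef()
    with open(graph_def_path, "rb") as f:
        graph_def.ParseFromString(f.read())
    return [Node(n.name, n.op, list(n.input)) for n in graph_def.node]
```

Calling `candidate_arrays(load_nodes(path))` on a frozen graph gives a short list to check against the graph visualization; the heuristic can report extra nodes, so the result is a starting point only.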
Full converter reference
The TensorFlow Lite Converter can be used from Python or from the command line. This allows you to integrate the conversion step into the model design workflow, ensuring the model is easy to convert to a mobile inference graph.
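For the Python route, each of the three accepted input formats listed earlier has its own converter constructor. The sketch below picks one from the path. It assumes the TF 1.x-style converter API (exposed as `tf.compat.v1.lite` in newer releases) and assumes the format can be guessed from the path: a directory for a SavedModel, `.h5`/`.hdf5` for Keras, `.pb` for a frozen graph. `detect_model_format` and `make_converter` are hypothetical helper names.

```python
import os


def detect_model_format(path):
    """Guess the converter input format from a path (an assumed convention)."""
    if os.path.isdir(path):
        return "saved_model"
    ext = os.path.splitext(path)[1].lower()
    if ext in (".h5", ".hdf5"):
        return "keras"
    if ext == ".pb":
        return "frozen_graph"
    raise ValueError("unrecognized model format: %r" % path)


def make_converter(path, input_arrays=None, output_arrays=None):
    """Return a TFLiteConverter for a SavedModel, HDF5 file, or frozen graph."""
    # Imported lazily so detect_model_format() works without TensorFlow.
    import tensorflow as tf

    lite = tf.compat.v1.lite
    fmt = detect_model_format(path)
    if fmt == "saved_model":
        return lite.TFLiteConverter.from_saved_model(path)
    if fmt == "keras":
        return lite.TFLiteConverter.from_keras_model_file(path)
    # A frozen GraphDef carries no signature, so the arrays must be named.
    if not input_arrays or not output_arrays:
        raise ValueError("frozen graphs need input_arrays and output_arrays")
    return lite.TFLiteConverter.from_frozen_graph(path, input_arrays, output_arrays)
```

The bytes returned by `make_converter(...).convert()` can then be written to a `.tflite` file opened in binary mode.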
Graph Visualization tool
bazel run tensorflow/lite/tools:visualize -- model.tflite model_viz.html
This generates an interactive HTML page listing subgraphs, operations, and a graph visualization.
3. Use the TensorFlow Lite model for inference in a mobile app
After completing the prior steps, you should now have a .tflite model file.
Since Android apps are written in Java and the core TensorFlow library is in C++, a JNI library is provided as an interface. This is only meant for inference—it provides the ability to load a graph, set up inputs, and run the model to calculate outputs.
The Android mobile guide has instructions for installing TensorFlow on Android and setting up bazel and Android Studio.
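Before wiring a model into an app, the same load, set inputs, run, and read outputs sequence can be tried on a desktop machine. The sketch below uses the Python `tf.lite.Interpreter` rather than the Java interface, purely as a way to sanity-check a .tflite file. It assumes an image classification model with a single input and a single output, and an image already resized to the model's input shape. `top_k` and `classify` are hypothetical helper names.

```python
def top_k(scores, labels, k=3):
    """Return the k highest-scoring (label, score) pairs, best first."""
    ranked = sorted(zip(labels, scores), key=lambda pair: pair[1], reverse=True)
    return ranked[:k]


def classify(model_path, image, labels, k=3):
    """Run one image through a .tflite classifier and return the top labels."""
    # Imported lazily so top_k() can be used without these packages.
    import numpy as np
    import tensorflow as tf

    # Load the graph and allocate memory for its tensors.
    interpreter = tf.lite.Interpreter(model_path=model_path)
    interpreter.allocate_tensors()

    input_detail = interpreter.get_input_details()[0]
    output_detail = interpreter.get_output_details()[0]

    # Set up the input: add a batch dimension and match the expected dtype.
    batch = np.expand_dims(image, axis=0).astype(input_detail["dtype"])
    interpreter.set_tensor(input_detail["index"], batch)

    # Run the model and read back the per-class scores.
    interpreter.invoke()
    scores = interpreter.get_tensor(output_detail["index"])[0]
    return top_k(scores.tolist(), labels, k)
```

A float model usually also expects pixel values scaled to a fixed range; that preprocessing depends on how the model was trained and is left out here.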
Core ML support
Core ML is a machine learning framework used in Apple products. In addition to using TensorFlow Lite models directly in your applications, you can convert trained TensorFlow models to the Core ML format for use on Apple devices. To use the converter, refer to the Tensorflow-CoreML converter documentation.
ARM32 and ARM64 Linux
Compile TensorFlow Lite for a Raspberry Pi by following the RPi build instructions. Compile TensorFlow Lite for a generic aarch64 board such as Odroid C2, Pine64, NanoPi, and others by following the ARM64 Linux build instructions. This compiles a static library file (.a) used to build your app. There are plans for Python bindings and a demo app.
4. Optimize your model (optional)
There are two options. If you plan to run on CPU, we recommend that you quantize your weights and activation tensors. If the hardware is available, another option is to run on GPU for massively parallelizable workloads.
Compress your model size by lowering the precision of the parameters (i.e. neural network weights) from their training-time 32-bit floating-point representations into much smaller and more efficient 8-bit integer ones.
This will execute the heaviest computations fast in lower precision, but the most sensitive ones with higher precision, thus typically resulting in little to no final accuracy losses for the task, yet a significant speed-up over pure floating-point execution.
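To make the precision trade-off concrete, the sketch below maps a list of float weights onto 8-bit integers with a scale and a zero point, then maps them back. It is a generic affine quantization scheme written in plain Python for illustration and is not claimed to be the exact procedure the converter applies. `quantize` and `dequantize` are hypothetical names.

```python
def quantize(values, num_bits=8):
    """Map floats to unsigned integers using a shared scale and zero point."""
    qmin, qmax = 0, 2 ** num_bits - 1
    # Extend the range to include 0.0 so that zero is represented exactly.
    lo = min(min(values), 0.0)
    hi = max(max(values), 0.0)
    scale = (hi - lo) / (qmax - qmin)
    if scale == 0.0:
        scale = 1.0  # all values are zero; any scale works
    zero_point = int(round(qmin - lo / scale))
    quantized = [
        max(qmin, min(qmax, int(round(v / scale)) + zero_point))
        for v in values
    ]
    return quantized, scale, zero_point


def dequantize(quantized, scale, zero_point):
    """Recover approximate float values from the integer representation."""
    return [scale * (q - zero_point) for q in quantized]
```

Each weight then occupies one byte instead of four, and for values inside the observed range the round-trip error stays within about half of `scale`.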
The post-training quantization technique is integrated into the TensorFlow Lite conversion tool. Getting started is easy: after building your TensorFlow model, simply enable the 'post_training_quantize' flag in the TensorFlow Lite conversion tool. Assuming that the saved model is stored in saved_model_dir, the quantized tflite flatbuffer can be generated from Python:
import tensorflow as tf

# saved_model_dir is assumed to point at an exported SavedModel directory.
converter = tf.lite.TFLiteConverter.from_saved_model(saved_model_dir)
converter.optimizations = [tf.lite.Optimize.OPTIMIZE_FOR_SIZE]
tflite_quant_model = converter.convert()
Run on GPU
GPUs are designed to have high throughput for massively parallelizable workloads. Thus, they are well-suited for deep neural nets, which consist of a huge number of operators, each working on some input tensor(s) that can be easily divided into smaller workloads and carried out in parallel, typically resulting in lower latency.
Another benefit of GPU inference is its power efficiency. GPUs carry out the computations in a very efficient and optimized manner, so they consume less power and generate less heat than CPUs running the same task.