Get started with TensorFlow Lite

TensorFlow Lite provides all the tools you need to convert and run TensorFlow models on mobile, embedded, and IoT devices. The following guide walks through each step of the developer workflow and provides links to further instructions.

1. Choose a model

TensorFlow Lite allows you to run TensorFlow models on a wide range of devices. A TensorFlow model is a data structure that contains the logic and knowledge of a machine learning network trained to solve a particular problem.

There are many ways to obtain a TensorFlow model, from using pre-trained models to training your own. To use a model with TensorFlow Lite it must be converted into a special format. This is explained in section 2, Convert the model.

Use a pre-trained model

The TensorFlow Lite team provides a set of pre-trained models that solve a variety of machine learning problems. These models have been converted to work with TensorFlow Lite and are ready to use in your applications.

The pre-trained models cover common tasks such as image classification and object detection. See our full list of pre-trained models in Models.

Models from other sources

There are many other places you can obtain pre-trained TensorFlow models, including TensorFlow Hub. In most cases, these models will not be provided in the TensorFlow Lite format, and you'll have to convert them before use.

Re-train a model (transfer learning)

Transfer learning allows you to take a trained model and re-train it to perform another task. For example, an image classification model could be retrained to recognize new categories of images. Re-training takes less time and requires less data than training a model from scratch.

You can use transfer learning to customize pre-trained models to your application. Learn how to perform transfer learning in the Recognize flowers with TensorFlow codelab.
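
For instance, a minimal tf.keras sketch of this idea (not the codelab's code; the layer sizes and class count are assumptions) looks like this:

import tensorflow as tf

# Start from a pre-trained MobileNetV2 feature extractor and freeze it.
base_model = tf.keras.applications.MobileNetV2(
    input_shape=(224, 224, 3), include_top=False, weights="imagenet")
base_model.trainable = False

# Add a new classification head for the new task (e.g. 5 flower classes).
model = tf.keras.Sequential([
    base_model,
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(5, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
# model.fit(train_dataset, epochs=5)  # train only the new head on your data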

Train a custom model

If you have designed and trained your own TensorFlow model, or you have trained a model obtained from another source, you should convert it to the TensorFlow Lite format before use.

2. Convert the model

TensorFlow Lite is designed to execute models efficiently on devices. Some of this efficiency comes from the use of a special format for storing models. TensorFlow models must be converted into this format before they can be used by TensorFlow Lite.

Converting models reduces their file size and introduces optimizations that do not affect accuracy. Developers can opt to further reduce file size and increase execution speed in exchange for some trade-offs. You can use the TensorFlow Lite converter to choose which optimizations to apply.

TensorFlow Lite supports a limited subset of TensorFlow operations, so not all models can be converted. See Ops compatibility for more information.

TensorFlow Lite converter

The TensorFlow Lite converter is a tool that converts trained TensorFlow models into the TensorFlow Lite format. It can also introduce optimizations, which are covered in section 4, Optimize your model.

The converter is available as a Python API. The following example shows a TensorFlow SavedModel being converted into the TensorFlow Lite format:

import tensorflow as tf

converter = tf.lite.TFLiteConverter.from_saved_model(saved_model_dir)
tflite_model = converter.convert()
open("converted_model.tflite", "wb").write(tflite_model)

You can convert TensorFlow 2.0 models in a similar way.
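
For example, a trained tf.keras model can be converted directly, without exporting a SavedModel first (a minimal sketch; `model` is assumed to be an existing tf.keras.Model):

import tensorflow as tf

# Convert an in-memory tf.keras model (assumed to exist) and write it to disk.
converter = tf.lite.TFLiteConverter.from_keras_model(model)
tflite_model = converter.convert()
open("converted_keras_model.tflite", "wb").write(tflite_model)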

The converter can also be used from the command line, but the Python API is recommended.

Options

The converter can convert from a variety of input types.

When converting TensorFlow 1.x models, these are SavedModel directories, frozen GraphDefs (models generated by freeze_graph.py), Keras HDF5 models, and models taken from a tf.Session.

When converting TensorFlow 2.x models, these are SavedModel directories, tf.keras models, and concrete functions.
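
For example, a concrete function obtained from a tf.function can be converted as follows. This is a minimal sketch; the small model defined here is purely illustrative:

import tensorflow as tf

# A tiny, hypothetical model used only to illustrate the conversion path.
model = tf.keras.Sequential([tf.keras.layers.Dense(1, input_shape=(4,))])

# Wrap the model call in a tf.function and obtain a concrete function
# for a fixed input signature.
run_model = tf.function(lambda x: model(x))
concrete_func = run_model.get_concrete_function(
    tf.TensorSpec([1, 4], tf.float32))

converter = tf.lite.TFLiteConverter.from_concrete_functions([concrete_func])
tflite_model = converter.convert()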

The converter can be configured to apply various optimizations that can improve performance or reduce file size. This is covered in section 4, Optimize your model.

Ops compatibility

TensorFlow Lite currently supports a limited subset of TensorFlow operations. The long term goal is for all TensorFlow operations to be supported.

If the model you wish to convert contains unsupported operations, you can use TensorFlow Select to include operations from TensorFlow. This will result in a larger binary being deployed to devices.
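
For example, when converting with the Python API you can allow select TensorFlow ops to be used alongside the built-in TensorFlow Lite ops (a minimal sketch, reusing the `saved_model_dir` from earlier):

import tensorflow as tf

converter = tf.lite.TFLiteConverter.from_saved_model(saved_model_dir)
# Let operations without a TensorFlow Lite builtin fall back to the
# TensorFlow kernels, at the cost of a larger binary.
converter.target_spec.supported_ops = [
    tf.lite.OpsSet.TFLITE_BUILTINS,
    tf.lite.OpsSet.SELECT_TF_OPS,
]
tflite_model = converter.convert()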

3. Run inference with the model

Inference is the process of running data through a model to obtain predictions. It requires a model, an interpreter, and input data.

TensorFlow Lite interpreter

The TensorFlow Lite interpreter is a library that takes a model file, executes the operations it defines on input data, and provides access to the output.

The interpreter works across multiple platforms and provides a simple API for running TensorFlow Lite models from Java, Swift, Objective-C, C++, and Python.

The following code shows the interpreter being invoked from Java:

try (Interpreter interpreter = new Interpreter(tensorflow_lite_model_file)) {
  interpreter.run(input, output);
}
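
For comparison, the following sketch shows the same idea from Python using tf.lite.Interpreter (the model file name is only an example):

import numpy as np
import tensorflow as tf

# Load the model and allocate its input and output tensors.
interpreter = tf.lite.Interpreter(model_path="converted_model.tflite")
interpreter.allocate_tensors()

input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()

# Feed an input with the expected shape and dtype, then run inference.
input_data = np.array(
    np.random.random_sample(input_details[0]["shape"]), dtype=np.float32)
interpreter.set_tensor(input_details[0]["index"], input_data)
interpreter.invoke()

output = interpreter.get_tensor(output_details[0]["index"])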

GPU acceleration and Delegates

Some devices provide hardware acceleration for machine learning operations. For example, most mobile phones have GPUs, which can perform floating point matrix operations faster than a CPU.

The speed-up can be substantial. For example, a MobileNet v1 image classification model runs 5.5x faster on a Pixel 3 phone when GPU acceleration is used.

The TensorFlow Lite interpreter can be configured with Delegates to make use of hardware acceleration on different devices. The GPU Delegate allows the interpreter to run appropriate operations on the device's GPU.

The following code shows the GPU Delegate being used from Java:

GpuDelegate delegate = new GpuDelegate();
Interpreter.Options options = (new Interpreter.Options()).addDelegate(delegate);
Interpreter interpreter = new Interpreter(tensorflow_lite_model_file, options);
try {
  interpreter.run(input, output);
} finally {
  delegate.close();
}
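
Delegates can also be loaded from Python. The sketch below assumes a delegate shared library is available on the device; the library name is only a placeholder:

import tensorflow as tf

# Load a hardware delegate from a shared library (placeholder name) and
# attach it to the interpreter.
delegate = tf.lite.experimental.load_delegate("libexample_delegate.so")
interpreter = tf.lite.Interpreter(
    model_path="converted_model.tflite",
    experimental_delegates=[delegate])
interpreter.allocate_tensors()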

To add support for new hardware accelerators you can define your own delegate.

Android and iOS

The TensorFlow Lite interpreter is easy to use from both major mobile platforms. To get started, explore the Android quickstart and iOS quickstart guides. Example applications are available for both platforms.

To obtain the required libraries, Android developers should use the TensorFlow Lite AAR. iOS developers should use the CocoaPods for Swift or Objective-C.

Linux

Embedded Linux is an important platform for deploying machine learning. We provide build instructions for both Raspberry Pi and Arm64-based boards such as Odroid C2, Pine64, and NanoPi.
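
On these devices you can also install the smaller tflite_runtime Python package instead of the full tensorflow package. A minimal sketch, assuming the package is installed and a converted model is on disk:

# tflite_runtime provides just the interpreter, keeping the install small
# on embedded Linux devices.
from tflite_runtime.interpreter import Interpreter

interpreter = Interpreter(model_path="converted_model.tflite")
interpreter.allocate_tensors()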

Microcontrollers

TensorFlow Lite for Microcontrollers is an experimental port of TensorFlow Lite aimed at microcontrollers and other devices with only kilobytes of memory.

Operations

If your model requires TensorFlow operations that are not yet implemented in TensorFlow Lite, you can use TensorFlow Select to use them in your model. You'll need to build a custom version of the interpreter that includes the TensorFlow operations.

You can use Custom operators to write your own operations, or port new operations into TensorFlow Lite.

Operator versions allow you to add new functionality and parameters to existing operations.

4. Optimize your model

TensorFlow Lite provides tools to optimize the size and performance of your models, often with minimal impact on accuracy. Optimized models may require slightly more complex training, conversion, or integration.

Machine learning optimization is an evolving field, and TensorFlow Lite's Model Optimization Toolkit is continually growing as new techniques are developed.

Performance

The goal of model optimization is to reach the ideal balance of performance, model size, and accuracy on a given device. Performance best practices can help guide you through this process.

Quantization

By reducing the precision of values and operations within a model, quantization can reduce both the size of a model and the time required for inference. For many models, there is only a minimal loss of accuracy.

The TensorFlow Lite converter makes it easy to quantize TensorFlow models. The following Python code quantizes a SavedModel and saves it to disk:

import tensorflow as tf

converter = tf.lite.TFLiteConverter.from_saved_model(saved_model_dir)
converter.optimizations = [tf.lite.Optimize.OPTIMIZE_FOR_SIZE]
tflite_quant_model = converter.convert()
open("converted_model.tflite", "wb").write(tflite_quantized_model)

To learn more about quantization, see Post-training quantization.

Model Optimization Toolkit

The Model Optimization Toolkit is a set of tools and techniques designed to make it easy for developers to optimize their models. Many of the techniques can be applied to all TensorFlow models and are not specific to TensorFlow Lite, but they are especially valuable when running inference on devices with limited resources.
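
For example, the toolkit's pruning API can wrap a tf.keras model so that low-magnitude weights are zeroed out during training. A minimal sketch, assuming the tensorflow_model_optimization package is installed and that `model` and `train_dataset` already exist:

import tensorflow_model_optimization as tfmot

# Wrap the model so pruning is applied during training; the default
# schedule targets 50% sparsity.
pruned_model = tfmot.sparsity.keras.prune_low_magnitude(model)
pruned_model.compile(optimizer="adam",
                     loss="sparse_categorical_crossentropy",
                     metrics=["accuracy"])

# The UpdatePruningStep callback keeps pruning in sync with training steps.
pruned_model.fit(train_dataset, epochs=2,
                 callbacks=[tfmot.sparsity.keras.UpdatePruningStep()])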

Next steps

Now that you're familiar with TensorFlow Lite, explore some of the following resources: