TensorFlow Model Optimization

Updated: June 2021

TensorFlow’s Model Optimization Toolkit (MOT) has been widely used to convert and optimize TensorFlow models into TensorFlow Lite models that are smaller, faster, and still acceptably accurate, so they can run on mobile and IoT devices. We are now working to extend MOT techniques and tooling beyond TensorFlow Lite to support TensorFlow SavedModel as well.

The following is a high-level overview of our roadmap. This roadmap may change at any time, and the order below does not reflect priority. We strongly encourage you to comment on the roadmap and give us feedback in the discussion group.

Quantization

TensorFlow Lite

Selective post-training quantization to exclude certain layers from quantization.
Quantization debugger to inspect quantization error per layer.
Quantization-aware training support for more models, e.g. those in TensorFlow Model Garden.
Quality and performance improvements for post-training dynamic-range quantization.

TensorFlow

Post-training quantization (bf16 * int8 dynamic range).
Quantization-aware training (bf16 * int8 weight-only with fake quant).
Selective post-training quantization to exclude certain layers from quantization.
Quantization debugger to inspect quantization error per layer.
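To make the per-layer error idea concrete, here is a minimal sketch that quantizes each layer's weights to int8 with a single symmetric scale and reports the mean squared error per layer. It assumes plain Python lists as weights; the function names and layer names are hypothetical and are not the MOT, TensorFlow Lite, or debugger API.

```python
# Illustrative sketch only: symmetric per-tensor int8 weight quantization
# and a per-layer error report. Not the MOT, TFLite, or debugger API.


def quantize_int8(weights):
    """Map float weights to int8 codes using one symmetric scale per tensor."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    # Round to the nearest code and clamp to the symmetric range [-127, 127].
    codes = [max(-127, min(127, round(w / scale))) for w in weights]
    return codes, scale


def dequantize(codes, scale):
    """Recover approximate float weights from int8 codes."""
    return [c * scale for c in codes]


def layer_error_report(layers):
    """Return the mean squared quantization error for each named layer."""
    report = {}
    for name, weights in layers.items():
        codes, scale = quantize_int8(weights)
        restored = dequantize(codes, scale)
        squared = [(w - r) ** 2 for w, r in zip(weights, restored)]
        report[name] = sum(squared) / len(squared)
    return report


# Hypothetical layers: dense_2 has one large outlier that inflates its scale.
layers = {
    "dense_1": [0.5, -0.25, 1.0],
    "dense_2": [50.0, 0.3, -0.4],
}
for name, mse in layer_error_report(layers).items():
    print(f"{name}: mse={mse:.6f}")
```

Because the outlier in dense_2 stretches its scale, its small weights lose most of their precision and its error comes out far larger than dense_1's. A layer that stands out like this is the kind of candidate selective quantization would keep in float.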

Sparsity

TensorFlow Lite

Sparse model execution support for more models.
Target-aware authoring for sparsity.
Extend the sparse op set with performant x86 kernels.

TensorFlow

Sparsity support in TensorFlow.
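As a rough illustration of what sparse model execution builds on, the sketch below prunes the smallest-magnitude weights to a target sparsity, stores only the nonzeros as (index, value) pairs, and computes a dot product over just those entries. It assumes magnitude pruning on a flat Python list; all names are hypothetical and this is not the MOT pruning API or a TensorFlow Lite sparse format.

```python
# Illustrative sketch only: magnitude pruning to a target sparsity and a
# compressed (index, value) representation. Not the MOT pruning API.


def magnitude_prune(weights, sparsity):
    """Zero out the `sparsity` fraction of weights with the smallest magnitude."""
    if not 0.0 <= sparsity <= 1.0:
        raise ValueError("sparsity must be between 0 and 1")
    num_to_prune = int(len(weights) * sparsity)
    # Indices ordered from smallest to largest absolute value.
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    pruned = list(weights)
    for i in order[:num_to_prune]:
        pruned[i] = 0.0
    return pruned


def to_sparse(weights):
    """Keep only nonzero entries as (index, value) pairs."""
    return [(i, w) for i, w in enumerate(weights) if w != 0.0]


def sparse_dot(sparse_weights, x):
    """Dot product that touches only the stored nonzero weights."""
    return sum(w * x[i] for i, w in sparse_weights)


weights = [0.9, -0.05, 0.3, 0.01, -0.7, 0.2]
pruned = magnitude_prune(weights, sparsity=0.5)
sparse = to_sparse(pruned)
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
print(pruned)
print(sparse_dot(sparse, x))
```

The sparse dot product does half the multiplications of the dense one here. Production sparse formats are usually more elaborate than this flat index list, since kernels need regular structure to turn skipped work into real speedups.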

Cascading compression techniques

Quantization + Tensor Compression + Sparsity: demonstrate all 3 techniques working together.

Compression

Tensor compression API to help compression algorithm developers implement their own model compression algorithms (e.g. weight clustering), including a standard way to test and benchmark them.
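Weight clustering, the example algorithm named above, can be sketched as 1-D k-means: every weight is replaced by an index into a small table of shared centroid values. This is a standalone illustration assuming linear centroid initialization and a fixed iteration count; the function names are hypothetical and it does not represent the proposed tensor compression API.

```python
# Illustrative sketch only: weight clustering with 1-D k-means. This is not
# the proposed tensor compression API; names here are hypothetical.


def cluster_weights(weights, num_clusters, iterations=20):
    """Return (centroids, indices) so that weights[i] ~ centroids[indices[i]]."""
    lo, hi = min(weights), max(weights)
    # Linear initialization: spread centroids evenly between min and max.
    if num_clusters == 1:
        centroids = [(lo + hi) / 2.0]
    else:
        step = (hi - lo) / (num_clusters - 1)
        centroids = [lo + k * step for k in range(num_clusters)]

    def assign():
        # Index of the nearest centroid for every weight.
        return [
            min(range(num_clusters), key=lambda k: abs(w - centroids[k]))
            for w in weights
        ]

    for _ in range(iterations):
        indices = assign()
        # Move each centroid to the mean of the weights assigned to it.
        for k in range(num_clusters):
            members = [w for w, idx in zip(weights, indices) if idx == k]
            if members:
                centroids[k] = sum(members) / len(members)
    return centroids, assign()


def reconstruct(centroids, indices):
    """Rebuild an approximate weight tensor from the shared centroid table."""
    return [centroids[i] for i in indices]


weights = [0.1, 0.12, 0.11, 0.9, 0.95, 0.88, -0.5, -0.52]
centroids, indices = cluster_weights(weights, num_clusters=3)
print(indices)
print(reconstruct(centroids, indices))
```

With three clusters, each weight can be stored as a 2-bit index plus a three-entry float table, which is where the size reduction comes from.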