[null,null,["最后更新时间 (UTC):2022-01-24。"],[],[],null,["# Quantization aware training\n\n\u003cbr /\u003e\n\n~Maintained by TensorFlow Model Optimization~\n\nThere are two forms of quantization: post-training quantization and\nquantization aware training. Start with [post-training quantization](/model_optimization/guide/quantization/post_training)\nsince it's easier to use, though quantization aware training is often better for\nmodel accuracy.\n\nThis page provides an overview on quantization aware training to help you\ndetermine how it fits with your use case.\n\n- To dive right into an end-to-end example, see the [quantization aware training example](/model_optimization/guide/quantization/training_example).\n- To quickly find the APIs you need for your use case, see the [quantization aware training comprehensive guide](/model_optimization/guide/quantization/training_comprehensive_guide).\n\nOverview\n--------\n\nQuantization aware training emulates inference-time quantization, creating a\nmodel that downstream tools will use to produce actually quantized models.\nThe quantized models use lower-precision (e.g. 8-bit instead of 32-bit float),\nleading to benefits during deployment.\n\n### Deploy with quantization\n\nQuantization brings improvements via model compression and latency reduction.\nWith the API defaults, the model size shrinks by 4x, and we typically see\nbetween 1.5 - 4x improvements in CPU latency in the tested backends. Eventually,\nlatency improvements can be seen on compatible machine learning accelerators,\nsuch as the [EdgeTPU](https://coral.ai/docs/edgetpu/benchmarks/) and NNAPI.\n\nThe technique is used in production in speech, vision, text, and translate use\ncases. The code currently supports a\n[subset of these models](#general_support_matrix).\n\n### Experiment with quantization and associated hardware\n\nUsers can configure the quantization parameters (e.g. number of bits) and to\nsome degree, the underlying algorithms. Note that with these changes from the\nAPI defaults, there is currently no supported path for deployment to a backend.\nFor instance, TFLite conversion and kernel implementations only support 8-bit\nquantization.\n\nAPIs specific to this configuration are experimental and not subject to backward\ncompatibility.\n\n### API compatibility\n\nUsers can apply quantization with the following APIs:\n\n- Model building: `keras` with only Sequential and Functional models.\n- TensorFlow versions: TF 2.x for tf-nightly.\n - [`tf.compat.v1`](https://www.tensorflow.org/api_docs/python/tf/compat/v1) with a TF 2.X package is not supported.\n- TensorFlow execution mode: eager execution\n\nIt is on our roadmap to add support in the following areas:\n\n- Model building: clarify how Subclassed Models have limited to no support\n- Distributed training: [`tf.distribute`](https://www.tensorflow.org/api_docs/python/tf/distribute)\n\n### General support matrix\n\nSupport is available in the following areas:\n\n- Model coverage: models using [allowlisted layers](https://github.com/tensorflow/model-optimization/tree/master/tensorflow_model_optimization/python/core/quantization/keras/default_8bit/default_8bit_quantize_registry.py), BatchNormalization when it follows Conv2D and DepthwiseConv2D layers, and in limited cases, `Concat`.\n- Hardware acceleration: our API defaults are compatible with acceleration on EdgeTPU, NNAPI, and TFLite backends, amongst others. 
### Experiment with quantization and associated hardware

Users can configure the quantization parameters (e.g. the number of bits) and,
to some degree, the underlying algorithms. Note that with these changes from
the API defaults, there is currently no supported path for deployment to a
backend. For instance, TFLite conversion and kernel implementations only
support 8-bit quantization. A sketch of this kind of experimental
configuration appears at the end of this page.

APIs specific to this configuration are experimental and not subject to
backward compatibility.

### API compatibility

Users can apply quantization with the following APIs:

- Model building: `keras` with only Sequential and Functional models.
- TensorFlow versions: TF 2.x for tf-nightly.
  - [`tf.compat.v1`](https://www.tensorflow.org/api_docs/python/tf/compat/v1) with a TF 2.X package is not supported.
- TensorFlow execution mode: eager execution

It is on our roadmap to add support in the following areas:

- Model building: clarify how subclassed models have limited to no support
- Distributed training: [`tf.distribute`](https://www.tensorflow.org/api_docs/python/tf/distribute)

### General support matrix

Support is available in the following areas:

- Model coverage: models using [allowlisted layers](https://github.com/tensorflow/model-optimization/tree/master/tensorflow_model_optimization/python/core/quantization/keras/default_8bit/default_8bit_quantize_registry.py), BatchNormalization when it follows Conv2D and DepthwiseConv2D layers, and, in limited cases, `Concat`.
- Hardware acceleration: our API defaults are compatible with acceleration on EdgeTPU, NNAPI, and TFLite backends, amongst others. See the caveat in the roadmap below.
- Deploy with quantization: only per-axis quantization for convolutional layers, not per-tensor quantization, is currently supported.

It is on our roadmap to add support in the following areas:

- Model coverage: extended to include RNN/LSTMs and general Concat support.
- Hardware acceleration: ensure the TFLite converter can produce full-integer models. See [this issue](https://github.com/tensorflow/tensorflow/issues/38285) for details.
- Experiment with quantization use cases:
  - Experiment with quantization algorithms that span Keras layers or require the training step.
  - Stabilize APIs.

Results
-------

### Image classification with tools

| Model           | Non-quantized Top-1 Accuracy | 8-bit Quantized Accuracy |
|-----------------|------------------------------|--------------------------|
| MobilenetV1 224 | 71.03%                       | 71.06%                   |
| Resnet v1 50    | 76.3%                        | 76.1%                    |
| MobilenetV2 224 | 70.77%                       | 70.01%                   |

The models were tested on Imagenet and evaluated in both TensorFlow and TFLite.

### Image classification for technique

| Model         | Non-quantized Top-1 Accuracy | 8-Bit Quantized Accuracy |
|---------------|------------------------------|--------------------------|
| Nasnet-Mobile | 74%                          | 73%                      |
| Resnet-v2 50  | 75.6%                        | 75%                      |

The models were tested on Imagenet and evaluated in both TensorFlow and TFLite.

Examples
--------

In addition to the
[quantization aware training example](/model_optimization/guide/quantization/training_example),
see the following examples:

- CNN model on the MNIST handwritten digit classification task with quantization: [code](https://github.com/tensorflow/model-optimization/blob/master/tensorflow_model_optimization/python/core/quantization/keras/quantize_functional_test.py)

For background on something similar, see the *Quantization and Training of
Neural Networks for Efficient Integer-Arithmetic-Only Inference*
[paper](https://arxiv.org/abs/1712.05877). This paper introduces some concepts
that this tool uses. The implementation is not exactly the same, and this tool
uses additional concepts (e.g. per-axis quantization).
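As noted in the "Experiment with quantization and associated hardware" section
above, the number of bits can be configured per layer. The following sketch is
modeled on the patterns in the comprehensive guide: it annotates a single
Dense layer with a hypothetical 4-bit `QuantizeConfig` while the rest of the
model keeps the API defaults. The class name and layer sizes are illustrative,
and non-default settings like this currently have no supported deployment
path.

```python
import tensorflow as tf
import tensorflow_model_optimization as tfmot

LastValueQuantizer = tfmot.quantization.keras.quantizers.LastValueQuantizer
MovingAverageQuantizer = tfmot.quantization.keras.quantizers.MovingAverageQuantizer


class FourBitDenseQuantizeConfig(tfmot.quantization.keras.QuantizeConfig):
  """Hypothetical config: quantize a Dense layer's kernel and activation to 4 bits."""

  def get_weights_and_quantizers(self, layer):
    return [(layer.kernel, LastValueQuantizer(
        num_bits=4, symmetric=True, narrow_range=False, per_axis=False))]

  def get_activations_and_quantizers(self, layer):
    return [(layer.activation, MovingAverageQuantizer(
        num_bits=4, symmetric=False, narrow_range=False, per_axis=False))]

  def set_quantize_weights(self, layer, quantize_weights):
    layer.kernel = quantize_weights[0]

  def set_quantize_activations(self, layer, quantize_activations):
    layer.activation = quantize_activations[0]

  def get_output_quantizers(self, layer):
    return []  # No extra quantization on the layer output.

  def get_config(self):
    return {}


# Only the annotated layer is quantized; the second Dense layer stays float.
annotated_model = tf.keras.Sequential([
    tfmot.quantization.keras.quantize_annotate_layer(
        tf.keras.layers.Dense(20, activation='relu', input_shape=(10,)),
        quantize_config=FourBitDenseQuantizeConfig()),
    tf.keras.layers.Dense(10),
])

# The custom config must be in scope when `quantize_apply` deserializes it.
with tfmot.quantization.keras.quantize_scope(
    {'FourBitDenseQuantizeConfig': FourBitDenseQuantizeConfig}):
  q_aware_model = tfmot.quantization.keras.quantize_apply(annotated_model)
```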