TensorFlow model optimization
The TensorFlow Model Optimization Toolkit minimizes the complexity of optimizing machine learning inference.
Inference efficiency is a critical concern when deploying machine learning models, because it affects latency, memory utilization, and, in many cases, power consumption. Particularly on edge devices, such as mobile and Internet of Things (IoT) devices, resources are further constrained, so model size and computational efficiency become a major concern.
The computational demand for training grows with the number of models trained on different architectures, whereas the computational demand for inference grows in proportion to the number of users.
Use cases
Model optimization is useful, among other things, for:
- Reducing inference latency and cost for both cloud and edge devices (e.g. mobile, IoT).
- Deploying models on edge devices with restrictions on processing, memory, and/or power consumption.
- Reducing the payload size of over-the-air model updates.
- Enabling execution on hardware that is restricted to, or optimized for, fixed-point operations.
- Optimizing models for special-purpose hardware accelerators.
Optimization techniques
Model optimization can involve various techniques:
- Reducing the parameter count with pruning and structured pruning.
- Reducing representational precision with quantization.
- Updating the original model topology to a more efficient one with fewer parameters or faster execution, for example with tensor decomposition methods and distillation.
Our toolkit supports post-training quantization, quantization aware training, pruning, and clustering. The toolkit also provides experimental support for collaborative optimization, which combines several of these techniques.
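As a quick illustration of how the toolkit's Keras APIs are applied, quantization aware training wraps an existing model so that quantization effects are emulated during training. This is a minimal sketch, assuming TensorFlow and the tensorflow-model-optimization package are installed; the two-layer model is only a placeholder:

```python
import tensorflow as tf
import tensorflow_model_optimization as tfmot

# Placeholder model; any Keras model with supported layers works.
model = tf.keras.Sequential([
    tf.keras.layers.Dense(128, activation='relu', input_shape=(784,)),
    tf.keras.layers.Dense(10),
])

# Wrap the model so fake-quantization ops are inserted during training.
qat_model = tfmot.quantization.keras.quantize_model(model)
qat_model.compile(
    optimizer='adam',
    loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    metrics=['accuracy'])
# Fine-tune as usual so the model learns quantization-friendly weights:
# qat_model.fit(train_images, train_labels, epochs=1)
```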
Quantization
Quantized models are those where the model is represented with lower precision, such as 8-bit integers instead of 32-bit floats. Lower precision is a requirement for leveraging certain hardware.
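For example, post-training quantization can be applied while converting an already-trained model to TensorFlow Lite. A minimal sketch, where the SavedModel path is a placeholder:

```python
import tensorflow as tf

# Post-training quantization: convert a SavedModel to TensorFlow Lite
# with the default (8-bit weight) optimizations.
converter = tf.lite.TFLiteConverter.from_saved_model('/path/to/saved_model')
converter.optimizations = [tf.lite.Optimize.DEFAULT]
tflite_quantized_model = converter.convert()

with open('model_quant.tflite', 'wb') as f:
    f.write(tflite_quantized_model)
```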
Sparsity and pruning
Sparse models are those where connections between operators (i.e. neural network layers) have been pruned, introducing zeros into the parameter tensors.
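For instance, the toolkit's pruning API wraps a Keras model and gradually zeroes out low-magnitude weights during training. A minimal sketch; the model and the schedule values are illustrative:

```python
import tensorflow as tf
import tensorflow_model_optimization as tfmot

base_model = tf.keras.Sequential([
    tf.keras.layers.Dense(128, activation='relu', input_shape=(784,)),
    tf.keras.layers.Dense(10),
])

# Ramp sparsity from 0% to 50% over 1000 training steps
# (illustrative values).
pruning_schedule = tfmot.sparsity.keras.PolynomialDecay(
    initial_sparsity=0.0, final_sparsity=0.5,
    begin_step=0, end_step=1000)
pruned_model = tfmot.sparsity.keras.prune_low_magnitude(
    base_model, pruning_schedule=pruning_schedule)
# Training requires the pruning step callback:
# pruned_model.fit(x, y,
#     callbacks=[tfmot.sparsity.keras.UpdatePruningStep()])
```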
Clustering
Clustered models are those where the original model's parameters are replaced with a smaller number of unique values.
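For example, the clustering API groups each layer's weights into a small number of clusters and shares each cluster's centroid value among its weights. A minimal sketch; the model and the cluster count are illustrative:

```python
import tensorflow as tf
import tensorflow_model_optimization as tfmot

base_model = tf.keras.Sequential([
    tf.keras.layers.Dense(128, activation='relu', input_shape=(784,)),
    tf.keras.layers.Dense(10),
])

# Replace each layer's weights with 16 shared centroid values,
# initialized on a linear grid over the weight range.
clustered_model = tfmot.clustering.keras.cluster_weights(
    base_model,
    number_of_clusters=16,
    cluster_centroids_init=tfmot.clustering.keras.CentroidInitialization.LINEAR)
# After fine-tuning, strip the clustering wrappers before export:
# final_model = tfmot.clustering.keras.strip_clustering(clustered_model)
```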
Collaborative optimization
The toolkit provides experimental support for collaborative optimization. This lets you benefit from combining several model compression techniques while simultaneously achieving improved accuracy through quantization aware training.
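As an illustration of one collaborative pipeline, sparsity-preserving quantization aware training (PQAT) applies quantization aware training to an already-pruned model without destroying its zeros. A minimal sketch, assuming `pruned_model` is a Keras model trained with the pruning API above; note that this API is experimental and may change:

```python
import tensorflow_model_optimization as tfmot

# Strip the pruning wrappers before applying quantization aware training.
stripped_model = tfmot.sparsity.keras.strip_pruning(pruned_model)

# The prune-preserve scheme keeps the pruned zeros intact while
# training with emulated quantization (experimental API).
annotated = tfmot.quantization.keras.quantize_annotate_model(stripped_model)
pqat_model = tfmot.quantization.keras.quantize_apply(
    annotated,
    tfmot.experimental.combine.Default8BitPrunePreserveQuantizeScheme())
# Compile and fine-tune pqat_model to recover accuracy.
```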