TensorFlow model optimization
The TensorFlow Model Optimization Toolkit minimizes the complexity of optimizing machine learning inference.
Inference efficiency is a critical concern when deploying machine learning models, because of latency, memory utilization, and in many cases power consumption. Particularly on edge devices, such as mobile and Internet of Things (IoT) devices, resources are further constrained, so model size and computational efficiency become a major concern.
Computational demand for training grows with the number of models trained on different architectures, whereas computational demand for inference grows in proportion to the number of users.
Use cases
Model optimization is useful, among other things, for:
- Reducing latency and inference cost for both cloud and edge devices (e.g. mobile, IoT).
- Deploying models on edge devices with restrictions on processing, memory, and/or power consumption.
- Reducing payload size for over-the-air model updates.
- Enabling execution on hardware restricted to, or optimized for, fixed-point operations.
- Optimizing models for special-purpose hardware accelerators.
Optimization techniques
The area of model optimization can involve various techniques:
- Reducing the parameter count with pruning and structured pruning.
- Reducing representational precision with quantization.
- Updating the original model topology to a more efficient one with fewer parameters or faster execution, for example with tensor decomposition methods and distillation.
Our toolkit supports post-training quantization, quantization aware training, pruning, and clustering. The toolkit also provides experimental support for collaborative optimization, which combines several of these techniques.
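These APIs ship in the tensorflow-model-optimization pip package, conventionally imported as tfmot (post-training quantization is the exception: it is driven through the TensorFlow Lite converter built into TensorFlow itself). A minimal setup:

```python
# pip install tensorflow-model-optimization
import tensorflow_model_optimization as tfmot

# Entry points used in the sections below:
#   tfmot.quantization.keras   - quantization aware training
#   tfmot.sparsity.keras       - pruning
#   tfmot.clustering.keras     - clustering
#   tfmot.experimental.combine - collaborative optimization schemes
```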
Quantization
Quantized models are those where the model is represented with lower precision, such as 8-bit integers rather than 32-bit floats. Lower precision is a requirement to take advantage of certain hardware.
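As a minimal sketch of post-training quantization, the snippet below converts a Keras model to TensorFlow Lite with dynamic-range quantization (weights stored as 8-bit integers). The two-layer model here is a stand-in for your own trained model, not part of the toolkit:

```python
import tensorflow as tf

# Stand-in trained model; in practice, use your own tf.keras.Model.
model = tf.keras.Sequential([
    tf.keras.layers.Dense(10, activation="relu", input_shape=(20,)),
    tf.keras.layers.Dense(2),
])

# Post-training dynamic-range quantization via the TensorFlow Lite converter.
converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
tflite_quantized_model = converter.convert()

with open("model_quantized.tflite", "wb") as f:
    f.write(tflite_quantized_model)
```

For full integer quantization (e.g. for fixed-point-only hardware), the converter additionally needs a small representative dataset to calibrate activation ranges.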
Sparsity and pruning
Sparse models are those where connections between operators (i.e. neural network layers) have been pruned, introducing zeros into the parameter tensors.
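Below is a sketch of magnitude-based pruning with the Keras pruning API. The stand-in model, random data, and schedule parameters (50% final sparsity over 200 steps) are illustrative assumptions, not recommendations:

```python
import numpy as np
import tensorflow as tf
import tensorflow_model_optimization as tfmot

# Stand-in model and data; substitute your own trained model and dataset.
model = tf.keras.Sequential([
    tf.keras.layers.Dense(10, activation="relu", input_shape=(20,)),
    tf.keras.layers.Dense(2),
])
x_train = np.random.rand(100, 20).astype("float32")
y_train = np.random.randint(0, 2, size=(100,))

# Wrap the model so low-magnitude weights are progressively zeroed during training.
pruning_schedule = tfmot.sparsity.keras.PolynomialDecay(
    initial_sparsity=0.0,  # start fully dense
    final_sparsity=0.5,    # end with 50% of the weights set to zero
    begin_step=0,
    end_step=200)
pruned_model = tfmot.sparsity.keras.prune_low_magnitude(
    model, pruning_schedule=pruning_schedule)

pruned_model.compile(
    optimizer="adam",
    loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    metrics=["accuracy"])

# The UpdatePruningStep callback keeps the schedule in sync with training steps.
pruned_model.fit(x_train, y_train, epochs=2,
                 callbacks=[tfmot.sparsity.keras.UpdatePruningStep()])

# Remove the pruning wrappers before export; the zeroed weights remain.
final_model = tfmot.sparsity.keras.strip_pruning(pruned_model)
```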
Clustering
Clustered models are those where the original model's parameters are replaced with a smaller number of unique values.
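A sketch of weight clustering with the Keras clustering API follows. The 16-cluster setting and the stand-in model and data are illustrative assumptions; a short fine-tuning pass is typically used to recover accuracy:

```python
import numpy as np
import tensorflow as tf
import tensorflow_model_optimization as tfmot

# Stand-in trained model and data; substitute your own.
model = tf.keras.Sequential([
    tf.keras.layers.Dense(10, activation="relu", input_shape=(20,)),
    tf.keras.layers.Dense(2),
])
x_train = np.random.rand(100, 20).astype("float32")
y_train = np.random.randint(0, 2, size=(100,))

# Replace each layer's weights with 16 shared centroid values.
clustered_model = tfmot.clustering.keras.cluster_weights(
    model,
    number_of_clusters=16,
    cluster_centroids_init=tfmot.clustering.keras.CentroidInitialization.LINEAR)

clustered_model.compile(
    optimizer="adam",
    loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    metrics=["accuracy"])
clustered_model.fit(x_train, y_train, epochs=1)  # brief fine-tuning pass

# Strip the clustering wrappers before export.
final_model = tfmot.clustering.keras.strip_clustering(clustered_model)
```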
Collaborative optimization
The toolkit provides experimental support for collaborative optimization. This lets you benefit from combining several model compression techniques while simultaneously achieving improved accuracy through quantization aware training.
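One collaborative workflow is pruning-preserving quantization aware training (PQAT). The sketch below assumes stripped_pruned_model is the output of a pruning step like the one above (i.e. after strip_pruning) and that x_train and y_train are your training data:

```python
import tensorflow as tf
import tensorflow_model_optimization as tfmot

# Annotate the pruned model for quantization, then apply a scheme that
# preserves the sparsity pattern during quantization aware training.
annotated_model = tfmot.quantization.keras.quantize_annotate_model(
    stripped_pruned_model)
pqat_model = tfmot.quantization.keras.quantize_apply(
    annotated_model,
    tfmot.experimental.combine.Default8BitPrunePreserveQuantizeScheme())

pqat_model.compile(
    optimizer="adam",
    loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    metrics=["accuracy"])
pqat_model.fit(x_train, y_train, epochs=1)  # fine-tune with fake quantization
```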