Abstract
With the increasing demands for real-time performance, privacy, and security in AI applications, deploying high-performance neural networks on edge computing platforms has become a research hotspot. Because common edge computing platforms are limited in storage, computing power, and power consumption, on-device deployment of deep neural networks remains a major challenge. One approach to overcoming this challenge is to compress existing neural networks to fit the deployment conditions of the device. Commonly used model compression algorithms include pruning, quantization, and knowledge distillation. Since these methods have complementary strengths, combining them can achieve better compression and acceleration, and joint compression is becoming a research focus. This paper first gives a brief overview of commonly used model compression algorithms, then summarizes three common joint compression schemes: "knowledge distillation + pruning", "knowledge distillation + quantization", and "pruning + quantization", with emphasis on the basic ideas and methods of joint compression. Finally, key future directions for joint optimization of neural network compression are proposed.
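As a hedged illustration of the "pruning + quantization" combination named in the abstract (a minimal sketch of generic magnitude pruning followed by uniform 8-bit quantization, not the specific algorithms surveyed in the paper), the two steps can be applied to a weight matrix as follows:

```python
import numpy as np

def magnitude_prune(w, sparsity=0.5):
    """Zero out the smallest-magnitude fraction of weights."""
    threshold = np.quantile(np.abs(w), sparsity)
    return np.where(np.abs(w) >= threshold, w, 0.0)

def uniform_quantize(w, bits=8):
    """Symmetric uniform quantization to signed integers; returns codes and scale."""
    qmax = 2 ** (bits - 1) - 1
    max_abs = np.max(np.abs(w))
    scale = max_abs / qmax if max_abs > 0 else 1.0
    q = np.round(w / scale).astype(np.int8)
    return q, scale

# Joint compression sketch: prune first, then quantize the surviving weights.
rng = np.random.default_rng(0)
w = rng.standard_normal((4, 4)).astype(np.float32)
w_pruned = magnitude_prune(w, sparsity=0.5)
q, scale = uniform_quantize(w_pruned, bits=8)
w_restored = q.astype(np.float32) * scale  # dequantized approximation of w_pruned
```

Pruning first means the quantizer's dynamic range is spent only on the weights that survive, which is one reason the two methods are considered complementary.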
Authors
NING Xin
ZHAO Wenyao
ZONG Yixin
ZHANG Yugui
CHEN Hao
ZHOU Qi
MA Junxiao
NING Xin; ZHAO Wenyao; ZONG Yixin; ZHANG Yugui; CHEN Hao; ZHOU Qi; MA Junxiao (Institute of Semiconductors, Chinese Academy of Sciences, Beijing 100083, China; School of Microelectronics, Hefei University of Technology, Hefei 230009, China; Bureau of Frontier Sciences and Education, Chinese Academy of Sciences, Beijing 100864, China; College of Artificial Intelligence, Nankai University, Tianjin 300071, China)
Source
《智能系统学报》
CSCD
Peking University Core Journal (PKU Core)
2024, No. 1, pp. 36-57 (22 pages)
CAAI Transactions on Intelligent Systems
Funding
National Natural Science Foundation of China (62373343)
Beijing Natural Science Foundation (L233036).
Keywords
neural network
compression
pruning
quantization
knowledge distillation
model compression
deep learning