
知识蒸馏研究综述 (Cited by: 33)

Knowledge Distillation: A Survey
Abstract: High-performance deep learning models are usually computationally and parameter-intensive, which makes them difficult to deploy on resource-constrained edge devices. To run deep learning models on low-resource devices, efficient small-scale networks are needed. Knowledge distillation is an emerging method for obtaining such efficient small-scale networks. Its main idea is to transfer the "knowledge" of a complex teacher network with strong learning ability to a simple student network: the student improves its generalization ability by imitating the "dark knowledge" of the corresponding teacher. At the same time, knowledge distillation can enhance model performance by exploiting optimization strategies such as mutual learning and self-learning of neural networks, and data resources such as unlabeled and cross-modal data. Owing to these advantages in model compression and model enhancement, knowledge distillation has become a research hotspot and focus in the field of deep learning.

Several surveys on knowledge distillation already exist, but they lack a systematic treatment that offers a global and comprehensive view of the field. First, previous investigations have ignored the application prospects of knowledge distillation in model enhancement. Second, previous surveys have paid little attention to structural feature knowledge, which is an indispensable part of the knowledge contained in a network. Over the past two years, both model enhancement and structural feature knowledge have become increasingly important for improving the performance of student models. To overcome these shortcomings, this paper examines knowledge distillation from multiple perspectives and provides a more detailed introduction to its knowledge forms.

Specifically, we conduct a comprehensive investigation of recent knowledge distillation research covering basic knowledge, theoretical methods, and applications, organized as follows. (1) We review the background of knowledge distillation, including its origin and core ideas. (2) We explain the working mechanism of knowledge distillation, i.e., why it is effective. (3) We summarize the different forms of knowledge used in distillation, divided into output feature (response-based) knowledge, intermediate feature (feature-based) knowledge, relation-based knowledge, and structural feature knowledge. (4) We analyze and compare in detail the key methods of knowledge distillation, which emphasize different ways of transferring knowledge, including knowledge amalgamation, learning from multiple teachers, teacher assistants, cross-modal distillation, mutual distillation, lifelong distillation, and self-distillation. (5) We introduce methods that combine knowledge distillation with other technologies, including generative adversarial networks, neural architecture search, reinforcement learning, graph convolution, other compression techniques, autoencoders, ensemble learning, and federated learning. (6) We describe in detail the application scenarios of knowledge distillation in many fields, including its progress in both model compression and model enhancement. (7) We discuss the current challenges and future research directions of knowledge distillation. In summary, this paper surveys the research progress of knowledge distillation in recent years and summarizes, compares, and analyzes it from the following aspects: origin, mechanism, knowledge forms, key methods, integration with other technologies, application progress, challenges, and perspectives.
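As a rough illustration of the core idea summarized above, the sketch below shows the classic soft-target distillation loss in PyTorch, in which a student imitates the temperature-softened outputs ("dark knowledge") of a frozen teacher while also fitting the hard labels. This is a minimal sketch of the general technique, not code from the surveyed paper; the temperature, loss weight, and toy linear models are illustrative assumptions.

```python
# Minimal sketch of Hinton-style soft-target knowledge distillation.
# T, alpha, and the toy model sizes are illustrative assumptions.
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.7):
    """Combine softened teacher targets ("dark knowledge") with hard labels."""
    # KL divergence between temperature-softened distributions; the T^2 factor
    # keeps gradient magnitudes comparable across temperatures.
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=1),
        F.softmax(teacher_logits / T, dim=1),
        reduction="batchmean",
    ) * (T * T)
    # Ordinary cross-entropy on the ground-truth labels.
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1.0 - alpha) * hard

# Toy usage: a compact student imitates a larger, frozen teacher.
teacher = torch.nn.Linear(32, 10)   # stand-in for a complex teacher network
student = torch.nn.Linear(32, 10)   # stand-in for a small student network
x = torch.randn(8, 32)
y = torch.randint(0, 10, (8,))
with torch.no_grad():
    t_logits = teacher(x)           # teacher is not updated
loss = distillation_loss(student(x), t_logits, y)
loss.backward()                      # gradients flow only into the student
```

The weighted sum of the softened KL term and the hard-label cross-entropy is the standard formulation; many of the variants surveyed in the paper replace or augment the soft-target term with intermediate, relational, or structural knowledge.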
Authors: 黄震华 (HUANG Zhen-Hua), 杨顺志 (YANG Shun-Zhi), 林威 (LIN Wei), 倪娟 (NI Juan), 孙圣力 (SUN Sheng-Li), 陈运文 (CHEN Yun-Wen), 汤庸 (TANG Yong)
Affiliations: School of Computer Science, South China Normal University, Guangzhou 510631; School of Electronic and Information Engineering, Tongji University, Shanghai 201804; School of Philosophy and Social Development, South China Normal University, Guangzhou 510631; School of Software & Microelectronics, Peking University, Beijing 102600; Research and Development Department, DataGrand Inc., Shenzhen, Guangdong 518063
Source: Chinese Journal of Computers (《计算机学报》), indexed in EI, CAS, CSCD, and the PKU Core list, 2022, No. 3, pp. 624-653 (30 pages)
Funding: National Natural Science Foundation of China (61772366, U1811263, 61972328); Natural Science Foundation of Shanghai (17ZR1445900); Science and Technology Program of Guangdong Province (2019B090905005)
Keywords: knowledge distillation; model compression; model enhancement; knowledge transfer; deep learning