
知识蒸馏研究综述 (Cited by: 33)

Knowledge Distillation: A Survey
Abstract: High-performance deep learning models are usually computationally and parameter-intensive, which makes them difficult to deploy on resource-constrained edge devices. To run deep learning models on low-resource devices, efficient small-scale networks are needed. Knowledge distillation is an emerging method for obtaining such efficient small-scale networks. Its main idea is to transfer the "knowledge" of a complex teacher network with strong learning ability to a simple student network: the student improves its generalization ability by imitating the "dark knowledge" of the corresponding teacher. At the same time, knowledge distillation can enhance model performance by exploiting optimization strategies such as mutual learning and self-learning of neural networks, and data resources such as unlabeled and cross-modal data. Owing to these advantages in model compression and model enhancement, knowledge distillation has become a research hotspot and focus in the field of deep learning.

Several surveys on knowledge distillation already exist, but they lack a systematic treatment that offers a global and comprehensive view of the field. First, previous investigations have ignored the application prospects of knowledge distillation in model enhancement. Second, previous surveys have paid little attention to structural feature knowledge, which is an indispensable part of the knowledge contained in a network. Over the past two years, both model enhancement and structural feature knowledge have become increasingly important for improving the performance of student models. To overcome these shortcomings, this paper examines knowledge distillation from multiple perspectives and provides a more detailed introduction to its knowledge forms.

Specifically, we conduct a comprehensive investigation of recent knowledge distillation research covering basic knowledge, theoretical methods, and applications, organized as follows. (1) We review the background of knowledge distillation, including its origin and core ideas. (2) We explain the working mechanism of knowledge distillation, i.e., why it is effective. (3) We summarize the different forms of knowledge used in distillation, divided into output feature (response-based) knowledge, intermediate feature (feature-based) knowledge, relation-based knowledge, and structural feature knowledge. (4) We analyze and compare in detail the key methods of knowledge distillation, which emphasize different ways of transferring knowledge, including knowledge amalgamation, learning from multiple teachers, teacher assistants, cross-modal distillation, mutual distillation, lifelong distillation, and self-distillation. (5) We introduce methods that combine knowledge distillation with other technologies, including generative adversarial networks, neural architecture search, reinforcement learning, graph convolution, other compression techniques, autoencoders, ensemble learning, and federated learning. (6) We describe in detail the application scenarios of knowledge distillation in many fields, including its progress in both model compression and model enhancement. (7) We discuss the current challenges and future research directions of knowledge distillation. In summary, this paper surveys the research progress of knowledge distillation in recent years and summarizes, compares, and analyzes it from the following aspects: origin, mechanism, knowledge forms, key methods, integration with other technologies, application progress, challenges, and perspectives.
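As a rough illustration of the core idea summarized above, the sketch below shows the classic soft-target distillation loss in PyTorch, in which a student imitates the temperature-softened outputs ("dark knowledge") of a frozen teacher while also fitting the hard labels. This is a minimal sketch of the general technique, not code from the surveyed paper; the temperature, loss weight, and toy linear models are illustrative assumptions.

```python
# Minimal sketch of Hinton-style soft-target knowledge distillation.
# T, alpha, and the toy model sizes are illustrative assumptions.
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.7):
    """Combine softened teacher targets ("dark knowledge") with hard labels."""
    # KL divergence between temperature-softened distributions; the T^2 factor
    # keeps gradient magnitudes comparable across temperatures.
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=1),
        F.softmax(teacher_logits / T, dim=1),
        reduction="batchmean",
    ) * (T * T)
    # Ordinary cross-entropy on the ground-truth labels.
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1.0 - alpha) * hard

# Toy usage: a compact student imitates a larger, frozen teacher.
teacher = torch.nn.Linear(32, 10)   # stand-in for a complex teacher network
student = torch.nn.Linear(32, 10)   # stand-in for a small student network
x = torch.randn(8, 32)
y = torch.randint(0, 10, (8,))
with torch.no_grad():
    t_logits = teacher(x)           # teacher is not updated
loss = distillation_loss(student(x), t_logits, y)
loss.backward()                      # gradients flow only into the student
```

The weighted sum of the softened KL term and the hard-label cross-entropy is the standard formulation; many of the variants surveyed in the paper replace or augment the soft-target term with intermediate, relational, or structural knowledge.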
Authors: 黄震华 (HUANG Zhen-Hua), 杨顺志 (YANG Shun-Zhi), 林威 (LIN Wei), 倪娟 (NI Juan), 孙圣力 (SUN Sheng-Li), 陈运文 (CHEN Yun-Wen), 汤庸 (TANG Yong)
Affiliations: School of Computer Science, South China Normal University, Guangzhou 510631; School of Electronic and Information Engineering, Tongji University, Shanghai 201804; School of Philosophy and Social Development, South China Normal University, Guangzhou 510631; School of Software & Microelectronics, Peking University, Beijing 102600; Research and Development Department, DataGrand Inc., Shenzhen, Guangdong 518063
Source: Chinese Journal of Computers (《计算机学报》), indexed in EI, CAS, CSCD, and the PKU Core list, 2022, No. 3, pp. 624-653 (30 pages)
Funding: National Natural Science Foundation of China (61772366, U1811263, 61972328); Natural Science Foundation of Shanghai (17ZR1445900); Science and Technology Program of Guangdong Province (2019B090905005)
Keywords: knowledge distillation; model compression; model enhancement; knowledge transfer; deep learning