

Double-Generators Network for Data-Free Knowledge Distillation
Abstract  Knowledge distillation (KD) lets a teacher network guide the training of a student network by maximizing the similarity of their output distributions, and has become a key technique for compressing large-scale deep networks and deploying them close to the end user. However, growing privacy-protection awareness and data-transmission constraints increasingly make the original training data unavailable, so preserving the accuracy of the compressed network in a data-free setting has become an important research direction. The data-free learning of student networks (DAFL) model builds a generator on the teacher side to produce a pseudo dataset whose distribution approximates that of the pretrained network, and then trains the student network on these pseudo samples by knowledge distillation. Two problems remain in how this generator is built and optimized: 1) the framework fully trusts the teacher network's judgments on pseudo samples that have no real labels, and because the teacher and student networks pursue different optimization targets, the student network struggles to obtain accurate and consistent optimization signals; 2) the generator relies solely on losses derived from the teacher network, which reduces the diversity of the generated features and harms the student network's generalization. To address these two problems, we propose DG-DAFL (double generators-DAFL), a double-generator architecture that builds a teacher-side and a student-side generator and optimizes them simultaneously, so that the generation task and the optimization target are consistent and the student network's discriminative performance improves. We further add a sample-distribution difference loss between the two generators, which uses the teacher network's latent distribution prior to optimize the generators, maintaining the student network's recognition accuracy while improving its generalization. Experiments on three popular datasets show that the method achieves more effective and more robust knowledge distillation in the data-free setting. The code and models of DG-DAFL are available at https://github.com/LNNU-computer-research-526/DG-DAFL.git.
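For readers who want to see how the pieces described in the abstract fit together, the following is a minimal PyTorch-style sketch of one training step with a teacher-side generator, a student-side generator, and a distillation step for the student. The network and optimizer objects, the loss weight lam, the simplified one-hot generator objective, and the particular form chosen for the sample-distribution difference term are assumptions made for illustration; the authors' released code at the repository above is the authoritative implementation.

# A minimal sketch of one DG-DAFL-style training step (illustrative, not the released code).
import torch
import torch.nn.functional as F

def kd_loss(student_logits, teacher_logits, T=4.0):
    # Standard KD term: KL divergence between temperature-softened output distributions.
    log_p_s = F.log_softmax(student_logits / T, dim=1)
    p_t = F.softmax(teacher_logits / T, dim=1)
    return F.kl_div(log_p_s, p_t, reduction="batchmean") * T * T

def dg_dafl_step(teacher, student, gen_t, gen_s, opt_s, opt_gt, opt_gs,
                 batch_size=128, z_dim=100, lam=0.1, device="cpu"):
    teacher.eval()
    for p in teacher.parameters():           # the pretrained teacher stays frozen
        p.requires_grad_(False)
    z = torch.randn(batch_size, z_dim, device=device)

    # 1) Teacher-side generator: simplified DAFL-style "one-hot" objective --
    #    the teacher should classify its own pseudo samples confidently.
    x_t = gen_t(z)
    logits_t = teacher(x_t)
    loss_gt = F.cross_entropy(logits_t, logits_t.argmax(dim=1))
    opt_gt.zero_grad()
    loss_gt.backward()
    opt_gt.step()

    # 2) Student-side generator: optimized against the student's own task so the
    #    generation target matches the network it serves. The difference term
    #    (assumed form) discourages the two generators from collapsing onto the
    #    same region of the teacher's output distribution.
    x_s = gen_s(z)
    with torch.no_grad():
        pseudo_labels = teacher(x_s).argmax(dim=1)
        p_ref = F.softmax(teacher(gen_t(z)), dim=1)
    loss_gs = F.cross_entropy(student(x_s), pseudo_labels)
    loss_diff = -F.kl_div(F.log_softmax(teacher(x_s), dim=1), p_ref,
                          reduction="batchmean")
    opt_gs.zero_grad()
    (loss_gs + lam * loss_diff).backward()
    opt_gs.step()

    # 3) Student: distilled on pseudo samples drawn from both generators.
    with torch.no_grad():
        x = torch.cat([gen_t(z), gen_s(z)], dim=0)
        t_logits = teacher(x)
    loss_kd = kd_loss(student(x), t_logits)
    opt_s.zero_grad()
    loss_kd.backward()
    opt_s.step()
    return loss_gt.item(), loss_kd.item()

In this sketch the student-side generator is trained against the student's own predictions, mirroring the paper's idea of aligning the generation task with the network it feeds, while the negative KL term is only one plausible way to realize the sample-distribution difference loss described above.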
Authors  Zhang Jing (张晶), Ju Jialiang (鞠佳良), Ren Yonggong (任永功) (School of Computer Science and Artificial Intelligence, Liaoning Normal University, Dalian, Liaoning 116081)
Source  Journal of Computer Research and Development (《计算机研究与发展》; indexed in EI, CSCD, and the Peking University Core list), 2023, No. 7, pp. 1615-1627 (13 pages)
Funding  National Natural Science Foundation of China (61902165, 61976109); Dalian Science and Technology Innovation Fund (2018J12GX047); Humanities and Social Sciences Research Planning Fund of the Ministry of Education (21YJC880104).
Keywords  deep neural network; knowledge distillation; data-free knowledge distillation; generative adversarial network; generator