Abstract
Knowledge distillation (KD) lets a "teacher network" guide the full training of a "student network" by maximizing the similarity of their output distributions, and has become a key technique for the compression, near-edge deployment, and application of large-scale deep networks. However, growing privacy-protection awareness and data-transmission constraints make the original training data increasingly difficult to obtain, so preserving the accuracy of a compressed network in a data-free setting has become an important research direction. The data-free learning of student networks (DAFL) model builds a generator on the teacher side to obtain a pseudo data set whose distribution approximates that of the pre-trained network's training data, and then trains the student network by knowledge distillation. However, the construction and optimization of the generator in this framework still suffer from two problems: 1) the teacher network's predictions on pseudo samples, which have no ground-truth labels, are trusted unconditionally, and the teacher and student networks pursue different optimization objectives, so the student network struggles to obtain accurate and consistent supervision; 2) the generator relies only on losses derived from the teacher network, which limits the diversity of the generated features and weakens the student network's generalization. To address these two problems, we propose DG-DAFL (double generators DAFL), a double-generator architecture that builds one generator on the teacher side and one on the student side and optimizes them simultaneously, so that the two networks share the same task and optimization objective and the student network's discrimination performance improves. We further add a sample-distribution difference loss between the two generators, exploiting the prior information in the teacher network's latent distribution to optimize the generators, which maintains the student network's recognition accuracy while improving its generalization. Experimental results on three popular datasets show that the proposed method achieves more effective and more robust knowledge distillation in the data-free setting. The code and models of DG-DAFL are available at https://github.com/LNNU-computer-research-526/DG-DAFL.git.
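To make the scheme described in the abstract concrete, the following is a minimal PyTorch-style sketch of one data-free training step with a teacher-side generator G_t and a student-side generator G_s. The function names, the simplified DAFL-style generator objective, the MSE form of the sample-distribution difference term, and the loss weights are all illustrative assumptions, not the released DG-DAFL implementation; refer to the GitHub repository above for the authors' code.

```python
import torch
import torch.nn.functional as F

def kd_loss(student_logits, teacher_logits, T=4.0):
    """Soften both outputs with temperature T and match them with KL divergence."""
    p_t = F.softmax(teacher_logits / T, dim=1)
    log_p_s = F.log_softmax(student_logits / T, dim=1)
    return F.kl_div(log_p_s, p_t, reduction="batchmean") * (T * T)

def generator_loss(logits, alpha=5.0):
    """Simplified DAFL-style generator objective: confident (one-hot-like)
    predictions plus an information-entropy term encouraging class balance.
    The activation-norm term used by DAFL is omitted for brevity; alpha is
    an illustrative weight."""
    pseudo_labels = logits.argmax(dim=1)
    loss_onehot = F.cross_entropy(logits, pseudo_labels)
    mean_prob = F.softmax(logits, dim=1).mean(dim=0)
    loss_entropy = (mean_prob * torch.log(mean_prob + 1e-8)).sum()
    return loss_onehot + alpha * loss_entropy

def dg_dafl_step(G_t, G_s, teacher, student, opt_gt, opt_gs, opt_s,
                 batch_size=256, nz=100, device="cpu"):
    """One hypothetical data-free step with teacher-side and student-side generators."""
    teacher.eval()
    for p in teacher.parameters():           # the pre-trained teacher stays fixed
        p.requires_grad_(False)
    z = torch.randn(batch_size, nz, device=device)

    # 1) Teacher-side generator: fit the teacher's implicit data prior (DAFL-style losses).
    loss_gt = generator_loss(teacher(G_t(z)))
    opt_gt.zero_grad(); loss_gt.backward(); opt_gt.step()

    # 2) Student-side generator: the same kind of objective evaluated with the student,
    #    plus a sample-distribution difference term (here a simple MSE to the teacher
    #    generator's samples) that injects the teacher's latent prior.
    z2 = torch.randn(batch_size, nz, device=device)
    x_s = G_s(z2)
    with torch.no_grad():
        x_ref = G_t(z2)
    loss_gs = generator_loss(student(x_s)) + F.mse_loss(x_s, x_ref)
    opt_gs.zero_grad(); loss_gs.backward(); opt_gs.step()

    # 3) Student update: distill the fixed teacher on pseudo samples from both generators.
    x_kd = torch.cat([G_t(z).detach(), G_s(z2).detach()], dim=0)
    with torch.no_grad():
        t_out = teacher(x_kd)
    loss_kd = kd_loss(student(x_kd), t_out)
    opt_s.zero_grad(); loss_kd.backward(); opt_s.step()
    return loss_gt.item(), loss_gs.item(), loss_kd.item()
```

In this sketch, the student-side generator and the distribution-difference term are what distinguish the arrangement from single-generator DAFL: the student's own objective shapes the pseudo data it is distilled on, while the teacher generator's samples act as a prior that keeps the two sample distributions from drifting apart.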
Authors
张晶 (Zhang Jing), 鞠佳良 (Ju Jialiang), 任永功 (Ren Yonggong)
School of Computer Science and Artificial Intelligence, Liaoning Normal University, Dalian, Liaoning 116081
Source
《计算机研究与发展》(Journal of Computer Research and Development)
EI
CSCD
Peking University Core Journals (北大核心)
2023, No. 7, pp. 1615-1627 (13 pages)
Funding
National Natural Science Foundation of China (61902165, 61976109)
Dalian Science and Technology Innovation Fund (2018J12GX047)
Humanities and Social Sciences Research Planning Fund of the Ministry of Education (21YJC880104)
Keywords
deep neural network
knowledge distillation
data-free knowledge distillation
generative adversarial network
generator