Abstract
Deep learning has been widely applied in scientific research, teaching, industrial production, and other fields, but because of its large data volumes and complex model structures, it depends on substantial computing resources during the model training stage. To improve resource utilization efficiency in experimental teaching and to help students become more proficient in data collection and in tuning and optimizing model parameters, a training acceleration method based on weight reuse is proposed. It scales the depth and width of the VGG and ResNet architectures, respectively, allowing a model to reuse the weights of a network whose structure is similar but not required to be completely identical. Experimental results on the CIFAR10 dataset show that training initialized with the weight reuse method converges faster and reaches an accuracy close to that of randomly initialized training by the end of training, thereby accelerating the training of the expanded network. It is a more flexible knowledge transfer method that helps cultivate students' ability to understand and optimize complex models.
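The width scaling mentioned above is not spelled out in this record, but a minimal Net2WiderNet-style sketch in PyTorch conveys how a smaller network's weights can seed a wider one. Everything here is an illustrative assumption rather than the authors' exact transformation: the function name widen_conv_pair, the choice to duplicate randomly selected filters, and the plain Conv2d-ReLU-Conv2d setting without batch normalization.

```python
# Illustrative sketch only: widen conv1's output channels and rescale conv2's
# incoming weights so the widened pair initially computes the same function.
import torch
import torch.nn as nn

def widen_conv_pair(conv1: nn.Conv2d, conv2: nn.Conv2d, new_out: int):
    """Net2WiderNet-style widening for a Conv2d -> ReLU -> Conv2d pair."""
    old_out = conv1.out_channels
    assert conv2.in_channels == old_out and new_out > old_out
    # Extra channels are copies of randomly chosen existing filters.
    idx = torch.randint(0, old_out, (new_out - old_out,))

    wider1 = nn.Conv2d(conv1.in_channels, new_out, conv1.kernel_size,
                       conv1.stride, conv1.padding, bias=conv1.bias is not None)
    wider1.weight.data = torch.cat([conv1.weight.data,
                                    conv1.weight.data[idx]], dim=0)
    if conv1.bias is not None:
        wider1.bias.data = torch.cat([conv1.bias.data,
                                      conv1.bias.data[idx]], dim=0)

    # Each original channel i now appears counts[i] times; divide conv2's
    # incoming weights by that count so duplicated activations are not
    # double-counted in the next layer's sums.
    counts = torch.ones(old_out)
    for i in idx:
        counts[i] += 1

    wider2 = nn.Conv2d(new_out, conv2.out_channels, conv2.kernel_size,
                       conv2.stride, conv2.padding, bias=conv2.bias is not None)
    rescaled = conv2.weight.data / counts.view(1, -1, 1, 1)
    wider2.weight.data = torch.cat([rescaled, rescaled[:, idx]], dim=1)
    if conv2.bias is not None:
        wider2.bias.data = conv2.bias.data.clone()
    return wider1, wider2
```

Because ReLU acts per channel, duplicating a channel and dividing the next layer's matching weights by the copy count leaves the composed function unchanged, so training of the widened network resumes from the small network's solution rather than from a random point.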
[Objective] Deep learning has been widely applied in fields such as scientific research, teaching, and industrial production. However, because of its large data volumes and complex model structures, it relies on substantial computing resources during the model training stage. Knowledge transfer methods that reuse the weights of pretrained models are widely used in computer vision and natural language processing: for example, when training a detection network on the VOC or COCO dataset, a classification network pretrained on the ImageNet dataset serves as the backbone for further training. Reusing weights trained on similar datasets helps improve performance on the target task on the one hand, and accelerates the training process on the other. To improve resource utilization efficiency in experimental teaching and to help students become more proficient in data collection and in model parameter tuning and optimization, a weight reuse-based training acceleration method is proposed.
[Methods] Common weight reuse methods often require a high degree of structural consistency between the pretrained network and the target network, which limits network expansion when exploring a suitable structure. This paper proposes a more flexible knowledge transfer method that allows a network to reuse the weights of another network whose structure is similar but not completely identical; training an expanded network with this method is much faster than training from scratch. The algorithm expands the depth and width of the VGG and ResNet architectures, respectively, allowing models to reuse network weights that are structurally similar but not fully consistent. Unlike knowledge distillation-based schemes for network exploration, the proposed method directly transforms the weights of a previously explored network to initialize the new one rather than training it from scratch; because no teacher network is needed for guidance, it incurs no additional time or space overhead.
[Results] In both the width expansion and depth expansion experiments, the training curves initialized with the proposed weight reuse method lie clearly to the left of those of randomly initialized training, and their performance at the end of training is comparable or even better. The experimental results show that, on the CIFAR10 dataset, training initialized with the weight reuse method converges faster, and its final accuracy is similar to that of randomly initialized training, achieving the goal of accelerating the training of the expanded network. It is a more flexible knowledge transfer method that helps students develop the ability to understand and optimize complex models.
[Conclusions] The proposed knowledge transfer method, which reuses pretrained network weights, transfers knowledge from small networks to large networks. It effectively accelerates training and, with strong flexibility, facilitates the iterative expansion of network size during the design and validation stages. Training initialized with this method converges faster and ends with an accuracy similar to that of randomly initialized training. In the teaching experiments, it removes the need for students to hunt for GPU computing resources after class and reduces the waiting time for model training, helping students deepen their understanding of key scientific issues in instrument system design and improve their comprehensive innovation ability. The proposed weight reuse algorithm therefore has theoretical and practical value and can be used in deep learning experimental courses, effectively improving resource utilization efficiency and course learning progress.
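Depth expansion can be sketched in the same hedged spirit. The snippet below is again an illustration under assumptions, not the authors' exact scheme: it presumes ReLU activations, no batch normalization, and a VGG-style nn.Sequential block, and identity_conv and deepen are made-up helper names. It inserts a convolution initialized to the identity, so the deepened network starts out computing exactly what the pretrained block computed.

```python
# Illustrative sketch only: grow a VGG-style block deeper without discarding
# its pretrained weights, by inserting an identity-initialized convolution.
import torch
import torch.nn as nn

def identity_conv(channels: int, kernel_size: int = 3) -> nn.Conv2d:
    """A conv layer that initially passes each channel through unchanged."""
    conv = nn.Conv2d(channels, channels, kernel_size,
                     padding=kernel_size // 2, bias=True)
    with torch.no_grad():
        conv.weight.zero_()
        conv.bias.zero_()
        c = kernel_size // 2
        for i in range(channels):
            conv.weight[i, i, c, c] = 1.0  # center tap copies channel i
    return conv

def deepen(block: nn.Sequential, position: int, channels: int) -> nn.Sequential:
    """Insert identity conv + ReLU at `position` inside a sequential block."""
    layers = list(block.children())
    layers[position:position] = [identity_conv(channels), nn.ReLU(inplace=True)]
    return nn.Sequential(*layers)

# Usage: a pretrained two-conv block becomes a three-conv block whose output
# is initially identical, because ReLU(identity(x)) == x for x >= 0.
block = nn.Sequential(nn.Conv2d(3, 64, 3, padding=1), nn.ReLU(),
                      nn.Conv2d(64, 64, 3, padding=1), nn.ReLU())
x = torch.randn(1, 3, 32, 32)
assert torch.allclose(block(x), deepen(block, 2, 64)(x))
```

For a ResNet-style network, an analogous trick is to zero-initialize the last convolution of a newly inserted residual block so that the block initially reduces to its skip connection; either way, the deepened model trains from the pretrained function instead of from scratch.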
Authors
应仰威
章洛铭
齐炜
郑楷
周泓
YING Yangwei; ZHANG Luoming; QI Wei; ZHENG Kai; ZHOU Hong (College of Biomedical Engineering & Instrument Science, Zhejiang University, Hangzhou 310027, China)
Source
《实验技术与管理》
CAS
Peking University Core Journals (北大核心)
2024, No. 5, pp. 15-22 (8 pages)
Experimental Technology and Management
Funding
National Key Research and Development Program of China (2022YFC3602601)
Industry-University Cooperative Education Program of the Ministry of Education (220600656141412).
Keywords
convolutional neural network
knowledge transfer
training acceleration
weight reuse