Abstract
Deep learning has been widely applied in scientific research, teaching, industrial production, and other fields, but because of its large data volumes and complex model structures, it depends on substantial computing resources during the model training stage. To improve resource utilization efficiency in experimental teaching and to help students become more proficient in data collection and in tuning and optimizing model parameters, a training acceleration method based on weight reuse is proposed. It scales the depth and width of the VGG and ResNet architectures, respectively, allowing a model to reuse the weights of a network whose structure is similar but not required to be completely identical. Experimental results on the CIFAR10 dataset show that training initialized with the weight reuse method converges faster and reaches an accuracy close to that of randomly initialized training by the end of training, thereby accelerating the training of the expanded network. It is a more flexible knowledge transfer method that helps cultivate students' ability to understand and optimize complex models.
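The width scaling mentioned above is not spelled out in this record, but a minimal Net2WiderNet-style sketch in PyTorch conveys how a smaller network's weights can seed a wider one. Everything here is an illustrative assumption rather than the authors' exact transformation: the function name widen_conv_pair, the choice to duplicate randomly selected filters, and the plain Conv2d-ReLU-Conv2d setting without batch normalization.

```python
# Illustrative sketch only: widen conv1's output channels and rescale conv2's
# incoming weights so the widened pair initially computes the same function.
import torch
import torch.nn as nn

def widen_conv_pair(conv1: nn.Conv2d, conv2: nn.Conv2d, new_out: int):
    """Net2WiderNet-style widening for a Conv2d -> ReLU -> Conv2d pair."""
    old_out = conv1.out_channels
    assert conv2.in_channels == old_out and new_out > old_out
    # Extra channels are copies of randomly chosen existing filters.
    idx = torch.randint(0, old_out, (new_out - old_out,))

    wider1 = nn.Conv2d(conv1.in_channels, new_out, conv1.kernel_size,
                       conv1.stride, conv1.padding, bias=conv1.bias is not None)
    wider1.weight.data = torch.cat([conv1.weight.data,
                                    conv1.weight.data[idx]], dim=0)
    if conv1.bias is not None:
        wider1.bias.data = torch.cat([conv1.bias.data,
                                      conv1.bias.data[idx]], dim=0)

    # Each original channel i now appears counts[i] times; divide conv2's
    # incoming weights by that count so duplicated activations are not
    # double-counted in the next layer's sums.
    counts = torch.ones(old_out)
    for i in idx:
        counts[i] += 1

    wider2 = nn.Conv2d(new_out, conv2.out_channels, conv2.kernel_size,
                       conv2.stride, conv2.padding, bias=conv2.bias is not None)
    rescaled = conv2.weight.data / counts.view(1, -1, 1, 1)
    wider2.weight.data = torch.cat([rescaled, rescaled[:, idx]], dim=1)
    if conv2.bias is not None:
        wider2.bias.data = conv2.bias.data.clone()
    return wider1, wider2
```

Because ReLU acts per channel, duplicating a channel and dividing the next layer's matching weights by the copy count leaves the composed function unchanged, so training of the widened network resumes from the small network's solution rather than from a random point.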
[Objective] Deep learning has been widely applied in fields such as scientific research, teaching, and industrial production. However, because of its large data volumes and complex model structures, it relies on substantial computing resources during the model training stage. Knowledge transfer methods that reuse the weights of pretrained models are widely used in computer vision and natural language processing: for example, when training a detection network on the VOC or COCO dataset, a classification network pretrained on the ImageNet dataset serves as the backbone for further training. Reusing weights trained on similar datasets helps improve performance on the target task on the one hand, and accelerates the training process on the other. To improve resource utilization efficiency in experimental teaching and to help students become more proficient in data collection and in model parameter tuning and optimization, a weight reuse-based training acceleration method is proposed.
[Methods] Common weight reuse methods often require a high degree of structural consistency between the pretrained network and the target network, which limits network expansion when exploring a suitable structure. This paper proposes a more flexible knowledge transfer method that allows a network to reuse the weights of another network whose structure is similar but not completely identical; training an expanded network with this method is much faster than training from scratch. The algorithm expands the depth and width of the VGG and ResNet architectures, respectively, allowing models to reuse network weights that are structurally similar but not fully consistent. Unlike knowledge distillation-based schemes for network exploration, the proposed method directly transforms the weights of a previously explored network to initialize the new one rather than training it from scratch; because no teacher network is needed for guidance, it incurs no additional time or space overhead.
[Results] In both the width expansion and depth expansion experiments, the training curves initialized with the proposed weight reuse method lie clearly to the left of those of randomly initialized training, and their performance at the end of training is comparable or even better. The experimental results show that, on the CIFAR10 dataset, training initialized with the weight reuse method converges faster, and its final accuracy is similar to that of randomly initialized training, achieving the goal of accelerating the training of the expanded network. It is a more flexible knowledge transfer method that helps students develop the ability to understand and optimize complex models.
[Conclusions] The proposed knowledge transfer method, which reuses pretrained network weights, transfers knowledge from small networks to large networks. It effectively accelerates training and, with strong flexibility, facilitates the iterative expansion of network size during the design and validation stages. Training initialized with this method converges faster and ends with an accuracy similar to that of randomly initialized training. In the teaching experiments, it removes the need for students to hunt for GPU computing resources after class and reduces the waiting time for model training, helping students deepen their understanding of key scientific issues in instrument system design and improve their comprehensive innovation ability. The proposed weight reuse algorithm therefore has theoretical and practical value and can be used in deep learning experimental courses, effectively improving resource utilization efficiency and course learning progress.
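Depth expansion can be sketched in the same hedged spirit. The snippet below is again an illustration under assumptions, not the authors' exact scheme: it presumes ReLU activations, no batch normalization, and a VGG-style nn.Sequential block, and identity_conv and deepen are made-up helper names. It inserts a convolution initialized to the identity, so the deepened network starts out computing exactly what the pretrained block computed.

```python
# Illustrative sketch only: grow a VGG-style block deeper without discarding
# its pretrained weights, by inserting an identity-initialized convolution.
import torch
import torch.nn as nn

def identity_conv(channels: int, kernel_size: int = 3) -> nn.Conv2d:
    """A conv layer that initially passes each channel through unchanged."""
    conv = nn.Conv2d(channels, channels, kernel_size,
                     padding=kernel_size // 2, bias=True)
    with torch.no_grad():
        conv.weight.zero_()
        conv.bias.zero_()
        c = kernel_size // 2
        for i in range(channels):
            conv.weight[i, i, c, c] = 1.0  # center tap copies channel i
    return conv

def deepen(block: nn.Sequential, position: int, channels: int) -> nn.Sequential:
    """Insert identity conv + ReLU at `position` inside a sequential block."""
    layers = list(block.children())
    layers[position:position] = [identity_conv(channels), nn.ReLU(inplace=True)]
    return nn.Sequential(*layers)

# Usage: a pretrained two-conv block becomes a three-conv block whose output
# is initially identical, because ReLU(identity(x)) == x for x >= 0.
block = nn.Sequential(nn.Conv2d(3, 64, 3, padding=1), nn.ReLU(),
                      nn.Conv2d(64, 64, 3, padding=1), nn.ReLU())
x = torch.randn(1, 3, 32, 32)
assert torch.allclose(block(x), deepen(block, 2, 64)(x))
```

For a ResNet-style network, an analogous trick is to zero-initialize the last convolution of a newly inserted residual block so that the block initially reduces to its skip connection; either way, the deepened model trains from the pretrained function instead of from scratch.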
Authors
应仰威
章洛铭
齐炜
郑楷
周泓
YING Yangwei; ZHANG Luoming; QI Wei; ZHENG Kai; ZHOU Hong (College of Biomedical Engineering & Instrument Science, Zhejiang University, Hangzhou 310027, China)
Source
《实验技术与管理》
CAS
Peking University Core Journals (北大核心)
2024, No. 5, pp. 15-22 (8 pages)
Experimental Technology and Management
Funding
National Key Research and Development Program of China (2022YFC3602601)
Industry-University Cooperative Education Program of the Ministry of Education (220600656141412).
Keywords
convolutional neural network
knowledge transfer
training acceleration
weight reuse