Efficient Automated Data Augmentation Algorithm Based on Self-guided Evolution Strategy
Cited by: 1
Abstract Deep learning has achieved excellent performance on analysis tasks over media data such as images, text, and speech. Data augmentation can effectively increase the scale and diversity of training data, thereby improving the generalization of deep learning models. However, for a given dataset, designing a good data augmentation strategy relies heavily on expert experience and domain knowledge and requires repeated trial and error, which is time-consuming and labor-intensive. In recent years, automated data augmentation, in which the augmentation strategy is designed automatically by machine, has attracted widespread attention from academia and industry. Existing automated data augmentation algorithms cannot yet strike a good balance between prediction accuracy and search efficiency. To address this problem, this study proposes SGES AA, an efficient automated data augmentation algorithm based on a self-guided evolution strategy. First, an effective continuous vector representation is designed for data augmentation strategies, which converts the automated data augmentation problem into a search problem over continuous strategy vectors. Second, a strategy vector search method based on the self-guided evolution strategy is presented. By using historical estimated gradient information to guide the sampling and updating of exploration points, the method effectively avoids becoming trapped in local optima while speeding up the convergence of the search. Extensive experiments on image, text, and speech datasets show that the prediction accuracy of the proposed algorithm is superior to or matches that of the current best automated data augmentation methods, without significantly increasing search time.
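The search idea in the abstract can be illustrated with a minimal sketch. The abstract gives neither the encoding nor the update rules of SGES AA, so everything here is an assumption made for illustration: the strategy is taken to be a vector in [0, 1]^d, `toy_reward` is a hypothetical stand-in for the validation accuracy obtained with a strategy, and the guided sampling is a simplified rule that mixes isotropic noise with directions from recent gradient estimates. All function and parameter names are invented for this example.

```python
import math
import random


def toy_reward(policy):
    # Hypothetical stand-in for "validation accuracy of a model trained
    # with this augmentation strategy"; higher is better, maximum is 0.
    target = [0.6, 0.3, 0.5, 0.4]
    return -sum((p - t) ** 2 for p, t in zip(policy, target))


def self_guided_es(reward, dim, iters=300, pairs=4, sigma=0.05, lr=0.1,
                   alpha=0.5, history=3, seed=0):
    """Simplified evolution strategy guided by past gradient estimates."""
    rng = random.Random(seed)
    x = [0.5] * dim          # current strategy vector
    past_grads = []          # unit-norm gradient estimates of recent steps
    for _ in range(iters):
        grad = [0.0] * dim
        for _ in range(pairs):
            # Isotropic exploration component.
            eps = [math.sqrt(alpha) * rng.gauss(0, 1) for _ in range(dim)]
            # Guided component: random combination of past gradient directions.
            if past_grads:
                scale = math.sqrt((1 - alpha) * dim / len(past_grads))
                for g in past_grads:
                    w = scale * rng.gauss(0, 1)
                    eps = [e + w * gi for e, gi in zip(eps, g)]
            # Antithetic pair: evaluate the reward at x + sigma*eps and x - sigma*eps.
            plus = [xi + sigma * e for xi, e in zip(x, eps)]
            minus = [xi - sigma * e for xi, e in zip(x, eps)]
            diff = reward(plus) - reward(minus)
            grad = [gi + diff * e / (2 * sigma * pairs)
                    for gi, e in zip(grad, eps)]
        norm = math.sqrt(sum(g * g for g in grad))
        if norm > 0:
            unit = [g / norm for g in grad]
            past_grads = (past_grads + [unit])[-history:]
        # Gradient ascent step, clipped so each entry stays in [0, 1].
        x = [min(1.0, max(0.0, xi + lr * gi)) for xi, gi in zip(x, grad)]
    return x


best = self_guided_es(toy_reward, dim=4)
print(best, toy_reward(best))
```

In a real setting each reward evaluation would involve training or fine-tuning a model, so the number of sampled pairs per iteration largely determines the search cost. Reusing past gradient estimates is one way to get more progress out of each evaluation.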
Authors ZHU Guang-Hui (朱光辉); CHEN Wen-Zhong (陈文忠); ZHU Zhen-Nan (朱振南); YUAN Chun-Feng (袁春风); HUANG Yi-Hua (黄宜华) (State Key Laboratory for Novel Software Technology (Nanjing University), Nanjing 210023, China)
Source Journal of Software (《软件学报》), indexed in EI, CSCD, and PKU Core; 2024, Issue 6, pp. 3013-3035 (23 pages)
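Funding National Natural Science Foundation of China (62102177, U1811461); Natural Science Foundation of Jiangsu Province (BK20210181); Key Research and Development Program of Jiangsu Province (BE2021729).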
Keywords deep learning; data augmentation; automated machine learning; self-guided evolution strategy