Abstract
To improve the efficiency of hyperparameter optimization for machine learning algorithms, a machine learning parameter optimization method based on a parameter-parallel mechanism is proposed. The method uses a swarm heuristic algorithm to tune the parameters of a machine learning algorithm, converts the population into a resilient distributed dataset (RDD), the data structure specific to the Spark platform, and computes the fitness of the individuals in the population in parallel to address the time-consuming nature of parameter optimization. Random forest and a genetic algorithm were selected as the experimental algorithms, and several groups of experiments were designed to validate the proposed method. The experimental results show that the method outperforms the mainstream grid search algorithm in both optimization ability and efficiency. On small datasets of fewer than 200,000 records, it reduces running time by up to 69.5% compared with a machine learning parameter optimization method based on a data-parallel mechanism, and it scales well.
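The parameter-parallel idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: a thread pool stands in for Spark, the search space `BOUNDS` and the toy `fitness` function are hypothetical, and a real run would train a random forest inside `fitness`. The point is only that each individual (one hyperparameter combination) is scored independently, so the population can be mapped over in parallel.

```python
import random
from concurrent.futures import ThreadPoolExecutor

# Hypothetical search space: (n_trees, max_depth). Ranges are illustrative only.
BOUNDS = [(10, 200), (2, 30)]

def fitness(individual):
    """Toy stand-in for a validation score; peaks at n_trees=120, max_depth=12.

    A real run would train and evaluate a model with these hyperparameters.
    """
    n_trees, max_depth = individual
    return -((n_trees - 120) ** 2) / 1000.0 - ((max_depth - 12) ** 2) / 10.0

def random_individual(rng):
    return tuple(rng.randint(lo, hi) for lo, hi in BOUNDS)

def mutate(individual, rng, rate=0.3):
    # Perturb each gene with probability `rate`, clamped to its bounds.
    genes = []
    for gene, (lo, hi) in zip(individual, BOUNDS):
        if rng.random() < rate:
            gene = min(hi, max(lo, gene + rng.randint(-10, 10)))
        genes.append(gene)
    return tuple(genes)

def crossover(a, b, rng):
    # Uniform crossover: each gene is taken from one of the two parents.
    return tuple(rng.choice(pair) for pair in zip(a, b))

def evaluate_population(population, workers=4):
    # Parameter parallelism: every individual is scored independently.
    # With PySpark this step would be roughly
    #   sc.parallelize(population).map(fitness).collect()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(fitness, population))

def optimize(pop_size=20, generations=30, seed=0):
    rng = random.Random(seed)
    population = [random_individual(rng) for _ in range(pop_size)]
    best, best_score = None, float("-inf")
    for _ in range(generations):
        scores = evaluate_population(population)
        ranked = sorted(zip(scores, population), reverse=True)
        if ranked[0][0] > best_score:
            best_score, best = ranked[0]
        # Keep the top half as parents and refill with mutated offspring.
        parents = [ind for _, ind in ranked[: pop_size // 2]]
        children = [
            mutate(crossover(rng.choice(parents), rng.choice(parents), rng), rng)
            for _ in range(pop_size - len(parents))
        ]
        population = parents + children
    return best, best_score
```

Calling `optimize()` returns the best hyperparameter tuple found and its score. By contrast, a data-parallel approach would distribute the training data of a single model rather than the candidate parameter sets.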
Authors
YANG Yan-yan (杨艳艳)
LI Lei-xiao (李雷孝)
LIN Hao (林浩)
WANG Yong-sheng (王永生)
WANG Hui (王慧)
GAO Jing (高静)
Affiliations: College of Data Science and Application, Inner Mongolia University of Technology, Hohhot 010080, China; Inner Mongolia Autonomous Region Engineering & Technology Research Center of Big Data Based Software Service, Hohhot 010080, China; College of Computer and Information Engineering, Inner Mongolia Agricultural University, Hohhot 010010, China
Source
Science Technology and Engineering (《科学技术与工程》)
Peking University Core Journal (北大核心)
2022, Issue 5, pp. 1972-1980 (9 pages)
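Funding
Inner Mongolia Autonomous Region Science and Technology Achievements Transformation Fund (2020CG0073)
Inner Mongolia Autonomous Region Science and Technology Major Project (2019ZD015, 2019ZD016)
Inner Mongolia Autonomous Region Key Technology Research Program (2019GG273, 2020GG0094)
Scientific Research Project of Higher Education Institutions of Inner Mongolia (NJZY21317)
Key Scientific Research Project of Inner Mongolia University of Technology (ZZ202017)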
Keywords
parameter optimization
swarm heuristic algorithm
Spark
parameter parallelism
machine learning algorithm