Abstract
To address the long search times and excessive memory consumption of traditional support vector machine (SVM) parameter optimization algorithms on large-sample data sets, we propose a parallel, tunable SVM parameter optimization algorithm based on the Spark general-purpose computing engine. The algorithm first uses the Spark cluster to distribute the training set to every Executor as a broadcast variable, and then parallelizes the SVM parameter search. During the search it controls the Task parallelism so that the load on the Executors is balanced, which speeds up the optimization. Experimental results show that, with a suitably chosen Task parallelism, the proposed algorithm makes full use of cluster resources while finding the optimal parameters faster and reducing the optimization time.
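The effect of Task parallelism on load balance can be illustrated with a small sketch. This is not the authors' implementation and does not use Spark: it is plain Python that only models how a grid of candidate (C, gamma) pairs might be cut into a chosen number of tasks. The function names `build_grid`, `split_into_tasks`, and `executor_load`, the candidate values, and the round-robin assignment of tasks to executors are all assumptions made for illustration; a real Spark scheduler hands tasks to free cores dynamically.

```python
from itertools import product


def build_grid(c_values, gamma_values):
    """Return every (C, gamma) candidate pair of a grid search."""
    return list(product(c_values, gamma_values))


def split_into_tasks(candidates, num_tasks):
    """Cut the candidate list into num_tasks contiguous, near-equal slices.

    num_tasks plays the role of the Task parallelism (hypothetical model).
    """
    if num_tasks < 1:
        raise ValueError("num_tasks must be at least 1")
    n = len(candidates)
    return [
        candidates[i * n // num_tasks:(i + 1) * n // num_tasks]
        for i in range(num_tasks)
    ]


def executor_load(slices, num_executors):
    """Count candidates per executor, assuming round-robin task assignment.

    Round-robin is a simplifying assumption, not Spark's actual scheduling.
    """
    load = [0] * num_executors
    for task_id, task in enumerate(slices):
        load[task_id % num_executors] += len(task)
    return load


if __name__ == "__main__":
    # Illustrative candidate values only; the paper's grid is not given here.
    grid = build_grid([0.1, 1, 10, 100, 1000], [0.001, 0.01, 0.1, 1, 10])
    for num_tasks in (3, 4):
        slices = split_into_tasks(grid, num_tasks)
        print(num_tasks, executor_load(slices, num_executors=4))
```

Under these assumptions, 25 candidates on 4 executors give per-executor loads of 6, 6, 6, 7 with 4 tasks, whereas 3 tasks leave one executor idle. In an actual Spark job the training set would be shared through a broadcast variable and the candidate list distributed with a chosen number of partitions.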
Authors
HE Jingwei (何经纬)
LIU Lizhi (刘黎志)
PENG Bei (彭贝)
FU Xingbao (付星堡)
Affiliations: Hubei Key Laboratory of Intelligent Robot (Wuhan Institute of Technology), Wuhan 430205, China; School of Computer Science & Engineering, Wuhan Institute of Technology, Wuhan 430205, China
Source
Journal of Wuhan Institute of Technology (《武汉工程大学学报》)
CAS
2019, Issue 3, pp. 283-289 (7 pages)
Funding
The 10th Graduate Education Innovation Fund of Wuhan Institute of Technology (CX2018215)
Keywords
support vector machine
parameter optimization
Spark
parallelism
load balancing