
Parallel Algorithms for LIBSVM Parameter Optimization Based on Spark (cited by 21)

Abstract: The purpose of this work is to design a parallel implementation of LIBSVM parameter optimization on a Spark cluster. LIBSVM is a widely used SVM software package, applied in model building, sample training, result prediction, and similar tasks. When LIBSVM is used to train a data set, the choice of parameters has a significant impact on the training results, and the parameters C and g matter most. LIBSVM optimizes the combination of C and g with a grid search algorithm; although this search runs in parallel on a single computer, it still takes a long time once the data volume reaches a certain size. In recent years, with the growth of big data, cluster parallel computing, and in-memory computing platforms such as Apache Spark, parameter optimization is expected to become dramatically faster when it is run in parallel on a computing cluster. In this paper, we design and implement parallelized parameter optimization algorithms for LIBSVM based on the Spark parallel computing architecture. Experimental results show that the proposed parallel coarse-grained grid-search algorithm runs nearly 7 times as fast as the traditional serial algorithm, and this speedup will increase further as the cluster grows. In addition, building on the coarse-grained grid search, the proposed fine-grained parallel grid-search algorithm further improves the selected combination of C and g.
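The abstract describes a coarse-grained grid search over C and g that is parallelized on Spark and then refined by a finer grid. A minimal sketch of that coarse-to-fine structure follows, under several stated assumptions: `cv_accuracy` is a hypothetical synthetic surface standing in for LIBSVM cross-validation accuracy; a Python thread pool stands in for Spark executors (in PySpark the evaluation step would roughly correspond to `sc.parallelize(pairs).map(...).max(key=...)`); the coarse exponent ranges are the defaults of LIBSVM's `grid.py` (log2c from -5 to 15 in steps of 2, log2g from 3 to -15 in steps of -2); and the fine-grid scheme (step 0.25, spanning one coarse step on each side of the coarse optimum) is an illustrative assumption, not necessarily the paper's method. The helper names `frange`, `parallel_grid_search`, and `coarse_to_fine` are hypothetical.

```python
"""Coarse-to-fine parallel grid search over LIBSVM's C and g (sketch).

The objective below is a HYPOTHETICAL stand-in; a real run would train
LIBSVM with -c 2**log2c -g 2**log2g -v 5 and return the CV accuracy.
"""
from concurrent.futures import ThreadPoolExecutor


def frange(begin, end, step):
    """Inclusive arithmetic range that works for positive or negative steps."""
    n = int(round((end - begin) / step))
    return [begin + i * step for i in range(n + 1)]


def cv_accuracy(pair):
    """Hypothetical smooth accuracy surface peaking at log2c=3.6, log2g=-6.4."""
    log2c, log2g = pair
    return 90.0 - 0.5 * ((log2c - 3.6) ** 2 + (log2g + 6.4) ** 2)


def parallel_grid_search(c_values, g_values, evaluate, workers=4):
    """Evaluate every (log2c, log2g) pair in parallel and return the best.

    The thread pool plays the role of Spark executors: each pair is an
    independent task, and the best result is chosen by a final max().
    """
    pairs = [(c, g) for c in c_values for g in g_values]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        scores = list(pool.map(evaluate, pairs))  # map preserves pair order
    best_pair, best_score = max(zip(pairs, scores), key=lambda t: t[1])
    return best_pair, best_score


def coarse_to_fine(evaluate):
    # Coarse pass over LIBSVM grid.py's default exponent ranges.
    coarse_c = frange(-5, 15, 2)
    coarse_g = frange(3, -15, -2)
    (bc, bg), coarse_score = parallel_grid_search(coarse_c, coarse_g, evaluate)

    # Fine pass (assumed scheme): step 0.25, one coarse step (2) either side.
    fine_c = frange(bc - 2, bc + 2, 0.25)
    fine_g = frange(bg - 2, bg + 2, 0.25)
    fine_best, fine_score = parallel_grid_search(fine_c, fine_g, evaluate)
    return (bc, bg), coarse_score, fine_best, fine_score


if __name__ == "__main__":
    coarse, cs, fine, fs = coarse_to_fine(cv_accuracy)
    print("coarse best (log2c, log2g):", coarse, "score:", round(cs, 2))
    print("fine best   (log2c, log2g):", fine, "score:", round(fs, 2))
```

Because every (C, g) pair is trained and scored independently, the search is embarrassingly parallel: distributing the pairs across more workers shortens the wall-clock time until per-task overhead dominates, and the fine pass only spends extra evaluations near the region the coarse pass already identified as promising.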
Source: Journal of Nanjing University (Natural Science), 2016, No. 2, pp. 343-352 (10 pages). Indexed in CAS, CSCD, and the Peking University Core Journals list.
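Funding: National High Technology Research and Development Program of China (863 Program) (2013AA06A411); National Natural Science Foundation of China (61471361); Fundamental Research Funds for the Central Universities (2011QNB26)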
Keywords: LIBSVM; parameter optimization; grid search; parallelization; Spark