
Research on Parallel SVM Algorithm Based on Spark (基于Spark的并行SVM算法研究)  Cited by: 17
Abstract: As data volumes keep growing, parallel designs for the support vector machine (SVM) have become an active research topic in data mining. To address the slow optimization and high memory consumption of SVM training on large-scale data, this paper proposes SP-SVM, a parallel SVM algorithm built on the Spark platform. First, the method adjusts the merging strategy and training structure of the Cascade SVM and implements the result on the Spark distributed computing framework. Second, it analyzes the performance of Spark's parallel operators and uses that analysis to optimize the parallel implementation, which overcomes the low training efficiency of the cascade model. Experimental results show that, at the cost of a small loss in accuracy, the new parallel training method reduces training time to a certain extent and improves the learning efficiency of the model.
Source: Computer Science (《计算机科学》), CSCD, Peking University Core Journal (北大核心), 2016, Issue 5, pp. 238-242 (5 pages)
Funding: Supported by the National Natural Science Foundation of China (61473149)
Keywords: parallel computing; support vector machine; large-scale data; cascade model; Spark
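
The abstract names the Cascade SVM as its starting point but does not spell out the modified merging strategy of SP-SVM. As a rough illustration only, the sketch below shows the classic cascade idea: train on each partition, keep the support vectors, merge them pairwise, and retrain until one set remains. It is not the paper's SP-SVM. It uses plain Python instead of Spark so that it runs anywhere, a simple subgradient-descent linear SVM as a stand-in for a real solver, and a heuristic margin threshold to pick support vectors. All function names, parameters, and the toy data are assumptions made for this example.

```python
import random

def dot(w, x):
    return sum(wj * xj for wj, xj in zip(w, x))

def train_linear_svm(data, lam=0.01, eta=0.1, epochs=200):
    """Full-batch subgradient descent on the regularized hinge loss.
    A simple stand-in for a real SVM solver. Returns (w, b)."""
    n, dim = len(data), len(data[0][0])
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [lam * wj for wj in w], 0.0
        for x, y in data:
            if y * (dot(w, x) + b) < 1.0:  # margin violated: hinge term is active
                grad_w = [g - y * xj / n for g, xj in zip(grad_w, x)]
                grad_b -= y / n
        w = [wj - eta * g for wj, g in zip(w, grad_w)]
        b -= eta * grad_b
    return w, b

def support_vectors(data, model, tol=0.05):
    """Keep the points on or inside the margin (heuristic threshold, an assumption)."""
    w, b = model
    margins = [y * (dot(w, x) + b) for x, y in data]
    cutoff = max(1.0, min(margins)) + tol  # the closest point is always kept
    return [p for p, m in zip(data, margins) if m <= cutoff]

def cascade_svm(data, n_partitions=4):
    """Classic cascade: train per partition, then merge support vectors pairwise."""
    if not data:
        raise ValueError("data must be non-empty")
    size = -(-len(data) // n_partitions)  # ceiling division
    layers = [data[i:i + size] for i in range(0, len(data), size)]
    while True:
        # On Spark, this per-subset training would run as parallel tasks.
        models = [train_linear_svm(part) for part in layers]
        if len(layers) == 1:
            return models[0], support_vectors(layers[0], models[0])
        sv_sets = [support_vectors(p, m) for p, m in zip(layers, models)]
        # Merge neighbouring support-vector sets to form the next layer.
        layers = [sum(sv_sets[i:i + 2], []) for i in range(0, len(sv_sets), 2)]

def make_data(n_per_class=20, seed=0):
    """Two well-separated 2-D clusters around (2, 2) and (-2, -2)."""
    rng = random.Random(seed)
    data = []
    for _ in range(n_per_class):
        for label in (1, -1):
            x = tuple(2.0 * label + rng.uniform(-0.5, 0.5) for _ in range(2))
            data.append((x, label))
    return data

data = make_data()
(w, b), svs = cascade_svm(data, n_partitions=4)
accuracy = sum((dot(w, x) + b > 0) == (y > 0) for x, y in data) / len(data)
print(f"support vectors kept: {len(svs)} of {len(data)}, accuracy: {accuracy:.2f}")
```

Each layer trains only on the support vectors that survived the previous one, so the later, sequential stages see far fewer points than the original data set. That reduction is what makes the cascade attractive for large-scale training, and the merge step is the part the abstract says SP-SVM adjusts.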


