
Local Support Vector Machine Based on Hadoop

Cited by: 5
Abstract: As the Internet of Things, cloud computing, and related technologies continue to develop, the volume of generated data is growing explosively. Mining and analyzing big data has become a research hotspot in academia, and Hadoop distributed computing has consequently become a mainstream technology for big data mining and analysis. The support vector machine (SVM) is a widely used data mining method. The local support vector machine (local SVM) is an effective classification algorithm that introduces local learning into the SVM. However, a local SVM must build a separate classifier for every test sample, so classifying big data has high time complexity and low efficiency. To address this problem, we propose a Hadoop-based local SVM algorithm that runs on the Hadoop parallel computing platform. This paper improves the local SVM in two respects: 1) the computation of each test sample's k nearest neighbors is parallelized; 2) model training is parallelized. Experimental results show that the Hadoop-based local SVM effectively reduces classification time while keeping classification accuracy essentially the same as that of the original local SVM.
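The abstract names the two parallelized steps but gives no pseudocode. The following is a minimal pure-Python sketch of how they could fit together, under several assumptions that are not taken from the paper. MapReduce is simulated in-process, so `map_knn`, `reduce_knn`, and `local_svm_classify` are hypothetical names and not Hadoop APIs. A linear SVM trained by Pegasos-style subgradient descent stands in for whatever solver the authors actually used. Skipping training when all k neighbors share a label is a shortcut borrowed from common local-SVM variants. Labels are assumed to be +1/-1.

```python
import heapq
from collections import defaultdict


def sq_dist(a, b):
    """Squared Euclidean distance (monotone in true distance, so fine for ranking)."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def map_knn(split, test_points, k):
    """Hypothetical map task: one mapper holds one split of the training set and
    emits (test_id, candidates), the k nearest samples *within its own split*."""
    for tid, t in enumerate(test_points):
        scored = ((sq_dist(t, x), x, y) for x, y in split)
        yield tid, heapq.nsmallest(k, scored, key=lambda r: r[0])


def reduce_knn(candidate_lists, k):
    """Hypothetical reduce task: merge per-split candidates into the global k nearest."""
    merged = (c for part in candidate_lists for c in part)
    return [(x, y) for _, x, y in heapq.nsmallest(k, merged, key=lambda r: r[0])]


def train_linear_svm(samples, lam=0.01, epochs=200):
    """Assumed stand-in solver: linear SVM via Pegasos-style subgradient descent
    on the hinge loss. A constant 1 feature is appended so w[-1] acts as the bias."""
    data = [(list(x) + [1.0], y) for x, y in samples]
    w = [0.0] * len(data[0][0])
    t = 0
    for _ in range(epochs):
        for x, y in data:
            t += 1
            eta = 1.0 / (lam * t)
            margin = y * sum(wi * xi for wi, xi in zip(w, x))
            w = [(1.0 - eta * lam) * wi for wi in w]  # gradient step on the L2 term
            if margin < 1.0:  # hinge loss active: move toward this sample
                w = [wi + eta * y * xi for wi, xi in zip(w, x)]
    return w


def predict(w, x):
    score = sum(wi * xi for wi, xi in zip(w, list(x) + [1.0]))
    return 1 if score >= 0 else -1


def local_svm_classify(train_splits, test_points, k):
    # Shuffle stage: group every mapper's output by test-point id.
    grouped = defaultdict(list)
    for split in train_splits:
        for tid, cands in map_knn(split, test_points, k):
            grouped[tid].append(cands)
    labels = []
    for tid, t in enumerate(test_points):
        neighbours = reduce_knn(grouped[tid], k)
        ys = {y for _, y in neighbours}
        if len(ys) == 1:  # assumed shortcut: all neighbours agree, no model needed
            labels.append(ys.pop())
        else:  # otherwise fit a local SVM on the k neighbours only
            labels.append(predict(train_linear_svm(neighbours), t))
    return labels


splits = [
    [((-2.0, 0.0), -1), ((2.0, 0.0), 1), ((-1.0, -0.5), -1)],
    [((-1.5, 0.5), -1), ((1.5, -0.5), 1), ((1.0, 0.5), 1)],
]
print(local_svm_classify(splits, [(-1.2, 0.0), (1.2, 0.0)], k=6))
```

Keeping only each split's local top-k in the map phase is sufficient, because every member of the global top-k is necessarily in the top-k of its own split. The per-test-sample models are independent of one another, so in a real Hadoop job the loop in `local_svm_classify` would naturally become one reduce call per test-id key. That is one plausible reading of "training is parallelized".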
Source: Journal of Computer Research and Development (《计算机研究与发展》), 2014, Issue S2, pp. 116-121 (6 pages). Indexed in: EI, CSCD, Peking University Core Journals.
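Funding: Natural Science Foundation of Shandong Province (ZR2012FM024); Shandong Province Major Agricultural Applied Technology Innovation Project.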
Keywords: Hadoop; big data analytics; local support vector machine; big data