Abstract
With the continued development of technologies such as the Internet of Things and cloud computing, the volume of generated data is growing explosively. How to mine and analyze big data has become an active research topic in academia, and Hadoop distributed computing has consequently become a principal technology for big data mining and analysis. The support vector machine (SVM) is a widely used data mining method, and the local support vector machine is an effective classification algorithm that introduces local learning into the SVM. However, the local support vector machine must construct a separate classifier for each test sample, so classification on big data has high time complexity and low efficiency. To address this problem, we propose a local support vector machine algorithm based on the Hadoop parallel computing platform. This paper improves the local support vector machine in two respects: 1) the computation of the k nearest neighbors of each test sample is parallelized; 2) model training is parallelized. Experimental results show that the Hadoop-based local support vector machine effectively reduces classification time while keeping classification accuracy essentially consistent with that of the local support vector machine.
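The first improvement named in the abstract, parallelizing the k-nearest-neighbor search for a test sample, can be illustrated with a map/reduce pattern. The following is only a minimal sketch under stated assumptions, not the authors' implementation: it does not use Hadoop, the mapper and reducer are simulated sequentially in plain Python, Euclidean distance is assumed, and the names `map_local_knn`, `reduce_global_knn`, and `parallel_knn` as well as the toy data are hypothetical. The idea it shows is that each data split reports its own k nearest candidates, and merging those candidates yields the global k nearest neighbors, on which a local classifier could then be trained.

```python
import heapq
import math


def map_local_knn(split, query, k):
    """Mapper (simulated): k nearest (distance, label) pairs within one data split."""
    scored = ((math.dist(features, query), label) for features, label in split)
    return heapq.nsmallest(k, scored)


def reduce_global_knn(partial_results, k):
    """Reducer (simulated): merge per-split candidates into the global k nearest."""
    merged = (pair for partial in partial_results for pair in partial)
    return heapq.nsmallest(k, merged)


def parallel_knn(splits, query, k):
    """Run the map phase over every split, then a single reduce phase.

    On a real cluster the map calls would run on different nodes; here they
    run one after another purely for illustration.
    """
    partials = [map_local_knn(split, query, k) for split in splits]
    return reduce_global_knn(partials, k)


# Hypothetical toy training data, already partitioned into three splits.
# Each record is (feature_vector, class_label).
splits = [
    [((0.0, 0.0), "a"), ((5.0, 5.0), "b")],
    [((1.0, 0.0), "a"), ((4.0, 4.0), "b")],
    [((0.0, 2.0), "a"), ((6.0, 5.0), "b")],
]

neighbors = parallel_knn(splits, query=(0.0, 0.0), k=3)
```

Because every member of the global k nearest neighbors must also be among the k nearest within its own split, the reducer only needs to examine at most k candidates per split rather than the full training set.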
Source
Journal of Computer Research and Development (《计算机研究与发展》)
Indexed in: EI, CSCD, PKU Core
2014, Issue S2, pp. 116-121 (6 pages)
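Funding
Natural Science Foundation of Shandong Province (ZR2012FM024)
Shandong Province Major Agricultural Applied Technology Innovation Project Fund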