Abstract
To address the slow speed and poor accuracy of traditional text classification methods on massive data, this paper applies parallel computing to text classification. It designs a parallel text classification framework based on MapReduce, proposes a parallel training method for Support Vector Machines (SVM) that draws on the Bagging algorithm, and evaluates the method on the Hadoop cloud computing platform. The experimental results show that the proposed method achieves both fast classification and high classification accuracy.
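The abstract does not give implementation details, so the following is only a minimal sketch of how Bagging-style SVM training might map onto a MapReduce pattern, not the paper's actual method. It assumes that each map task trains one linear SVM on a bootstrap sample and that the reduce step gathers the base models for majority voting. Everything in it is hypothetical: the function names (`train_linear_svm`, `map_train`, `reduce_models`, `predict`), the hinge-loss sub-gradient trainer, the hyperparameters, and the 2-D toy data that stands in for real text feature vectors. The map and reduce steps are simulated locally with plain Python rather than run on Hadoop.

```python
import random
from collections import Counter

def train_linear_svm(samples, epochs=20, lr=0.1, lam=0.01):
    """Sub-gradient descent on the L2-regularized hinge loss; labels are +1/-1."""
    dim = len(samples[0][0])
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        for x, y in samples:
            margin = y * (sum(wi * xi for wi, xi in zip(w, x)) + b)
            # The regularization term shrinks w on every step.
            w = [wi * (1.0 - lr * lam) for wi in w]
            if margin < 1.0:  # hinge loss is active: move toward the sample
                w = [wi + lr * y * xi for wi, xi in zip(w, x)]
                b += lr * y
    return w, b

def map_train(task):
    """Map step (assumed): draw one bootstrap sample and train one base SVM."""
    seed, data = task
    rng = random.Random(seed)
    bootstrap = [rng.choice(data) for _ in data]  # sample with replacement
    return train_linear_svm(bootstrap)

def reduce_models(models):
    """Reduce step (assumed): gather the base models into one ensemble."""
    return list(models)

def predict(ensemble, x):
    """Bagging aggregation: majority vote over the base classifiers."""
    votes = Counter(
        1 if sum(wi * xi for wi, xi in zip(w, x)) + b >= 0 else -1
        for w, b in ensemble
    )
    return votes.most_common(1)[0][0]

def make_toy_data(n_per_class=100, seed=42):
    """Two well-separated 2-D clusters standing in for document vectors."""
    rng = random.Random(seed)
    data = []
    for label, center in ((1, 2.0), (-1, -2.0)):
        for _ in range(n_per_class):
            point = [rng.gauss(center, 0.5), rng.gauss(center, 0.5)]
            data.append((point, label))
    return data

train = make_toy_data()
# Five independent "map tasks"; an odd count avoids tied votes.
ensemble = reduce_models(map(map_train, [(seed, train) for seed in range(5)]))
accuracy = sum(predict(ensemble, x) == y for x, y in train) / len(train)
print(f"base models: {len(ensemble)}, training accuracy: {accuracy:.2f}")
```

Because the base models are trained independently of one another, the map tasks share no state. This independence is what makes a Bagging ensemble a natural fit for parallel execution.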
Source
《计算机应用》
CSCD
Peking University Core Journals (北大核心)
2013, Issue A02, pp. 60-62, 66 (4 pages)
Journal of Computer Applications
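Funding
National Natural Science Foundation of China (71171148)
National High Technology Research and Development Program of China (863 Program) (2012AA062206)
National Key Technology R&D Program of China (2012BAD35B01)
Shanghai Science and Technology Innovation Program (11DZ1501703)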