Text classification based on parallel computing (基于并行计算的文本分类技术)

Cited by: 4
Abstract: Traditional text classification methods are slow and inaccurate on massive data sets. To address this, the paper applies parallel computing to text classification. It designs a parallel text classification framework based on MapReduce, proposes a parallel training method for Support Vector Machines (SVM) that draws on the Bagging algorithm, and evaluates the approach on the Hadoop cloud computing platform. The experimental results show that the proposed method achieves relatively fast classification and high classification accuracy, and that it outperforms other classification methods on both measures.
Source: Journal of Computer Applications (《计算机应用》), indexed in CSCD and the Peking University Core Journals list, 2013, Issue A02, pp. 60-62, 66 (4 pages)
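Funding: National Natural Science Foundation of China (71171148); National 863 Program (2012AA062206); National Science and Technology Support Program (2012BAD35B01); Shanghai Science and Technology Innovation Program (11DZ1501703)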
Keywords: text classification; parallel computing; Support Vector Machine (SVM); MapReduce
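The abstract describes training SVMs in parallel by combining MapReduce with the Bagging idea, but it does not give the algorithm's details. The following is only a minimal sketch of that general pattern under stated assumptions, not the authors' implementation: each map task draws a bootstrap sample and trains one linear SVM (here with a Pegasos-style subgradient method, which is a stand-in choice), and the reduce step combines the models by majority vote. Python's built-in `map` stands in for Hadoop map tasks, and the toy data and all names (`DOCS`, `map_train`, `reduce_vote`, `bagged_svm`) are hypothetical.

```python
import random
from collections import Counter

# Toy "documents" as (sports-term count, finance-term count) vectors.
# Label +1 = sports, -1 = finance. Hypothetical data for illustration only.
DOCS = [((3.0, 0.0), 1), ((4.0, 0.0), 1), ((5.0, 0.0), 1),
        ((2.0, 0.0), 1), ((6.0, 0.0), 1), ((3.5, 0.0), 1),
        ((0.0, 3.0), -1), ((0.0, 4.0), -1), ((0.0, 5.0), -1),
        ((0.0, 2.0), -1), ((0.0, 6.0), -1), ((0.0, 3.5), -1)]


def train_linear_svm(samples, epochs=50, lam=0.01, seed=0):
    """Pegasos-style subgradient descent for a linear SVM (labels +1/-1)."""
    rng = random.Random(seed)
    w = [0.0] * len(samples[0][0])
    t = 0
    for _ in range(epochs):
        order = samples[:]
        rng.shuffle(order)
        for x, y in order:
            t += 1
            eta = 1.0 / (lam * t)  # decaying step size
            margin = y * sum(wi * xi for wi, xi in zip(w, x))
            # Regularization shrink, then a step for margin violators.
            w = [(1.0 - eta * lam) * wi for wi in w]
            if margin < 1.0:
                w = [wi + eta * y * xi for wi, xi in zip(w, x)]
    return w


def predict(w, x):
    """Classify one vector with one linear model."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) >= 0 else -1


def map_train(task):
    """Map step: draw a bootstrap sample and train one SVM on it."""
    data, seed = task
    rng = random.Random(seed)
    bootstrap = [rng.choice(data) for _ in data]
    return train_linear_svm(bootstrap, seed=seed)


def reduce_vote(models, x):
    """Reduce step: majority vote over the per-model predictions."""
    votes = Counter(predict(w, x) for w in models)
    return votes.most_common(1)[0][0]


def bagged_svm(data, n_models=5):
    """Train n_models SVMs independently; each task could run on its own node."""
    tasks = [(data, seed) for seed in range(n_models)]
    return list(map(map_train, tasks))


if __name__ == "__main__":
    models = bagged_svm(DOCS)
    for doc in [(6.0, 0.0), (0.0, 6.0)]:
        print(doc, "->", reduce_vote(models, doc))
```

Because the map tasks share no state, the sequential `map` call could be swapped for a process pool or a real MapReduce job without changing `map_train` or `reduce_vote`.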