Research on the TFIDF Algorithm Based on the MapReduce Programming Model
Abstract: With the rapid development of the Internet and related technologies, information processing has become an indispensable tool for obtaining useful information, and retrieving that information efficiently from massive amounts of data is critical. Automatic text classification is therefore especially important. Existing text classification algorithms run into bottlenecks in time complexity and space complexity and cannot meet users' needs. This paper proposes a TFIDF algorithm based on the Hadoop distributed platform, describes the concrete workflow of the algorithm, and implements it with MapReduce programming. Comparative experiments were run in single-machine and cluster modes, and the algorithm was also compared with the traditional serial algorithm. The experiments show that the TFIDF text classification algorithm can classify massive data quickly and effectively.
Source: Microcomputer & Its Applications (《微型机与应用》), 2013, No. 4, pp. 71-73 (3 pages)
Funding: National Natural Science Foundation of China (61163025); Chunhui Program of the Ministry of Education (Z2009-1-01044)
Keywords: text classification; MapReduce; parallelization; TFIDF algorithm
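
The abstract says the TFIDF computation is expressed as MapReduce jobs but does not give the paper's actual implementation. As a rough illustration only, the sketch below simulates map and reduce phases in a single Python process rather than on Hadoop. The function names (`map_term_counts`, `reduce_sum`, `tfidf`), the whitespace tokenizer, and the weighting `tf * ln(N / df)` are assumptions chosen for illustration, not details taken from the paper.

```python
import math
from collections import defaultdict


def map_term_counts(doc_id, text):
    """Map phase (hypothetical): emit ((term, doc_id), 1) for every token."""
    for term in text.lower().split():
        yield (term, doc_id), 1


def reduce_sum(pairs):
    """Reduce phase (hypothetical): sum the values that share a key."""
    totals = defaultdict(int)
    for key, value in pairs:
        totals[key] += value
    return dict(totals)


def tfidf(docs):
    """Compute TF-IDF weights for a dict {doc_id: text}.

    Assumed weighting: (count / document length) * ln(N / document frequency).
    """
    # Job 1: occurrences of each term in each document.
    counts = reduce_sum(
        pair
        for doc_id, text in docs.items()
        for pair in map_term_counts(doc_id, text)
    )
    # Job 2: total number of tokens in each document.
    doc_lengths = reduce_sum((doc_id, n) for (_, doc_id), n in counts.items())
    # Job 3: number of documents that contain each term.
    doc_freq = reduce_sum((term, 1) for (term, _) in counts)
    n_docs = len(docs)
    return {
        (term, doc_id): (n / doc_lengths[doc_id]) * math.log(n_docs / doc_freq[term])
        for (term, doc_id), n in counts.items()
    }
```

Calling `tfidf({"d1": "hadoop mapreduce hadoop", "d2": "mapreduce text"})` gives a weight of zero to `mapreduce`, which appears in every document, and a positive weight to `hadoop` in `d1`. On a real cluster each of the three jobs would be a separate MapReduce pass, with the framework's shuffle step doing the grouping that `reduce_sum` does here in memory.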
