
A Literature Review on Web Automated Text Categorization Technology (cited by: 9)
Abstract: Web automated text categorization is a research hotspot and a core technique in information retrieval and data mining, and in recent years it has received wide attention and developed rapidly. This paper first analyzes the state of research on Web automated text categorization methods in China and abroad. It then examines in depth several recently proposed approaches, including multi-classifier fusion, swarm-based classification, RBF-network-based text classification models, fuzzy-rough-set-based text classification models, and latent semantic classification models, as well as recent developments in the k-nearest-neighbor (k-NN) algorithm and the support vector machine (SVM). It also analyzes several key techniques in the Web automated text categorization process: text preprocessing, text representation, feature dimensionality reduction, training methods, and classification algorithms. Finally, it summarizes the open problems and development trends of current Web automated text categorization technology.
Author: Pu Xiaoge (蒲筱哥)
Source: 《情报学报》 (Journal of the China Society for Scientific and Technical Information), CSSCI, Peking University Core Journal, 2009, No. 2, pp. 233-241 (9 pages)
Keywords: text categorization; categorization method; text representation; feature selection


