
A Text Classification Method Based on Term Frequency Classifier Ensemble

Times cited: 22
Abstract: This paper proposes a text classification method based on an ensemble of term frequency classifiers. A term frequency classifier is a simple classifier built by counting the terms in the corpus and how often each term occurs in every text. On its own, a term frequency classifier does not generalize well. It is, however, cheap to compute and easy to update when new training samples, or even new classes, are added. The generalization ability of the whole learning system can then be raised by an ensemble learning mechanism. Term frequency classifiers are therefore well suited as base learners for ensemble learning. The ensemble is built with an improved AdaBoost algorithm. This algorithm adds a compulsive weight-redistribution mechanism that keeps it from stopping too early, which makes it better suited to text classification. Experimental results on the standard Reuters-21578 corpus show that the proposed method performs well on text classification tasks.
Authors: Jiang Yuan, Zhou Zhihua
Source: Journal of Computer Research and Development, 2006, no. 10, pp. 1681-1687 (7 pages). Indexed in EI, CSCD and the Peking University Core Journals list.
Funding: National Natural Science Foundation of China (60505013); Jiangsu Province Natural Science Foundation, Innovative Talent Fund (BK2005412)
Keywords: text classification; machine learning; ensemble learning; term frequency classifier; AdaBoost
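The abstract combines two components: a cheap, incrementally updatable term frequency classifier, and an AdaBoost variant that redistributes weights instead of stopping early. The sketch below is a minimal Python illustration of how the two could fit together. It is not the authors' implementation, because the abstract does not say how the classifier scores a document or what the compulsive redistribution rule is. Several choices here are hypothetical:

- the scoring rule (relative frequency of the document's terms within each class);
- the multi-class AdaBoost.M1-style loop;
- redrawing weights at random from an exponential distribution when the weighted error reaches 0.5;
- all names (`TermFrequencyClassifier`, `train_tf`, `boost`, `ensemble_predict`).

```python
import math
import random
from collections import defaultdict


class TermFrequencyClassifier:
    """Hypothetical term frequency classifier: weighted per-class term counts."""

    def __init__(self):
        # counts[label][term] = weighted occurrences of term in documents of that label
        self.counts = defaultdict(lambda: defaultdict(float))
        # totals[label] = total weighted term occurrences seen for that label
        self.totals = defaultdict(float)

    def update(self, doc, label, weight=1.0):
        # Incremental update: a new sample, or a sample of a brand-new class,
        # only adds to its own class's counts; nothing else is retrained.
        for term in doc:
            self.counts[label][term] += weight
            self.totals[label] += weight

    def predict(self, doc):
        # Assumed scoring rule: sum of the relative frequencies, within each
        # class, of the document's terms; the highest-scoring class wins.
        def score(label):
            total = self.totals[label]
            class_counts = self.counts[label]
            return sum(class_counts.get(t, 0.0) for t in doc) / total if total else 0.0

        return max(self.totals, key=score)


def train_tf(docs, labels, weights):
    # Boosting weights scale each document's contribution to the counts.
    clf = TermFrequencyClassifier()
    for doc, label, w in zip(docs, labels, weights):
        clf.update(doc, label, w)
    return clf


def boost(docs, labels, rounds=10, seed=0):
    """AdaBoost.M1-style loop with a hypothetical forced weight redistribution.

    `rounds` counts training attempts, including attempts whose weights are reset.
    """
    rng = random.Random(seed)
    n = len(docs)
    weights = [1.0 / n] * n
    ensemble = []  # list of (alpha, classifier) pairs
    for _ in range(rounds):
        clf = train_tf(docs, labels, weights)
        wrong = [clf.predict(d) != y for d, y in zip(docs, labels)]
        error = sum(w for w, bad in zip(weights, wrong) if bad)
        if error >= 0.5:
            # Plain AdaBoost.M1 would stop here. Instead, discard this
            # classifier, redraw normalized random weights, and continue.
            raw = [rng.expovariate(1.0) for _ in range(n)]
            total = sum(raw)
            weights = [r / total for r in raw]
            continue
        error = max(error, 1e-10)  # avoid an infinite vote weight on a perfect fit
        beta = error / (1.0 - error)
        ensemble.append((math.log(1.0 / beta), clf))
        # Down-weight correctly classified documents, then renormalize.
        weights = [w if bad else w * beta for w, bad in zip(weights, wrong)]
        total = sum(weights)
        weights = [w / total for w in weights]
    return ensemble


def ensemble_predict(ensemble, doc):
    # Weighted vote of the base classifiers, each weighted by log(1 / beta).
    votes = defaultdict(float)
    for alpha, clf in ensemble:
        votes[clf.predict(doc)] += alpha
    return max(votes, key=votes.get)


if __name__ == "__main__":
    docs = [["stock", "market", "shares"], ["market", "trade", "shares"],
            ["goal", "match", "team"], ["team", "score", "match"]]
    labels = ["finance", "finance", "sport", "sport"]
    model = boost(docs, labels)
    print(ensemble_predict(model, ["market", "shares"]))  # → finance
```

Passing the boosting weights straight into the term counts avoids resampling the training set. Redrawing weights, rather than resetting them to uniform, matters because a deterministic base learner retrained on uniform weights would reproduce the first round's classifier. Treat the exponential redraw as a placeholder for whatever redistribution rule the paper actually uses.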

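Citation network: 48 documents sharing references with this paper; 295 documents co-cited with it; 22 citing documents; 71 second-level citing documents.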
