
English Text Categorization Based on a Two-Step Strategy (基于两步策略的英文文本分类) Cited by: 1

English Texts Categorization in Two Steps
Abstract: This paper proposes three multi-class, multi-label English text categorization methods based on a two-step strategy. (1) A two-step method that uses a Bayes classifier in both steps, with stemmed words as the features in the first step and unstemmed words as the features in the second. (2) A two-step method that uses a Bayes classifier in the first step and a decision tree classifier in the second. (3) A hybrid two-step method that first classifies certain specific categories with a combined classifier built from ID3, C4.5, and Bayes, and then applies method (2) to the remaining categories as a second round of classification. Experiments show that method (3) performs best of the three.
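Source: Journal of Guangxi Normal University: Natural Science Edition (《广西师范大学学报(自然科学版)》), indexed in CAS and the PKU Core list, 2007, No. 4, pp. 200-203 (4 pages)
Funding: Natural Science Foundation of Chongqing (2005BA2003, 2006BB2374); Program for New Century Excellent Talents in University, Ministry of Education (教技司[2005]2号)
Keywords: text categorization; two-step categorization strategy; classifier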
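
The abstract does not say how a text is handed from the first step to the second. The following is a minimal sketch of method (1) under stated assumptions, not the authors' implementation: a text goes to the second step when the log-score gap between the top two classes in the first step falls below a threshold (the `margin` parameter, an assumption), `crude_stem` is a hypothetical stand-in for a real stemmer, and the problem is reduced to single-label classification on made-up toy data.

```python
import math
from collections import Counter, defaultdict


def crude_stem(word):
    # Hypothetical stand-in for a real stemmer: strip a few common suffixes.
    for suffix in ("ing", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def tokens(text, stem):
    words = text.lower().split()
    return [crude_stem(w) for w in words] if stem else words


class NaiveBayes:
    """Multinomial naive Bayes with add-one smoothing."""

    def __init__(self, stem):
        self.stem = stem                      # True: stemmed features, False: raw words
        self.word_counts = defaultdict(Counter)
        self.doc_counts = Counter()
        self.vocab = set()

    def fit(self, texts, labels):
        for text, label in zip(texts, labels):
            self.doc_counts[label] += 1
            for tok in tokens(text, self.stem):
                self.word_counts[label][tok] += 1
                self.vocab.add(tok)
        return self

    def log_scores(self, text):
        total_docs = sum(self.doc_counts.values())
        scores = {}
        for label, n_docs in self.doc_counts.items():
            denom = sum(self.word_counts[label].values()) + len(self.vocab)
            score = math.log(n_docs / total_docs)
            for tok in tokens(text, self.stem):
                if tok in self.vocab:         # words never seen in training are ignored
                    score += math.log((self.word_counts[label][tok] + 1) / denom)
            scores[label] = score
        return scores


def two_step_classify(text, first, second, margin=1.0):
    """Return (label, step). The margin-based routing rule is an assumption."""
    scores = first.log_scores(text)
    ranked = sorted(scores.values(), reverse=True)
    if len(ranked) < 2 or ranked[0] - ranked[1] >= margin:
        return max(scores, key=scores.get), 1
    second_scores = second.log_scores(text)   # ambiguous case: ask the second classifier
    return max(second_scores, key=second_scores.get), 2


# Toy training data, invented for illustration only.
train_texts = [
    "stocks rising on strong trading",
    "bank raises rates",
    "team wins the match",
    "players training for the cup",
]
train_labels = ["finance", "finance", "sports", "sports"]

first = NaiveBayes(stem=True).fit(train_texts, train_labels)    # step 1: stemmed words
second = NaiveBayes(stem=False).fit(train_texts, train_labels)  # step 2: unstemmed words

print(two_step_classify("bank rates", first, second))
print(two_step_classify("hello world", first, second))
```

On this toy data, "bank rates" is settled in the first step, while "hello world" contains no known words, scores equally for both classes, and is therefore passed to the second step. Method (2) would follow the same routing shape with a decision tree in place of the second `NaiveBayes`.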


