
Improved centroid classifier based on AdaBoost.MR for text categorization

Cited by: 2
Abstract: The explosive growth of text information creates a need for simple, fast text categorization methods suited to real-time applications. The centroid classifier is fast, but the assumptions it rests on often do not hold in practice, which biases the classification model (so-called "classifier model bias"). To correct this bias, a mechanism based on the ensemble learning algorithm AdaBoost.MR is developed, with centroid classifiers as the individual classifiers. It exploits the weight distribution that AdaBoost.MR maintains adaptively: in each round, the current weight distribution is used to correct the traditional centroid classifier, increasing the influence of high-weight documents (those that tend to be misclassified) and lowering their probability of misclassification. Experiments on the YQ-WEBBENCH-V1.1 corpus show that the improved method performs better than the traditional centroid classifier.
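The abstract does not spell out how the weight distribution enters the centroid computation, so the following is only a minimal sketch under stated assumptions, not the authors' implementation. It assumes single-label documents given as dense term-weight vectors, uses cosine similarity to the class centroids as the ranking function, follows the AdaBoost.MR pair-weight update of Schapire and Singer, and (as a hypothetical choice) sets each document's weight in the centroid to the sum of its pair weights. The names `boost_centroids` and `predict` are illustrative.

```python
import math

def normalize(v):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v] if n > 0 else list(v)

def weighted_centroids(X, y, w, n_classes):
    """One centroid per class: weighted sum of that class's documents, normalized."""
    cents = [[0.0] * len(X[0]) for _ in range(n_classes)]
    for xi, yi, wi in zip(X, y, w):
        for j, v in enumerate(xi):
            cents[yi][j] += wi * v
    return [normalize(c) for c in cents]

def scores(cents, x):
    """Cosine similarity of a unit-length document x to each unit-length centroid."""
    return [sum(a * b for a, b in zip(c, x)) for c in cents]

def boost_centroids(X, y, n_classes, rounds=10):
    """Return a list of (alpha, centroids) pairs, one per boosting round."""
    X = [normalize(x) for x in X]
    n = len(X)
    # D[i][l]: weight of the pair (wrong label l, true label y[i]) for document i.
    D = [[0.0 if l == y[i] else 1.0 / (n * (n_classes - 1))
          for l in range(n_classes)] for i in range(n)]
    ensemble = []
    for _ in range(rounds):
        # Assumption: a document's weight is the total weight of its label pairs.
        w = [sum(row) for row in D]
        cents = weighted_centroids(X, y, w, n_classes)
        H = [scores(cents, x) for x in X]
        # r measures how well the true label outranks the wrong ones under D.
        r = 0.5 * sum(D[i][l] * (H[i][y[i]] - H[i][l])
                      for i in range(n) for l in range(n_classes) if l != y[i])
        r = max(min(r, 1 - 1e-10), -1 + 1e-10)  # keep the logarithm finite
        alpha = 0.5 * math.log((1 + r) / (1 - r))
        ensemble.append((alpha, cents))
        # Raise the weight of pairs where a wrong label scores close to or
        # above the true label, then renormalize D to sum to 1.
        for i in range(n):
            for l in range(n_classes):
                if l != y[i]:
                    D[i][l] *= math.exp(0.5 * alpha * (H[i][l] - H[i][y[i]]))
        Z = sum(sum(row) for row in D)
        D = [[d / Z for d in row] for row in D]
    return ensemble

def predict(ensemble, x, n_classes):
    """Pick the label with the highest alpha-weighted sum of centroid similarities."""
    x = normalize(x)
    total = [0.0] * n_classes
    for alpha, cents in ensemble:
        for l, s in enumerate(scores(cents, x)):
            total[l] += alpha * s
    return max(range(n_classes), key=lambda l: total[l])
```

In this sketch, documents whose true class barely outranks (or fails to outrank) a wrong class accumulate pair weight, so the next round's centroids are pulled toward them. Calling `boost_centroids(X, y, n_classes)` on a labeled training set and then `predict(model, x, n_classes)` on a new vector is the intended usage.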
Source: Computer Engineering and Design (《计算机工程与设计》), CSCD, Peking University Core Journal, 2009, No. 1, pp. 122-124, 131 (4 pages)
Keywords: ensemble learning; text categorization; centroid classifier; classifier model bias; weight distribution

References (8)

  • [1] Han E, Karypis G. Centroid-based document classification: analysis and experimental results[C]. PKDD, 2000: 116-123.
  • [2] Tan Songbo, Cheng Xue-Qi, Moustafa M Ghanem. A novel refinement approach for text categorization[C]. ACM CIKM, 2005: 469-476.
  • [3] Salton G, Wong A, Yang C. A vector space model for automatic indexing[J]. Communications of the ACM, 1975, 18(11): 613-620.
  • [4] Freund Y, Schapire R E. A decision-theoretic generalization of on-line learning and an application to boosting[J]. Journal of Computer and System Sciences, 1997, 55(1): 119-139.
  • [5] Schapire R, Singer Y. BoosTexter: a boosting-based system for text categorization[J]. Machine Learning, 2000, 39(2/3): 135-168.
  • [6] Krogh A, Vedelsby J. Neural network ensembles, cross validation, and active learning[C]. Tesauro G, Touretzky D S, Leen T K, et al. Advances in Neural Information Processing Systems 7. Cambridge, MA: MIT Press, 1995: 231-238.
  • [7] Franca Debole, Fabrizio Sebastiani. Supervised term weighting for automated text categorization[C]. Melbourne, Florida: Proceedings of the ACM Symposium on Applied Computing, 2003: 784-788.
  • [8] Feng Shicong, Wang Jimin. Analysis of the results of the "Chinese Web Page Automatic Classification Competition"[J]. Journal of Chinese Information Processing, 2003, 17(5): 34-40 (in Chinese). Cited by: 6

Second-level references (3)

Co-citing documents (5)

Co-cited documents (16)

  • [1] Su Jinshu, Zhang Bofeng, Xu Xin. Advances in machine learning based text categorization[J]. Journal of Software, 2006, 17(9): 1848-1859 (in Chinese). Cited by: 386
  • [2] Jiang Yuan, Zhou Zhihua. A text classification method based on term frequency classifier ensemble[J]. Journal of Computer Research and Development, 2006, 43(10): 1681-1687 (in Chinese). Cited by: 22
  • [3] JOACHIMS T. Text categorization with support vector machines: learning with many relevant features[C]//European Conference on Machine Learning (ECML'97). London, 1997: 170-178.
  • [4] ZHOU Z H, TANG W. Selective ensemble of decision trees[C]//Lecture Notes in Artificial Intelligence. Berlin: Springer, 2003: 476-483.
  • [5] FREUND Y, SCHAPIRE R E. A decision-theoretic generalization of on-line learning and an application to boosting[J]. Journal of Computer and System Sciences, 1997, 55(1): 119-139.
  • [6] Van RIJSBERGEN C J. Information retrieval[M]. 2nd ed. London: Butterworths, 1979: 98-105.
  • [7] Cui Lijuan, Li Kai, Ni Zhihong. Research on classification-based ensemble learning algorithms[J]. Journal of Hebei University (Natural Science Edition), 2007, 27(4): 423-427 (in Chinese). Cited by: 4
  • [8] Dance C, Willamowski J, Fan L, et al. Visual Categorization with Bags of Keypoints[C]//Proc. of ECCV International Workshop on Statistical Learning in Computer Vision. [S. l.]: IEEE Press, 2004.
  • [9] Dietterich T G. Machine Learning Research: Four Current Directions[J]. AI Magazine, 1997, 18(4): 97-136.
  • [10] Breiman L. Bagging Predictors[J]. Machine Learning, 1996, 24(2): 123-140.

Citing documents (2)

Second-level citing documents (4)
