Abstract
The explosive growth of text information creates a need for simple, fast text categorization methods suited to real-time applications. The centroid classifier is fast, but the assumptions it rests on often do not hold in practice, which leads to so-called "classifier model bias". To reduce this bias, a mechanism based on the ensemble learning algorithm AdaBoost.MR is developed, with centroid classifiers as the individual classifiers. The weight distribution that AdaBoost.MR adaptively maintains is used in each iteration to correct the traditional centroid classifier: documents that were misclassified, and therefore carry high weight, are emphasized so that their probability of being misclassified falls. Experiments on the YQ-WEBBENCH-V1.1 corpus show that the improved method performs better than the traditional one.
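The idea of letting a boosting weight distribution reshape the class centroids can be sketched as follows. This is only an illustration under stated assumptions, not the paper's procedure: the abstract gives no formulas, so the sketch substitutes a simple multiclass AdaBoost update (SAMME-style) for AdaBoost.MR, represents documents as dense term vectors, and uses cosine similarity to the nearest centroid. All function names (`weighted_centroids`, `boost_centroids`, `predict`) are hypothetical.

```python
import math

def weighted_centroids(docs, labels, weights):
    """Per-class weighted sum of document vectors, L2-normalised."""
    dim = len(docs[0])
    sums = {}
    for x, y, w in zip(docs, labels, weights):
        acc = sums.setdefault(y, [0.0] * dim)
        for i, v in enumerate(x):
            acc[i] += w * v  # high-weight documents pull the centroid harder
    centroids = {}
    for y, acc in sums.items():
        norm = math.sqrt(sum(v * v for v in acc)) or 1.0
        centroids[y] = [v / norm for v in acc]
    return centroids

def cosine(a, b):
    dot = sum(p * q for p, q in zip(a, b))
    na = math.sqrt(sum(p * p for p in a)) or 1.0
    nb = math.sqrt(sum(q * q for q in b)) or 1.0
    return dot / (na * nb)

def predict_one(centroids, x):
    """Label of the centroid most similar to x (ties broken by label order)."""
    return max(sorted(centroids), key=lambda y: cosine(centroids[y], x))

def boost_centroids(docs, labels, rounds=10):
    """Assumption: SAMME-style reweighting stands in for AdaBoost.MR."""
    n = len(docs)
    k = len(set(labels))
    if k < 2:
        raise ValueError("need at least two classes")
    weights = [1.0 / n] * n
    ensemble = []
    for _ in range(rounds):
        centroids = weighted_centroids(docs, labels, weights)
        preds = [predict_one(centroids, x) for x in docs]
        err = sum(w for w, p, y in zip(weights, preds, labels) if p != y)
        if err >= 1.0 - 1.0 / k:
            break  # no better than chance: stop adding classifiers
        perfect = err <= 1e-10
        err = max(err, 1e-10)
        alpha = math.log((1.0 - err) / err) + math.log(k - 1)
        ensemble.append((alpha, centroids))
        if perfect:
            break  # nothing left to reweight
        # Raise the weight of misclassified documents, then renormalise.
        weights = [w * math.exp(alpha) if p != y else w
                   for w, p, y in zip(weights, preds, labels)]
        total = sum(weights)
        weights = [w / total for w in weights]
    return ensemble

def predict(ensemble, x):
    """Weighted vote over the centroid classifiers of all rounds."""
    votes = {}
    for alpha, centroids in ensemble:
        y = predict_one(centroids, x)
        votes[y] = votes.get(y, 0.0) + alpha
    return max(sorted(votes), key=lambda y: votes[y])
```

In this sketch the first round uses uniform weights and so reproduces an ordinary centroid classifier. Later rounds shift each centroid toward the documents the previous round got wrong, which is the bias-correction effect the abstract describes.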
Source
《计算机工程与设计》
CSCD
Peking University Core Journal (北大核心)
2009, No. 1, pp. 122-124, 131 (4 pages)
Computer Engineering and Design
Keywords
ensemble learning
text categorization
centroid classifier
classifier model bias
weight distribution