Abstract
This paper proposes three methods for multi-class, multi-label English text categorization based on a two-step strategy. (1) The first method uses a Bayes classifier in both steps, with stemmed words as features in the first step and unstemmed words in the second. (2) The second method uses a Bayes classifier in the first step and a decision tree classifier in the second. (3) The third, hybrid method first classifies certain specific categories with a combined classifier built from ID3, C4.5, and Bayes, and then applies the second method to the remaining categories. Experiments show that the third method performs best of the three.
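As a rough illustration of the first method, the sketch below trains one naive Bayes model on stemmed words and another on unstemmed words, and consults the second only when the first is undecided. The abstract does not say how the second step is triggered, so the score-margin rule, the `margin` value, the `toy_stem` suffix stripper, the toy training data, and the single-label simplification are all assumptions made for this example, not the authors' procedure.

```python
import math
from collections import Counter, defaultdict


def toy_stem(word):
    # Crude suffix stripping; a hypothetical stand-in for a real stemmer.
    for suffix in ("ing", "ed", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word


def stemmed(doc):
    return [toy_stem(w) for w in doc.lower().split()]


def unstemmed(doc):
    return doc.lower().split()


class NaiveBayes:
    """Multinomial naive Bayes with add-one smoothing."""

    def __init__(self, tokenizer):
        self.tokenizer = tokenizer
        self.word_counts = defaultdict(Counter)
        self.doc_counts = Counter()
        self.vocab = set()

    def fit(self, docs, labels):
        for doc, label in zip(docs, labels):
            tokens = self.tokenizer(doc)
            self.word_counts[label].update(tokens)
            self.doc_counts[label] += 1
            self.vocab.update(tokens)
        return self

    def log_scores(self, doc):
        tokens = self.tokenizer(doc)
        total_docs = sum(self.doc_counts.values())
        scores = {}
        for label in self.doc_counts:
            total = sum(self.word_counts[label].values())
            score = math.log(self.doc_counts[label] / total_docs)
            for t in tokens:
                count = self.word_counts[label][t]
                score += math.log((count + 1) / (total + len(self.vocab)))
            scores[label] = score
        return scores


def two_step_predict(step1, step2, doc, margin=0.5):
    # Assumed trigger: fall back to step 2 when the top two step-1 scores
    # are closer than `margin` (in log space).
    ranked = sorted(step1.log_scores(doc).items(),
                    key=lambda kv: kv[1], reverse=True)
    if len(ranked) < 2 or ranked[0][1] - ranked[1][1] >= margin:
        return ranked[0][0], 1
    scores2 = step2.log_scores(doc)
    return max(scores2, key=scores2.get), 2


# Toy data, invented for illustration only.
docs = ["stocks trading market", "market traded shares",
        "football match scored", "players scoring goals"]
labels = ["finance", "finance", "sport", "sport"]

step1 = NaiveBayes(stemmed).fit(docs, labels)
step2 = NaiveBayes(unstemmed).fit(docs, labels)

label, step_used = two_step_predict(step1, step2, "market shares")
print(label, step_used)
```

Under the same assumptions, the second method could be sketched by replacing the second-step model with a decision tree classifier.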
Source
Journal of Guangxi Normal University: Natural Science Edition (《广西师范大学学报(自然科学版)》)
Indexing: CAS; 北大核心 (Peking University Core Journals)
2007, No. 4, pp. 200-203 (4 pages)
Funding
Natural Science Foundation of Chongqing (2005BA2003, 2006BB2374)
Program for New Century Excellent Talents in University, Ministry of Education (教技司[2005]2号)
Keywords
text categorization
two-step categorization strategy
classifier