Abstract
Many text feature selection methods exist, but each has limitations. This paper proposes a new text feature selection method based on the population based incremental learning (PBIL) algorithm. The method requires no prior knowledge of the feature set and is easy to implement; because it uses the performance of a simple classifier as its evaluation criterion, its computational complexity is very low. Classification experiments on the Reuters-21578 corpus show that its average classification performance exceeds that of three commonly used feature selection methods: the chi-square statistic, information gain, and a simple genetic algorithm.
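As an illustration of the idea summarized in the abstract, the following is a minimal sketch of generic PBIL applied to feature selection, not the authors' implementation. The abstract does not specify the classifier, parameters, or update details, so everything here is an assumption: the function names `pbil_select` and `centroid_accuracy` are hypothetical, a nearest-centroid classifier stands in for the "simple classifier", accuracy is measured on the training data for brevity, and the learning and mutation rates are illustrative values.

```python
import random

def pbil_select(fitness, n_features, pop_size=20, generations=50,
                learn_rate=0.1, mut_prob=0.02, mut_shift=0.05, seed=0):
    """Generic PBIL over binary feature masks; returns (best_mask, best_score)."""
    rng = random.Random(seed)
    prob = [0.5] * n_features  # probability that each feature is selected
    best_mask, best_score = None, float("-inf")
    for _ in range(generations):
        # Sample a population of masks from the probability vector.
        population = [[1 if rng.random() < p else 0 for p in prob]
                      for _ in range(pop_size)]
        scored = [(fitness(mask), mask) for mask in population]
        gen_score, gen_best = max(scored, key=lambda t: t[0])
        if gen_score > best_score:
            best_score, best_mask = gen_score, gen_best
        # Shift the probability vector toward this generation's best mask.
        prob = [(1 - learn_rate) * p + learn_rate * bit
                for p, bit in zip(prob, gen_best)]
        # Occasionally nudge entries toward a random bit to keep exploring.
        for i in range(n_features):
            if rng.random() < mut_prob:
                prob[i] = (1 - mut_shift) * prob[i] + mut_shift * rng.randint(0, 1)
    return best_mask, best_score

def centroid_accuracy(mask, X, y):
    """Training accuracy of a nearest-centroid classifier on the masked features
    (an assumed stand-in for the paper's unspecified simple classifier)."""
    cols = [i for i, bit in enumerate(mask) if bit]
    if not cols:
        return 0.0  # an empty feature set cannot classify anything
    centroids = {}
    for label in set(y):
        rows = [x for x, t in zip(X, y) if t == label]
        centroids[label] = [sum(r[c] for r in rows) / len(rows) for c in cols]
    correct = 0
    for x, t in zip(X, y):
        pred = min(centroids, key=lambda lab: sum(
            (x[c] - m) ** 2 for c, m in zip(cols, centroids[lab])))
        correct += pred == t
    return correct / len(X)

# Toy data (synthetic, for illustration only): feature 0 tracks the label,
# the other five features are pure noise.
data_rng = random.Random(1)
y = [i % 2 for i in range(40)]
X = [[t + data_rng.gauss(0, 0.3)] + [data_rng.gauss(0, 1) for _ in range(5)]
     for t in y]

mask, score = pbil_select(lambda m: centroid_accuracy(m, X, y), n_features=6)
print(mask, score)
```

In a real text categorization setting the mask would range over vocabulary terms, and the fitness would more plausibly be measured on held-out documents rather than on the training data.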
Source
《图书情报工作》
CSSCI
PKU Core Journal (北大核心)
2011, Issue 24, pp. 102-105, 125 (5 pages in total)
Library and Information Service
Funding
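One of the research outcomes of the Hunan Provincial Natural Science Foundation project "Construction and Application of a Trust Evolution Model in the E-Commerce Environment" (project no. 10JJ6111)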
Keywords
population based incremental learning
feature selection
text categorization
genetic algorithm