A Text Feature Selection Method Using the Population Based Incremental Learning Algorithm


Abstract: Many text feature selection methods exist, but each has limitations. This paper proposes a new text feature selection method based on the Population Based Incremental Learning (PBIL) algorithm. The method requires no prior knowledge of the feature set and is easy to implement. Because it uses the performance of a simple classifier as its evaluation criterion, its computational complexity is very low. Classification experiments on the Reuters-21578 text collection show that its average classification performance is better than that of three commonly used feature selection methods: the chi-square statistic, information gain, and the simple genetic algorithm.
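The abstract does not give the algorithmic details, so the following is only a minimal sketch of how generic PBIL could drive feature selection, not the authors' implementation. It assumes a binary mask per feature, a nearest-centroid classifier as the "simple classifier", held-out accuracy as the evaluation criterion, and illustrative parameter values (`pop_size`, `learning_rate`, `mutation_prob`, `mutation_shift`). All function names and the toy data are hypothetical.

```python
import random


def make_toy_data(n_per_class=30, n_noise=6, seed=1):
    """Toy stand-in for document vectors: 2 informative features plus noise features."""
    rng = random.Random(seed)
    X, y = [], []
    for label in (0, 1):
        for _ in range(n_per_class):
            informative = [rng.gauss(3.0 * label, 1.0) for _ in range(2)]
            noise = [rng.gauss(0.0, 3.0) for _ in range(n_noise)]
            X.append(informative + noise)
            y.append(label)
    return X, y


def nearest_centroid_accuracy(X_train, y_train, X_test, y_test, mask):
    """Accuracy of a nearest-centroid classifier using only features where mask == 1."""
    idx = [j for j, bit in enumerate(mask) if bit]
    if not idx:
        return 0.0  # an empty feature subset cannot classify anything
    sums, counts = {}, {}
    for x, label in zip(X_train, y_train):
        acc = sums.setdefault(label, [0.0] * len(idx))
        for k, j in enumerate(idx):
            acc[k] += x[j]
        counts[label] = counts.get(label, 0) + 1
    centroids = {c: [s / counts[c] for s in acc] for c, acc in sums.items()}
    correct = 0
    for x, label in zip(X_test, y_test):
        pred = min(
            centroids,
            key=lambda c: sum((x[j] - centroids[c][k]) ** 2 for k, j in enumerate(idx)),
        )
        correct += pred == label
    return correct / len(y_test)


def pbil_select(fitness, n_features, pop_size=20, generations=30,
                learning_rate=0.1, mutation_prob=0.02, mutation_shift=0.05, seed=0):
    """Generic PBIL: evolve a probability vector instead of an explicit population."""
    rng = random.Random(seed)
    prob = [0.5] * n_features  # no prior knowledge: every feature starts at 0.5
    best_mask, best_fit = None, -1.0
    for _ in range(generations):
        # Sample candidate feature masks from the current probability vector.
        population = [[1 if rng.random() < p else 0 for p in prob]
                      for _ in range(pop_size)]
        scored = [(fitness(m), m) for m in population]
        gen_fit, gen_best = max(scored, key=lambda t: t[0])
        if gen_fit > best_fit:
            best_mask, best_fit = gen_best, gen_fit
        # Shift the probability vector toward the best mask of this generation.
        prob = [(1 - learning_rate) * p + learning_rate * b
                for p, b in zip(prob, gen_best)]
        # Occasionally nudge a probability toward a random bit to keep diversity.
        for j in range(n_features):
            if rng.random() < mutation_prob:
                prob[j] = (1 - mutation_shift) * prob[j] + mutation_shift * rng.randint(0, 1)
    return best_mask, best_fit, prob
```

A call might look like `pbil_select(lambda m: nearest_centroid_accuracy(X_tr, y_tr, X_te, y_te, m), n_features=8)`, where the train and test splits come from `make_toy_data()`. Each generation costs `pop_size` classifier evaluations, which is why a cheap classifier keeps the overall search inexpensive.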

Authors: Luo Yihui (罗毅辉), Xiong Shuchu (熊曙初)

Affiliation: Institute of Information Management Engineering, School of Information, Hunan University of Commerce

Source: Library and Information Service (《图书情报工作》), indexed in CSSCI and the Peking University Core Journals list, 2011, No. 24, pp. 102-105, 125 (5 pages in total)

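Funding: One of the research results of the Hunan Provincial Natural Science Foundation project "Construction and Application of a Trust Evolution Model in the E-commerce Environment" (Project No. 10JJ6111)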

Keywords: population based incremental learning; feature selection; text categorization; genetic algorithm

Classification number: TP391.1 [Automation and Computer Technology: Computer Application Technology]



