
Application of the GA-SVM Algorithm to Text Classification (cited by: 12)

Research of Text Categorization Based on Genetic Algorithm and Support Vector Machine
Abstract: Text feature spaces typically contain tens of thousands of dimensions, sometimes more than the number of available training samples, and the features carry a great deal of redundant and irrelevant information. As a result, conventional classification methods are slow and their accuracy is hard to improve. To make text classification faster and more accurate, and to reduce memory use, this paper proposes a text classification method that combines a genetic algorithm (GA) with a support vector machine (SVM). A combination of text features is treated as a chromosome in the genetic algorithm and encoded in binary, and the SVM classification accuracy serves as the fitness function used to evaluate each individual. The genetic operations of selection, crossover, and mutation then yield the optimal feature subset, and the SVM finally classifies the text using those features. Simulation experiments on the Fudan University Chinese text classification corpus show that, compared with conventional text classification methods, the proposed method quickly obtains the optimal feature subset and substantially improves classification accuracy, which gives it good application prospects in text mining.
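The procedure outlined in the abstract (binary-encoded feature masks evolved by selection, crossover, and mutation, with classifier accuracy as fitness) might be sketched roughly as follows. This is a minimal illustration, not the authors' implementation. The names `ga_feature_selection` and `toy_fitness`, the parameter values, and the specific operator choices (binary tournament selection, single-point crossover, bit-flip mutation, elitism) are all assumptions, since the abstract does not state them. The fitness function is passed in as a callable: in the paper's setting it would return the SVM classification accuracy obtained with the selected features, whereas the stand-in here only rewards a few hypothetical "informative" feature indices so that the sketch runs with the standard library alone.

```python
import random

def ga_feature_selection(n_features, fitness, pop_size=20, generations=30,
                         crossover_rate=0.8, mutation_rate=0.02, seed=0):
    """Search for a binary feature mask that maximizes fitness(mask)."""
    rng = random.Random(seed)
    # Each chromosome is a list of 0/1 genes: 1 keeps a feature, 0 drops it.
    population = [[rng.randint(0, 1) for _ in range(n_features)]
                  for _ in range(pop_size)]
    best_mask, best_score = None, float("-inf")

    for _ in range(generations + 1):
        # Evaluate every individual and remember the best one seen so far.
        scores = [fitness(mask) for mask in population]
        for mask, score in zip(population, scores):
            if score > best_score:
                best_mask, best_score = mask[:], score

        def select():
            # Binary tournament: the fitter of two random individuals wins.
            a, b = rng.randrange(pop_size), rng.randrange(pop_size)
            return population[a] if scores[a] >= scores[b] else population[b]

        # Elitism (an assumption): carry the best mask into the next generation.
        next_population = [best_mask[:]]
        while len(next_population) < pop_size:
            parent1, parent2 = select(), select()
            if n_features > 1 and rng.random() < crossover_rate:
                # Single-point crossover.
                cut = rng.randrange(1, n_features)
                child = parent1[:cut] + parent2[cut:]
            else:
                child = parent1[:]
            # Bit-flip mutation, applied independently to each gene.
            child = [1 - g if rng.random() < mutation_rate else g for g in child]
            next_population.append(child)
        population = next_population

    return best_mask, best_score


# Hypothetical stand-in for "SVM accuracy on the selected features":
# rewards three made-up informative features and penalizes mask size.
INFORMATIVE = (0, 3, 7)

def toy_fitness(mask):
    hits = sum(mask[i] for i in INFORMATIVE)
    return hits - 0.1 * sum(mask)


mask, score = ga_feature_selection(12, toy_fitness)
print(mask, score)
```

To approximate the paper's setup, `toy_fitness` would be replaced by a function that trains an SVM on the columns where the mask is 1 and returns its held-out or cross-validated accuracy. That evaluation dominates the running time, which is why the population size and generation count are kept small in this sketch.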
Source: Computer Simulation (《计算机仿真》), indexed in CSCD and the Peking University Core Journals list, 2011, No. 1, pp. 222-225 (4 pages)
Keywords: text categorization; genetic algorithm (GA); support vector machine (SVM); feature selection

