
Feature Selection for Text Clustering Based on the Genetic Algorithm (Cited by: 3)
Abstract: Traditional feature selection methods for text clustering cannot find the optimal feature set, whereas the genetic algorithm can obtain a globally optimal solution with high search efficiency. The genetic algorithm is therefore applied to feature selection for text clustering. Each combination of features is treated as a chromosome and encoded as a binary string, and the text set density is used as the fitness function to evaluate the fitness of each individual (candidate feature set). Through the genetic operations of selection, crossover and mutation, the optimal feature set can be obtained fairly quickly. Experiments on a public text classification corpus show that feature selection based on the genetic algorithm improves text clustering precision by 5.9% compared with clustering without feature selection, and reduces clustering time by 15 s.
Source: Journal of South China University of Technology (Natural Science Edition), 2004, Issue z1, pp. 133-136 (4 pages). Indexed in: EI, CAS, CSCD, Peking University Core Journals.
Keywords: genetic algorithm; text clustering; feature selection; Chinese information processing
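
The procedure outlined in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not define "text set density" or the exact genetic operators, so the sketch assumes density is the mean cosine similarity between each document and the collection centroid, that a lower density is better, and that tournament selection, single-point crossover, bit-flip mutation and elitism are used. The names `density`, `fitness`, `select_features` and all parameter values are hypothetical.

```python
import math
import random


def density(docs, mask):
    """Assumed definition: mean cosine similarity between each document
    and the collection centroid, using only features where mask[i] == 1."""
    idx = [i for i, bit in enumerate(mask) if bit]
    if not idx:
        return 1.0  # an empty feature set is treated as the worst case
    centroid = [sum(d[i] for d in docs) / len(docs) for i in idx]
    c_norm = math.sqrt(sum(c * c for c in centroid))
    total = 0.0
    for d in docs:
        v = [d[i] for i in idx]
        v_norm = math.sqrt(sum(x * x for x in v))
        if v_norm == 0 or c_norm == 0:
            total += 1.0  # a document with no selected terms is penalised
            continue
        total += sum(a * b for a, b in zip(v, centroid)) / (v_norm * c_norm)
    return total / len(docs)


def fitness(docs, mask):
    # Assumption: a lower density means better-separated documents.
    return 1.0 - density(docs, mask)


def select_features(docs, pop_size=20, generations=30,
                    p_cross=0.8, p_mut=0.05, seed=0):
    """Evolve a binary chromosome (1 = keep the feature, 0 = drop it)."""
    rng = random.Random(seed)
    n = len(docs[0])
    pop = [[rng.randint(0, 1) for _ in range(n)] for _ in range(pop_size)]
    best = max(pop, key=lambda m: fitness(docs, m))
    for _ in range(generations):
        def pick():
            # Tournament selection between two random individuals.
            a, b = rng.sample(pop, 2)
            return a if fitness(docs, a) >= fitness(docs, b) else b

        children = [best[:]]  # elitism: carry the best individual over
        while len(children) < pop_size:
            p1, p2 = pick(), pick()
            if n > 1 and rng.random() < p_cross:
                cut = rng.randrange(1, n)  # single-point crossover
                child = p1[:cut] + p2[cut:]
            else:
                child = p1[:]
            # Bit-flip mutation with probability p_mut per gene.
            child = [bit ^ 1 if rng.random() < p_mut else bit for bit in child]
            children.append(child)
        pop = children
        best = max(pop, key=lambda m: fitness(docs, m))
    return best


# Toy term-weight matrix: rows are documents, columns are features.
docs = [[2, 0, 1, 0], [1, 0, 1, 0], [0, 3, 1, 0], [0, 2, 1, 0]]
mask = select_features(docs)
print("selected feature mask:", mask)
print("fitness:", fitness(docs, mask))
```

The returned mask would then be used to project the documents onto the selected features before running a clustering algorithm; the clustering step itself is outside this sketch.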