
Research on the Application of an Improved Genetic Algorithm in Text Clustering
Abstract: After analyzing the shortcomings of the K-means clustering algorithm and the global optimization ability of an improved genetic algorithm, this paper proposes a text clustering method based on the improved genetic algorithm. The method converts the original documents into text vectors described by the vector space model. It then randomly selects several document vectors as initial cluster centers, and these form the chromosome population of the genetic algorithm. Through the selection, crossover, and mutation operations of the improved genetic algorithm, the population evolves toward better-optimized initial cluster centers for the K-means algorithm. Experiments show that the method improves the precision and recall of text clustering, and they also confirm the efficiency of the algorithm.
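The abstract does not specify the chromosome encoding, the fitness function, or what exactly is "improved" in the genetic operators, so the following Python sketch only illustrates the general pipeline under stated assumptions. Documents are represented as unit-length TF-IDF vectors (one common vector space model weighting); a chromosome is assumed to be a list of k distinct document indices whose vectors serve as candidate initial centers; fitness is assumed to be the total cosine similarity of every document to its nearest candidate center; and the GA is assumed to use tournament selection, single-point crossover with duplicate repair, random-replacement mutation, and elitism. All function names and parameter values (`tfidf_vectors`, `ga_initial_centers`, `pop_size=20`, etc.) are hypothetical stand-ins, not the authors' method.

```python
import math
import random
from collections import Counter

def tfidf_vectors(docs):
    """Map each document to a unit-length TF-IDF vector over a shared vocabulary."""
    tokenized = [doc.lower().split() for doc in docs]
    vocab = sorted({w for toks in tokenized for w in toks})
    df = Counter(w for toks in tokenized for w in set(toks))
    # Smoothed IDF: a term that appears in every document still gets weight 1, not 0.
    idf = {w: math.log(len(docs) / df[w]) + 1.0 for w in vocab}
    vectors = []
    for toks in tokenized:
        tf = Counter(toks)
        v = [tf[w] * idf[w] for w in vocab]
        norm = math.sqrt(sum(x * x for x in v)) or 1.0
        vectors.append([x / norm for x in v])
    return vectors

def cosine(a, b):
    # Both vectors are unit length, so the dot product equals cosine similarity.
    return sum(x * y for x, y in zip(a, b))

def fitness(chrom, vectors):
    """Assumed fitness: total similarity of each document to its closest center."""
    return sum(max(cosine(v, vectors[c]) for c in chrom) for v in vectors)

def crossover(p1, p2, k, rng):
    """Single-point crossover on center indices, repaired so the k genes stay distinct."""
    cut = rng.randint(1, k - 1) if k > 1 else 0
    child = p1[:cut] + [g for g in p2 if g not in p1[:cut]]
    return child[:k]

def mutate(chrom, n_docs, rate, rng):
    """With probability `rate` per gene, swap in a document index not already used."""
    chrom = chrom[:]
    for i in range(len(chrom)):
        if rng.random() < rate:
            unused = [d for d in range(n_docs) if d not in chrom]
            if unused:
                chrom[i] = rng.choice(unused)
    return chrom

def ga_initial_centers(vectors, k, pop_size=20, generations=30,
                       cx_rate=0.8, mut_rate=0.1, seed=0):
    """Evolve sets of k document indices; return the fittest as initial centers."""
    rng = random.Random(seed)
    n = len(vectors)
    score = lambda c: fitness(c, vectors)
    pop = [rng.sample(range(n), k) for _ in range(pop_size)]
    for _ in range(generations):
        ranked = sorted(pop, key=score, reverse=True)
        next_pop = ranked[:2]  # elitism: keep the two best chromosomes unchanged
        while len(next_pop) < pop_size:
            # Size-2 tournament selection (an assumed selection scheme).
            p1 = max(rng.sample(ranked, 2), key=score)
            p2 = max(rng.sample(ranked, 2), key=score)
            child = crossover(p1, p2, k, rng) if rng.random() < cx_rate else p1[:]
            next_pop.append(mutate(child, n, mut_rate, rng))
        pop = next_pop
    return max(pop, key=score)

def kmeans(vectors, centers, max_iter=100):
    """Spherical K-means (cosine similarity) started from the given centers."""
    centers = [c[:] for c in centers]
    labels = None
    for _ in range(max_iter):
        new_labels = [max(range(len(centers)), key=lambda j: cosine(v, centers[j]))
                      for v in vectors]
        if new_labels == labels:
            break  # assignments are stable: converged
        labels = new_labels
        for j in range(len(centers)):
            members = [v for v, lab in zip(vectors, labels) if lab == j]
            if members:  # keep the old center if a cluster becomes empty
                mean = [sum(col) / len(members) for col in zip(*members)]
                norm = math.sqrt(sum(x * x for x in mean)) or 1.0
                centers[j] = [x / norm for x in mean]
    return labels

docs = [
    "genetic algorithm crossover mutation",
    "genetic algorithm selection fitness",
    "football match goal team",
    "team wins football league",
]
vecs = tfidf_vectors(docs)
best = ga_initial_centers(vecs, k=2)
labels = kmeans(vecs, [vecs[i] for i in best])
print(best, labels)
```

Encoding each chromosome as indices of actual documents keeps every individual a valid set of initial centers, which matches the abstract's description of randomly chosen document vectors. In this sketch the GA only searches for a good starting point, and K-means then refines the centers from there; on the toy corpus, the two genetics documents and the two football documents should end up in separate clusters.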
Source: Journal of Chaohu University (《巢湖学院学报》), 2013, No. 3, pp. 27-31 (5 pages)
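Funding: Key Project of Provincial Natural Science Research for Anhui Higher Education Institutions (Project No. KJ2013A226); General Project of Provincial Natural Science Research for Anhui Higher Education Institutions (Project No. KJ2013B230)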
Keywords: Genetic algorithm; Text clustering; Vector space model