Integrated clustering algorithm based on hybrid of SOM and improved PSO for Web document

Cited by: 2
Abstract: With the explosive growth of information on the Web, existing search engines fall short of users' needs in many respects. Clustering Web documents can shrink the search space, speed up retrieval, and improve query precision. This paper proposes an integrated Web document clustering algorithm that combines coarse clustering by SOM (Self-Organizing Maps) with fine clustering by an improved PSO (Particle Swarm Optimization). First, following the vector space model, each Web document is represented by its feature terms and their weights. Second, the SOM algorithm performs coarse clustering on the document feature set and yields a set of output weights. These weights then initialize the improved PSO algorithm, which refines the coarse clustering result to produce the final Web document clusters. Simulation results show that the algorithm effectively improves the precision and recall of document queries and has some practical value.
Authors: 宋剑杰 (Song Jianjie), 王伟 (Wang Wei)
Source: Computer Engineering and Applications (《计算机工程与应用》), indexed in CSCD and the Peking University Core Journals list; 2010, No. 34, pp. 111-114 (4 pages)
Keywords: Web document clustering; self-organizing maps; coarse clustering; improved Particle Swarm Optimization (PSO) algorithm; fine clustering; integrated clustering algorithm
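
The three stages named in the abstract (vector space model, SOM coarse clustering, PSO fine clustering) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not say how their PSO is "improved", so a linearly decreasing inertia weight stands in for it here. The smoothed TF-IDF weighting, the one-dimensional SOM, the sum-of-squared-distances fitness, and every function name and parameter value below are assumptions made for illustration only.

```python
import math
import random


def tfidf_vectors(docs):
    """Vector space model: one smoothed TF-IDF weight per term, L2-normalised."""
    tokenised = [d.lower().split() for d in docs]
    vocab = sorted({t for doc in tokenised for t in doc})
    n = len(docs)
    df = {t: sum(t in doc for doc in tokenised) for t in vocab}
    vectors = []
    for doc in tokenised:
        vec = [doc.count(t) * (math.log((1 + n) / (1 + df[t])) + 1) for t in vocab]
        norm = math.sqrt(sum(x * x for x in vec)) or 1.0
        vectors.append([x / norm for x in vec])
    return vectors


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(vec, centres):
    """Index of the centre closest to vec."""
    return min(range(len(centres)), key=lambda k: sq_dist(vec, centres[k]))


def quantisation_error(centres, vectors):
    """Sum of squared distances from each vector to its nearest centre."""
    return sum(min(sq_dist(v, c) for c in centres) for v in vectors)


def som_coarse(vectors, k, epochs=50, lr0=0.5, rng=random):
    """Coarse clustering with a 1-D SOM of k units; returns the unit weights."""
    weights = [list(v) for v in rng.sample(vectors, k)]
    for epoch in range(epochs):
        frac = epoch / epochs
        lr = lr0 * (1 - frac)                       # decaying learning rate
        radius = max((k / 2) * (1 - frac), 0.5)     # shrinking neighbourhood
        for v in rng.sample(vectors, len(vectors)):
            bmu = nearest(v, weights)               # best-matching unit
            for j, w in enumerate(weights):
                h = math.exp(-((j - bmu) ** 2) / (2 * radius ** 2))
                for d in range(len(w)):
                    w[d] += lr * h * (v[d] - w[d])
    return weights


def pso_refine(vectors, init_centres, n_particles=10, iters=40, rng=random):
    """Fine clustering: PSO over flattened centre sets, seeded with the SOM weights."""
    k, dim = len(init_centres), len(init_centres[0])
    flat = [x for c in init_centres for x in c]

    def unflatten(p):
        return [p[i * dim:(i + 1) * dim] for i in range(k)]

    def cost(p):
        return quantisation_error(unflatten(p), vectors)

    # Particle 0 is the SOM result itself; the rest are small perturbations of it.
    swarm = [flat[:]] + [[x + rng.gauss(0, 0.05) for x in flat]
                         for _ in range(n_particles - 1)]
    vel = [[0.0] * len(flat) for _ in swarm]
    pbest = [p[:] for p in swarm]
    pbest_cost = [cost(p) for p in swarm]
    g = min(range(len(swarm)), key=pbest_cost.__getitem__)
    gbest, gbest_cost = pbest[g][:], pbest_cost[g]
    for t in range(iters):
        w = 0.9 - 0.5 * t / iters                   # assumed inertia schedule
        for i, p in enumerate(swarm):
            for d in range(len(p)):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (w * vel[i][d]
                             + 1.5 * r1 * (pbest[i][d] - p[d])
                             + 1.5 * r2 * (gbest[d] - p[d]))
                p[d] += vel[i][d]
            c = cost(p)
            if c < pbest_cost[i]:
                pbest[i], pbest_cost[i] = p[:], c
                if c < gbest_cost:
                    gbest, gbest_cost = p[:], c
    return unflatten(gbest)


# Toy corpus (made up for this sketch) with two obvious topics.
docs = [
    "python code compiler bug",
    "compiler code python error",
    "python bug error code",
    "football match goal team",
    "team goal football league",
    "league match team football",
]
rng = random.Random(0)
vectors = tfidf_vectors(docs)
coarse = som_coarse(vectors, k=2, rng=rng)
fine = pso_refine(vectors, coarse, rng=rng)
labels = [nearest(v, fine) for v in vectors]
print(labels)
```

Because one particle starts exactly at the SOM weights and the global best is replaced only by a strictly lower cost, the refined centres in this sketch can never have a higher quantisation error than the coarse SOM result. That is the sense in which the second stage "refines" the first.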