A CLUSTERING ALGORITHM BASED ON SWARM INTELLIGENCE FOR WEB DOCUMENT (cited by: 41)
Abstract: Swarm intelligence, owing to its flexibility, robustness, and self-organization, has been applied in a variety of areas. This paper applies a swarm intelligence clustering model to document clustering and proposes a clustering algorithm based on swarm intelligence (CSI) for web documents. First, web documents are represented in the vector space model, with a reduced document feature set obtained by conventional steps such as stop-word removal and feature-term reduction rules, and the document vectors are randomly projected onto a plane. Clustering is then carried out by a method derived from a basic model of how ant colonies organize their cemeteries. The artificial ants perform random walks on the plane and pick up or drop the projected data items with a probability that a probability conversion function computes from the swarm similarity within a local region. Clusters form visibly on the plane through the collective actions of the ant colony, without any central control. Finally, the clustering results are collected from the plane by a recursive algorithm, and each cluster center is labeled with its most heavily weighted feature. To make the algorithm more practical, a hybrid clustering algorithm, CSIM, is also proposed by combining CSI with the k-means algorithm. CSIM inherits the strengths of both swarm intelligence and k-means and offsets the weaknesses of each. Experimental results and a comparison with other document clustering methods show that the swarm-intelligence-based web document clustering algorithm has good clustering performance: web documents related to one subject are clustered together fairly completely and accurately.
Source: Journal of Computer Research and Development (《计算机研究与发展》), 2002, No. 11, pp. 1429-1435 (7 pages). Indexed in: EI, CSCD, PKU Core.
Funding: Supported by the National Natural Science Foundation of China (60073019, 90104021) and the Key Project of the Beijing Natural Science Foundation (4011003).
Keywords: swarm intelligence, Web, document clustering algorithm, self-organizing clustering, swarm similarity, Internet, information retrieval
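
The pick-up/drop mechanism and the recursive collection step described in the abstract can be illustrated with a minimal Python sketch. Everything specific here is an assumption made for illustration, not the paper's definition: the form of `swarm_similarity` (mean of 1 - distance/alpha, floored at 0), the linear probability conversion with slope `k`, the parameter names `alpha`, `k`, `radius`, and `size`, and the simplification in which a single ant carries an item to random sites instead of walking step by step.

```python
import math
import random


def swarm_similarity(vec, neighbor_vecs, alpha=1.0):
    """Mean of (1 - distance/alpha) over neighbors, floored at 0 (assumed form)."""
    if not neighbor_vecs:
        return 0.0
    total = sum(1.0 - math.dist(vec, n) / alpha for n in neighbor_vecs)
    return max(0.0, total / len(neighbor_vecs))


def pick_probability(sim, k=1.0):
    """Assumed linear conversion: dissimilar surroundings make pick-up likely."""
    return min(1.0, max(0.0, 1.0 - k * sim))


def drop_probability(sim, k=1.0):
    """Assumed linear conversion: similar surroundings make dropping likely."""
    return min(1.0, max(0.0, k * sim))


def neighbors_of(idx, positions, radius):
    """Indices of the other items within `radius` of item `idx` on the plane."""
    return [j for j, p in enumerate(positions)
            if j != idx and math.dist(positions[idx], p) <= radius]


def cluster_step(vectors, positions, radius, alpha, size, rng):
    """One ant visits a random item and may carry it to a better-matching site."""
    i = rng.randrange(len(vectors))
    near = [vectors[j] for j in neighbors_of(i, positions, radius)]
    if rng.random() >= pick_probability(swarm_similarity(vectors[i], near, alpha)):
        return  # the item fits its surroundings, so the ant leaves it in place
    for _ in range(50):  # bounded number of candidate sites (simplification)
        positions[i] = (rng.uniform(0, size), rng.uniform(0, size))
        near = [vectors[j] for j in neighbors_of(i, positions, radius)]
        if rng.random() < drop_probability(swarm_similarity(vectors[i], near, alpha)):
            return  # dropped next to similar items
    # If no site was accepted, the item simply stays at the last candidate site.


def collect_clusters(positions, radius):
    """Recursively label items whose plane positions are chained within `radius`."""
    labels = [None] * len(positions)

    def grow(i, label):
        labels[i] = label
        for j in neighbors_of(i, positions, radius):
            if labels[j] is None:
                grow(j, label)  # recursion depth is fine for small inputs only

    next_label = 0
    for i in range(len(positions)):
        if labels[i] is None:
            grow(i, next_label)
            next_label += 1
    return labels
```

A typical run would scatter the document vectors at random positions, call `cluster_step` many times with a seeded `random.Random`, and then call `collect_clusters` on the final positions. The resulting labels could also seed a k-means pass, in the spirit of the hybrid CSIM approach, although how the paper combines the two is not specified in this record.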