
Research of Web Pages Clustering based on Link Structure (链路结构的网页聚类研究)
Cited by: 1
Abstract: The link relationships between web pages reflect how closely the pages are connected, and this closeness is an important basis for web page clustering. The paper first analyzes the characteristics of web page link structure and introduces, for a web page node, the concepts of base set, extended set, radius, neighborhood, density, and path tree. It then measures the distance between pages using their shared in-degree and out-degree together with the dissimilarity between pages, and combines this with the link information in the extended set to design a model for computing web page similarity. Finally, it clusters the pages according to their density distribution. The experimental results show that the algorithm achieves good clustering quality.
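Authors: 刘勘 (Liu Kan), 范琴 (Fan Qin)
Source: 《小型微型计算机系统》 (Journal of Chinese Computer Systems), CSCD, Peking University Core Journal, 2016, No. 7, pp. 1450-1454 (5 pages)
Funding: National Natural Science Foundation of China (71203164); National Social Science Foundation of China (14BXW033)
Keywords: Web mining; link analysis; Web page clustering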
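
The abstract describes two steps that a small example may make more concrete: measuring the distance between pages from their shared in-links and out-links, and then grouping pages by density. The Python sketch below is only an illustration under stated assumptions and is not the authors' model. The Jaccard-based `link_distance`, the DBSCAN-style `density_cluster`, and the parameters `radius` and `min_pts` are all hypothetical stand-ins, and the base set, extended set, and path tree from the paper are not modeled here.

```python
from collections import defaultdict


def link_sets(edges):
    """Build out-link and in-link sets from (source, target) pairs."""
    out_links, in_links = defaultdict(set), defaultdict(set)
    for src, dst in edges:
        out_links[src].add(dst)
        in_links[dst].add(src)
    return out_links, in_links


def jaccard(a, b):
    """Share of common elements; 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def link_distance(p, q, out_links, in_links):
    """Assumed distance: 1 minus the mean overlap of in-links and out-links."""
    out_sim = jaccard(out_links.get(p, set()), out_links.get(q, set()))
    in_sim = jaccard(in_links.get(p, set()), in_links.get(q, set()))
    return 1.0 - 0.5 * (out_sim + in_sim)


def density_cluster(pages, dist, radius, min_pts):
    """DBSCAN-style clustering. Returns {page: cluster_id}, with -1 for noise."""
    labels = {}
    cluster_id = 0

    def neighbors(p):
        # Neighborhood of p: every page within `radius`, including p itself.
        return [q for q in pages if dist(p, q) <= radius]

    for p in pages:
        if p in labels:
            continue
        nbrs = neighbors(p)
        if len(nbrs) < min_pts:
            labels[p] = -1  # too sparse; may later be claimed as a border page
            continue
        labels[p] = cluster_id
        queue = [q for q in nbrs if q != p]
        while queue:
            q = queue.pop()
            if labels.get(q, -1) != -1:
                continue  # already assigned to a cluster
            labels[q] = cluster_id
            q_nbrs = neighbors(q)
            if len(q_nbrs) >= min_pts:  # q is dense too, so keep expanding
                queue.extend(r for r in q_nbrs if labels.get(r, -1) == -1)
        cluster_id += 1
    return labels


def cluster_pages(edges, pages, radius=0.5, min_pts=2):
    """Convenience wrapper: link sets, then distance, then density clustering."""
    out_links, in_links = link_sets(edges)
    return density_cluster(
        pages,
        lambda p, q: link_distance(p, q, out_links, in_links),
        radius,
        min_pts,
    )
```

Calling `cluster_pages(edges, pages)` with a list of `(source, target)` link pairs and the pages of interest returns one cluster id per page. Pages that point to the same targets and are pointed to by the same sources have distance 0 under this assumed measure, so they fall into the same dense region.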
