Abstract
The link relationships among Web pages reflect how closely the pages are connected, and this closeness is an important basis for Web page clustering. This paper first analyzes the characteristics of Web page link structure and introduces the concepts of the basic set, extension set, radius, neighborhood, density, and path tree of a Web page node. It then measures the distance between pages using their shared in-degrees and out-degrees together with their dissimilarity, and combines this with the link information in the extension set to design a model for computing Web page similarity. Finally, it clusters Web pages according to their density distribution. Experimental results show that the algorithm achieves good clustering performance.
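The abstract gives the pipeline only in outline, so the following is a minimal sketch of the general idea rather than the paper's actual model. It assumes a Jaccard overlap of shared in-links and out-links as a stand-in for the similarity measure, equal weighting of the two, and a DBSCAN-style procedure for the density-based step. All function names and the `radius` and `min_density` parameters are hypothetical, and the extension set and path tree are not modeled.

```python
from collections import defaultdict


def build_link_sets(edges):
    """Map each page to the set of pages it links to and is linked from."""
    out_links, in_links = defaultdict(set), defaultdict(set)
    for src, dst in edges:
        out_links[src].add(dst)
        in_links[dst].add(src)
    return out_links, in_links


def jaccard(a, b):
    """Jaccard overlap of two sets; 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def link_distance(p, q, out_links, in_links):
    """Assumed distance: 1 minus the mean overlap of in-links and out-links."""
    sim = 0.5 * jaccard(in_links[p], in_links[q]) \
        + 0.5 * jaccard(out_links[p], out_links[q])
    return 1.0 - sim


def density_cluster(pages, dist, radius, min_density):
    """DBSCAN-style clustering: grow clusters outward from dense pages.

    A page is dense when at least `min_density` other pages lie within
    `radius` of it. Pages reached by no dense page get the label -1.
    """
    neighbors = {
        p: [q for q in pages if q != p and dist(p, q) <= radius]
        for p in pages
    }
    labels = {}
    cluster_id = 0
    for p in pages:
        if p in labels or len(neighbors[p]) < min_density:
            continue
        labels[p] = cluster_id
        frontier = list(neighbors[p])
        while frontier:
            q = frontier.pop()
            if q in labels:
                continue
            labels[q] = cluster_id
            if len(neighbors[q]) >= min_density:
                frontier.extend(neighbors[q])
        cluster_id += 1
    for p in pages:
        labels.setdefault(p, -1)  # noise: not reachable from any dense page
    return labels


# Toy link graph (made up for illustration): a, b, c share one source and
# one target; d, e, f share a different source and target.
edges = [
    ("s1", "a"), ("s1", "b"), ("s1", "c"),
    ("a", "h1"), ("b", "h1"), ("c", "h1"),
    ("s2", "d"), ("s2", "e"), ("s2", "f"),
    ("d", "h2"), ("e", "h2"), ("f", "h2"),
]
out_links, in_links = build_link_sets(edges)
pages = ["a", "b", "c", "d", "e", "f"]
labels = density_cluster(
    pages,
    lambda p, q: link_distance(p, q, out_links, in_links),
    radius=0.5,
    min_density=2,
)
print(labels)
```

In this toy graph, pages that share the same in-links and out-links are at distance 0 and fall into one cluster, while pages with no shared links are at distance 1 and are kept apart.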
Source
《小型微型计算机系统》
CSCD
PKU Core (Peking University Core Journals)
2016, Issue 7, pp. 1450-1454 (5 pages)
Journal of Chinese Computer Systems
Funding
Supported by the National Natural Science Foundation of China (71203164)
Supported by the National Social Science Fund of China (14BXW033)