Abstract
Building on a classification of Web mining patterns, this paper studies and analyzes HITS (Hyperlink-Induced Topic Search), a link-analysis algorithm for Web structure mining. When HITS builds its expanded set, it considers only the pages that link into and out of the root-set pages. It ignores how similar those linked pages are. To address this shortcoming, the paper proposes an improved algorithm, DS-HITS (Document Similarity Hyperlink-Induced Topic Search). During expanded-set processing, DS-HITS introduces several weights that reflect page similarity, so that the hub and authority values of the retrieved pages improve significantly. Finally, the search results of DS-HITS and HITS are compared on the initial data of the Webla open-source project.
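The contrast described in the abstract, classic HITS versus an expanded set whose links are weighted by page similarity, can be sketched in Python. This is a minimal sketch under stated assumptions, not the authors' DS-HITS. The paper combines several similarity weights that are not specified here, so the code assumes a single weight: the cosine similarity between bag-of-words term vectors of the two linked pages. The names `build_base_set`, `similarity_edges`, `hits`, and the `threshold` parameter are hypothetical. With every edge weight set to 1.0, `hits` reduces to the classic hub/authority power iteration.

```python
import math
from collections import Counter


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity of two term-count vectors (0.0 if either is empty)."""
    dot = sum(count * b[term] for term, count in a.items())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def build_base_set(root: set[str], links: list[tuple[str, str]]) -> set[str]:
    """Classic HITS expansion: the root set plus every page it links to or is linked from."""
    base = set(root)
    for src, dst in links:
        if src in root or dst in root:
            base.update((src, dst))
    return base


def similarity_edges(base, links, texts, threshold=0.0):
    """ASSUMPTION (DS-HITS-style, hypothetical): weight each link inside the base set
    by the cosine similarity of the two pages' texts, dropping links whose weight is
    at or below `threshold`. The paper uses several weights; this uses only one."""
    vecs = {p: Counter(texts.get(p, "").lower().split()) for p in base}
    edges = {}
    for src, dst in links:
        if src in base and dst in base:
            w = cosine(vecs[src], vecs[dst])
            if w > threshold:
                edges[(src, dst)] = w
    return edges


def _normalize(scores):
    """Scale scores so their squares sum to 1 (all-zero scores are left as zeros)."""
    norm = math.sqrt(sum(v * v for v in scores.values())) or 1.0
    return {n: v / norm for n, v in scores.items()}


def hits(nodes, edges, iterations=50):
    """Weighted HITS power iteration; `edges` maps (src, dst) -> weight."""
    hub = {n: 1.0 for n in nodes}
    auth = {n: 1.0 for n in nodes}
    for _ in range(iterations):
        auth = {n: 0.0 for n in nodes}
        for (src, dst), w in edges.items():
            auth[dst] += w * hub[src]  # authority sums the hub scores of pages linking in
        auth = _normalize(auth)
        hub = {n: 0.0 for n in nodes}
        for (src, dst), w in edges.items():
            hub[src] += w * auth[dst]  # hub sums the authority scores of pages linked to
        hub = _normalize(hub)
    return hub, auth


if __name__ == "__main__":
    links = [("p1", "p2"), ("p3", "p1"), ("p4", "p5")]
    texts = {"p1": "web mining hits", "p2": "web mining link analysis", "p3": "cooking recipes"}
    base = build_base_set({"p1"}, links)  # base set: p1, p2, p3
    plain = {(s, d): 1.0 for s, d in links if s in base and d in base}
    weighted = similarity_edges(base, links, texts)  # off-topic p3 -> p1 link is dropped
    print("HITS authority:   ", hits(base, plain)[1])
    print("DS-style authority:", hits(base, weighted)[1])
```

In this toy graph, plain HITS gives p1 and p2 equal authority because the unrelated page p3 links to p1. The similarity-weighted version discards that link, so authority concentrates on p2, which is the kind of effect the similarity weights are meant to have.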
Source
Computer Applications and Software (计算机应用与软件), 2011, No. 1, pp. 272-273, 301 (3 pages)
Indexed in CSCD (Chinese Science Citation Database)
Keywords
Web mining
HITS (Hyperlink-Induced Topic Search) algorithm
DS-HITS (Document Similarity Hyperlink-Induced Topic Search) algorithm