
Entropy-based link analysis algorithm for Web structure mining (cited by: 1)
Abstract: In Web structure mining, the HITS (Hyperlink-Induced Topic Search) algorithm is widely used to identify authority pages and hub pages among the pages returned by a search engine. Besides valuable content, however, most sites also contain many links unrelated to the page content, such as advertisements and navigation panels. Because of these links, HITS can assign high authority and hub values to advertisement pages or other irrelevant pages. To address this problem, Shannon information entropy is introduced into the original HITS algorithm, and an entropy-based link analysis method is proposed for mining Web page structure. The key idea of the algorithm is to use information entropy to represent the knowledge implicit in link text.
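The abstract does not state the paper's formulas, so the following Python sketch only illustrates the general idea under explicit assumptions. It assumes that an anchor text repeated across many pages (as with advertisements or navigation bars) has high normalized Shannon entropy over its source pages, and that each link is weighted by 1 minus that entropy before running the usual HITS iteration. The names `anchor_entropy`, `weighted_hits`, and `EXAMPLE_LINKS`, as well as the weighting rule itself, are hypothetical and not taken from the paper.

```python
import math
from collections import Counter, defaultdict

# Hypothetical toy graph: (source page, target page, anchor text).
# Every page carries the same "Buy now" link to an advertisement page.
EXAMPLE_LINKS = [
    ("a", "ad", "Buy now"),
    ("b", "ad", "Buy now"),
    ("c", "ad", "Buy now"),
    ("a", "c", "HITS tutorial"),
    ("b", "c", "link analysis survey"),
]

def anchor_entropy(links, num_pages):
    """Normalized Shannon entropy of each anchor text over its source pages.

    Returns a dict mapping anchor text to a value in [0, 1]; a value near 1
    means the text is spread evenly over almost all pages.
    """
    per_text = defaultdict(Counter)
    for src, _dst, text in links:
        per_text[text][src] += 1
    entropy = {}
    for text, per_page in per_text.items():
        total = sum(per_page.values())
        h = -sum((c / total) * math.log(c / total) for c in per_page.values())
        entropy[text] = h / math.log(num_pages) if num_pages > 1 else 0.0
    return entropy

def _normalize(scores):
    """Scale a score dict to unit Euclidean length."""
    norm = math.sqrt(sum(v * v for v in scores.values())) or 1.0
    return {p: v / norm for p, v in scores.items()}

def weighted_hits(links, iterations=50, use_entropy=True):
    """HITS power iteration with optional entropy-based link weights."""
    pages = sorted({p for s, d, _t in links for p in (s, d)})
    ent = anchor_entropy(links, len(pages))
    weights = defaultdict(float)
    for s, d, text in links:
        # Assumption: link weight = 1 - normalized entropy of its anchor text.
        w = max(0.0, 1.0 - ent[text]) if use_entropy else 1.0
        weights[(s, d)] += w
    auth = {p: 1.0 for p in pages}
    hub = {p: 1.0 for p in pages}
    for _ in range(iterations):
        # Authority update: weighted sum of hub scores of pages linking in.
        new_auth = {p: 0.0 for p in pages}
        for (s, d), w in weights.items():
            new_auth[d] += w * hub[s]
        auth = _normalize(new_auth)
        # Hub update: weighted sum of authority scores of pages linked to.
        new_hub = {p: 0.0 for p in pages}
        for (s, d), w in weights.items():
            new_hub[s] += w * auth[d]
        hub = _normalize(new_hub)
    return auth, hub

if __name__ == "__main__":
    plain_auth, _ = weighted_hits(EXAMPLE_LINKS, use_entropy=False)
    entropy_auth, _ = weighted_hits(EXAMPLE_LINKS, use_entropy=True)
    print("plain HITS authority:   ", plain_auth)
    print("entropy-weighted HITS:  ", entropy_auth)
```

In this toy graph, plain HITS ranks the advertisement page `ad` above the content page `c` because `ad` has more inlinks, while the entropy-weighted variant ranks `c` higher because the repeated "Buy now" links are discounted. Whether the paper uses this exact discounting is not stated in the abstract.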
Source: Computer Engineering and Design (《计算机工程与设计》), CSCD, Peking University Core Journal, 2006, No. 9, pp. 1622-1624, 1688 (4 pages)
Keywords: topic distillation; entropy; link analysis; Web structure mining

References (7)

  • 1. Kleinberg J M. Authoritative sources in a hyperlinked environment[J]. ACM-SIAM Symposium on Discrete Algorithms, 1998, 32(8): 60-67.
  • 2. Davison B D. Recognizing nepotistic links on the web[J]. Proc of AAAI, 2000, 22(6): 72-77.
  • 3. Kushmerick N. Learning to remove internet advertisements[J]. Proc of 3rd International Conf on Autonomous Agents, 1999, 10(2): 209-221.
  • 4. Hsu C N, Dung M T. Generating finite-state transducers for semistructured data extraction from the web[J]. Information Systems, 1998, 23(8): 521-538.
  • 5. Bharat K, Henzinger M R. Improved algorithms for topic distillation in a hyperlinked environment[J]. Proc of 21st ACM SIGIR Conf on Research and Development in Information Retrieval, 1998, 12(3): 353-372.
  • 6. Chakrabarti S, Joshi M, Tawde V. Enhanced topic distillation using text, markup tags, and hyperlinks[J]. Proc of 24th ACM SIGIR Conf on Research and Development in Information Retrieval, 2001, 22(4): 235-241.
  • 7. Chakrabarti S. Integrating the document object model with hyperlinks for enhanced topic distillation and information extraction[J]. Proc of 10th World Wide Web Conference, 2001, 6(7): 130-136.

Co-cited documents: 54

Citing documents: 1

Second-level citing documents: 14
