A Crawling Strategy for Deep Web Crawlers Based on Keyword Relevance (cited by: 7)

Crawlers Crawling Strategy of Deep Web Based on Keywords Relevant Weight
Abstract: The Deep Web holds rich, high-quality information resources, but to reach the pages of a Deep Web site a user has to submit a series of keyword queries. Because no static links point directly to Deep Web pages, most current search engines cannot discover them. This paper proposes a crawling strategy for Deep Web crawlers that can download Deep Web pages effectively. Because such a site exposes only a query interface, the main challenge in designing a Deep Web crawler is choosing the best query keywords so that the queries it issues are meaningful. The paper proposes a keyword selection method based on the relevance weights of different keywords, and experiments show that the method is effective.
Authors: Tian Ye (田野), Ding Yuewei (丁岳伟)
Source: Computer Engineering (《计算机工程》), indexed in CAS, CSCD, and the Peking University Core Journals list; 2008, No. 15, pp. 220-222 (3 pages)
Keywords: Deep Web pages; crawling strategy; keyword selection; relevance weight; coverage rate