
Key Technologies of Topic-Focused Crawlers

Topic-Focused Crawling Technology
Abstract: With the rapid growth of the Internet, more and more users pose queries tied to a specific topic or domain, and traditional general-purpose search engines can no longer meet this need. To overcome the shortcomings of general-purpose search engines, researchers have proposed topic-focused crawlers. This paper first defines the topic-focused web crawler, then presents three key technologies of topic-focused crawling: the crawl target, the web page search strategy, and the web page topic relevance algorithm. It closes with some directions for future research on topic-focused crawlers.
Author: Zhao Qiang (赵强)
Source: Modern Computer (《现代计算机》), 2014, No. 2, pp. 19-22 (4 pages)
Keywords: Search Engine; Topic-Focused Crawler; Webpage Analysis; Searching Strategy
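
The abstract names a search strategy and a topic relevance algorithm as key technologies but does not spell either out. As a rough illustration only, and not the paper's own method, the sketch below pairs a best-first search strategy with a cosine-similarity relevance score over term frequencies. The function names, the `fetch` callback, the threshold value, and the in-memory `TOY_WEB` graph are all assumptions made for this example; no real network access is involved.

```python
import heapq
import math
from collections import Counter


def cosine_relevance(text, topic_terms):
    """Cosine similarity between the term-frequency vectors of a page and a topic."""
    words = Counter(text.lower().split())
    topic = Counter(t.lower() for t in topic_terms)
    dot = sum(words[t] * topic[t] for t in topic)
    norm = math.sqrt(sum(v * v for v in words.values())) * math.sqrt(
        sum(v * v for v in topic.values())
    )
    return dot / norm if norm else 0.0


def focused_crawl(seed, fetch, topic_terms, threshold=0.2, max_pages=10):
    """Best-first crawl: expand the frontier URL whose parent page scored highest.

    `fetch(url)` is a hypothetical callback returning (page_text, outgoing_links).
    Returns the (url, score) pairs of pages at or above `threshold`, in visit order.
    """
    frontier = [(-1.0, seed)]  # heapq is a min-heap, so priorities are negated
    seen = {seed}
    relevant = []
    visited = 0
    while frontier and visited < max_pages:
        _, url = heapq.heappop(frontier)
        text, links = fetch(url)
        visited += 1
        score = cosine_relevance(text, topic_terms)
        if score >= threshold:
            relevant.append((url, score))
        # Links inherit the score of the page they were found on.
        for link in links:
            if link not in seen:
                seen.add(link)
                heapq.heappush(frontier, (-score, link))
    return relevant


# Hypothetical five-page web used only to exercise the sketch.
TOY_WEB = {
    "a": ("crawler search engine topic", ["b", "c"]),
    "b": ("cooking recipes pasta", ["d"]),
    "c": ("topic crawler relevance", ["e"]),
    "d": ("pasta sauce", []),
    "e": ("focused crawler search strategy", []),
}


def toy_fetch(url):
    """Stand-in for an HTTP fetch plus link extraction."""
    return TOY_WEB[url]
```

Calling `focused_crawl("a", toy_fetch, ["crawler", "search", "topic"], max_pages=4)` on this toy graph visits the on-topic branch through `c` and `e` before the off-topic page `d`, which is the behavior a best-first strategy is meant to produce. A production crawler would likely score anchor text or link context rather than reuse the parent page's score.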

