Design and Implementation of a Web Crawler Based on the Dynamic Tunneling Algorithm

Published English title: Web Crawler's Design and Implementation Based on Dynamic Tunneling
Abstract: Building on an analysis of the crawling algorithms used by traditional Web crawlers, this paper combines the tunneling algorithm with Web page block segmentation to guide a focused (topic-specific) crawler, and proposes a dynamic tunneling crawling algorithm. Experiments searching for education resources on the portal sites of four universities show that the new algorithm improves search efficiency and outperforms two standard crawlers for focused crawling.
Source: New Technology of Library and Information Service (《现代图书情报技术》), CSSCI, Peking University Core Journal; 2008, No. 6, pp. 83-87 (5 pages)
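Funding: One of the research outputs of the Hubei Provincial Department of Education teaching research project "Reform and Practice of Multi-level Computer Network Experimental Teaching" (Project No. 20070229)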
Keywords: Web crawlers; tunneling; Web page division
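The abstract names the two ingredients (tunneling and page block segmentation) but does not give the algorithm itself. As a rough illustration only, the sketch below shows one generic way these ideas are commonly combined in a focused crawler: links are scored by the relevance of the block that contains them, and the crawler may "tunnel" through a limited number of consecutive irrelevant blocks before abandoning a path. Everything here is an assumption made for illustration, not the paper's method: the `Block` type, the term-overlap `relevance` function, the `threshold` and `max_tunnel` parameters, and the in-memory `TOY_WEB` site that stands in for real pages.

```python
import heapq
from dataclasses import dataclass, field


@dataclass
class Block:
    """A page block (hypothetical structure): its text and the links it contains."""
    text: str
    links: list = field(default_factory=list)


def relevance(text, topic_terms):
    """Fraction of words in the block that are topic terms (a toy relevance measure)."""
    words = text.lower().split()
    if not words:
        return 0.0
    return sum(1 for w in words if w in topic_terms) / len(words)


def crawl(web, seed, topic_terms, threshold=0.2, max_tunnel=2):
    """Best-first crawl over an in-memory site with block-level tunneling.

    tunnel depth = number of consecutive irrelevant blocks on the path to a page.
    A link found in a relevant block resets the depth to 0; a link found in an
    irrelevant block is followed only while the depth stays <= max_tunnel.
    """
    frontier = [(-1.0, 0, seed, 0)]  # (-priority, insertion order, url, tunnel depth)
    seen = {seed}
    visited, relevant_pages = [], []
    order = 1
    while frontier:
        _, _, url, depth = heapq.heappop(frontier)
        blocks = web.get(url, [])
        visited.append(url)
        scores = [relevance(b.text, topic_terms) for b in blocks]
        if max(scores, default=0.0) >= threshold:
            relevant_pages.append(url)
        for block, score in zip(blocks, scores):
            next_depth = 0 if score >= threshold else depth + 1
            if next_depth > max_tunnel:
                continue  # tunnel budget exhausted along this path
            for link in block.links:
                if link not in seen:
                    seen.add(link)
                    heapq.heappush(frontier, (-score, order, link, next_depth))
                    order += 1
    return visited, relevant_pages


# Toy site: the relevant course pages sit behind two irrelevant navigation pages.
TOY_WEB = {
    "home": [Block("campus news sports", ["news"]),
             Block("navigation departments", ["dept"])],
    "dept": [Block("department list staff", ["courses"])],
    "courses": [Block("course lecture notes teaching", ["lecture1"])],
    "lecture1": [Block("lecture slides course exercises")],
    "news": [Block("football match results", ["news2"])],
    "news2": [Block("more football", ["news3"])],
    "news3": [Block("even more football")],
}
TOPIC = {"course", "lecture", "teaching"}

visited, relevant = crawl(TOY_WEB, "home", TOPIC, max_tunnel=2)
print(visited)
print(relevant)
```

On this toy site the crawler reaches the course pages only because it is allowed to pass through the irrelevant "home" and "dept" pages. With `max_tunnel=0` it stops at the seed page, and the "news" branch is cut off once its tunnel budget runs out.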