Method of Web Structure Recovery Based on an Improved Spider Algorithm (cited by: 5)
Abstract: To address the limitations of traditional Web site structure recovery methods, this paper proposes a Web structure extraction method based on an improved spider algorithm and implements a corresponding tool named WebAnalyzer. The method recursively traverses the whole Web site using a depth-first search strategy, analyzes the tags of the HTML files and the syntax of the JavaScript, and extracts the lexical information. On this basis, a Web structure view and a lexical table are formed. Experimental results show that the method can recover the Web site structure graph quickly and accurately.
Source: Journal of Jiangnan University (Natural Science Edition), CAS, 2009, No. 5, pp. 555-559 (5 pages)
Keywords: Web applications, reverse engineering, static analysis, structure extraction
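
The abstract describes a depth-first, recursive traversal that parses each page and builds a structure view plus a lexical table. The following is only a minimal sketch of that general idea, not the WebAnalyzer implementation: the names `LinkAndTagParser`, `crawl`, and `SITE` are hypothetical, the "lexical table" is approximated as a per-page tag count, JavaScript analysis is omitted, and pages come from an in-memory dictionary so the example runs offline.

```python
from collections import Counter
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkAndTagParser(HTMLParser):
    """Collects href targets of <a> tags and counts every start tag."""

    def __init__(self):
        super().__init__()
        self.links = []
        self.tags = Counter()

    def handle_starttag(self, tag, attrs):
        self.tags[tag] += 1
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(url, fetch, structure, tag_table, root_host):
    """Depth-first, recursive traversal restricted to one host (assumed policy)."""
    if url in structure:          # already visited: prevents infinite loops on cycles
        return
    html = fetch(url)
    if html is None:              # page not available: record it as a leaf
        structure[url] = []
        return
    parser = LinkAndTagParser()
    parser.feed(html)
    tag_table[url] = dict(parser.tags)
    children = []
    for href in parser.links:
        target = urljoin(url, href)   # resolve relative links against the page URL
        if urlparse(target).netloc == root_host and target not in children:
            children.append(target)
    structure[url] = children         # mark visited before descending
    for child in children:
        crawl(child, fetch, structure, tag_table, root_host)


# Hypothetical three-page site held in memory (an assumption for illustration).
SITE = {
    "http://example.test/": '<html><a href="a.html">A</a><a href="b.html">B</a></html>',
    "http://example.test/a.html": '<html><p>hi</p><a href="/">home</a></html>',
    "http://example.test/b.html": '<html><a href="http://other.test/">ext</a></html>',
}

structure, tag_table = {}, {}
crawl("http://example.test/", SITE.get, structure, tag_table, "example.test")
```

Here `structure` maps each visited URL to its same-site children (an adjacency list for the structure view) and `tag_table` holds the tag counts per page. A real crawler would replace `SITE.get` with an HTTP fetch and would likely use an explicit stack instead of recursion to avoid Python's recursion limit on deep sites.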