Abstract
To address the limitations of traditional methods for recovering Web site structure, this paper proposes a Web structure extraction method based on an improved web spider algorithm and implements a corresponding tool, WebAnalyzer. The method recursively traverses the Web site using a depth-first search strategy. During the traversal it analyzes the tags of the HTML files and the syntax of the JavaScript and extracts lexical information, from which a Web structure view and a lexical table are built. Experiments show that the method recovers the structure graph of a Web site quickly and accurately.
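The traversal described in the abstract can be illustrated with a minimal sketch. This is not the WebAnalyzer implementation, whose details the abstract does not give; it only shows a plain recursive depth-first crawl that records a link graph (the structure view) and a per-page tag count (a very rough stand-in for the lexical table). The names `PageScanner`, `extract_structure`, `fetch`, and the `example.test` pages are all hypothetical, and JavaScript analysis is left out entirely.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urldefrag, urlparse


class PageScanner(HTMLParser):
    """Collects outgoing links and a tag-frequency table for one page."""

    def __init__(self):
        super().__init__()
        self.links = []   # raw href/src values in document order
        self.tags = {}    # tag name -> number of start tags seen

    def handle_starttag(self, tag, attrs):
        self.tags[tag] = self.tags.get(tag, 0) + 1
        for name, value in attrs:
            if name in ("href", "src") and value:
                self.links.append(value)


def extract_structure(root_url, fetch):
    """Depth-first traversal from root_url, restricted to the same host.

    fetch(url) is a hypothetical callback returning HTML text, or None
    when the page is unavailable. Returns (edges, lexical), where edges
    maps each visited URL to the same-site URLs it links to and lexical
    maps each fetched URL to its tag counts.
    """
    host = urlparse(root_url).netloc
    edges = {}
    lexical = {}

    def visit(url):
        if url in edges:          # already visited: prevents infinite loops
            return
        edges[url] = []
        html = fetch(url)
        if html is None:
            return
        scanner = PageScanner()
        scanner.feed(html)
        lexical[url] = scanner.tags
        for link in scanner.links:
            # Resolve relative links and drop "#fragment" parts.
            target = urldefrag(urljoin(url, link))[0]
            if urlparse(target).netloc != host:
                continue          # ignore off-site links
            if target not in edges[url]:
                edges[url].append(target)
            visit(target)         # recurse before the next sibling link

    visit(root_url)
    return edges, lexical


# Assumed in-memory "site" so the sketch runs without network access.
SITE = {
    "http://example.test/index.html": '<a href="a.html">A</a><a href="b.html">B</a>',
    "http://example.test/a.html": '<a href="b.html">B</a><a href="http://other.test/">x</a>',
    "http://example.test/b.html": '<p>leaf</p><a href="index.html#top">home</a>',
}
edges, lexical = extract_structure("http://example.test/index.html", SITE.get)
```

Recursion keeps the sketch close to the "recursive depth-first" wording of the abstract; for a large site an explicit stack would avoid Python's recursion limit.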
Source
Journal of Jiangnan University (Natural Science Edition)
CAS
2009, No. 5, pp. 555-559 (5 pages)
Keywords
Web applications, reverse engineering, static analysis, structure extraction