Abstract
Ajax poses a major challenge to traditional crawlers. To extract the Web resources hidden in Ajax applications, two key problems must be solved: parsing JavaScript, and saving the page state information after each script event is triggered. This paper proposes a crawler model suited to Ajax that incorporates an embedded script-parsing engine and a directed state graph, which together address both problems. Experimental results show that the crawler model can effectively extract the hidden resources in Ajax applications.
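The abstract does not describe how the directed state graph is built, so the following is only a minimal sketch of the general idea, not the paper's implementation. It assumes each page state can be identified by a hash of its DOM content, and it replaces the embedded script-parsing engine with a hard-coded, hypothetical transition table (`TOY_APP`). The names `state_id`, `crawl`, and `toy_fire_events` are likewise illustrative.

```python
import hashlib
from collections import deque

# Hypothetical toy "Ajax application": each DOM snapshot (a string) maps
# event names to the DOM snapshot produced by firing that event.
# A real crawler would obtain these by running the page's JavaScript in an
# embedded script engine; here the transitions are hard-coded (assumption).
TOY_APP = {
    "<div>home</div>": {"click#news": "<div>news list</div>",
                        "click#about": "<div>about</div>"},
    "<div>news list</div>": {"click#item1": "<div>article 1</div>",
                             "click#home": "<div>home</div>"},
    "<div>about</div>": {"click#home": "<div>home</div>"},
    "<div>article 1</div>": {},
}


def state_id(dom: str) -> str:
    """Identify a page state by a short hash of its DOM content."""
    return hashlib.sha1(dom.encode("utf-8")).hexdigest()[:8]


def crawl(initial_dom: str, fire_events):
    """Explore page states breadth-first and build a directed state graph.

    fire_events(dom) must return {event_name: resulting_dom}.
    Returns (states, edges): states maps state id -> DOM snapshot,
    edges is a list of (source id, event name, target id).
    """
    start = state_id(initial_dom)
    states = {start: initial_dom}
    edges = []
    queue = deque([initial_dom])
    while queue:
        dom = queue.popleft()
        src = state_id(dom)
        for event, new_dom in fire_events(dom).items():
            dst = state_id(new_dom)
            edges.append((src, event, dst))
            if dst not in states:  # unseen state: save it, explore it later
                states[dst] = new_dom
                queue.append(new_dom)
    return states, edges


def toy_fire_events(dom: str) -> dict:
    """Stand-in for the script engine: look up transitions in TOY_APP."""
    return TOY_APP.get(dom, {})


states, edges = crawl("<div>home</div>", toy_fire_events)
```

In this sketch each saved state is a node and each triggered event is a labeled edge, so content reachable only through script events (such as the article page) is recorded along with the event path that leads to it.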
Source
Computer Applications and Software (《计算机应用与软件》), indexed in CSCD
2010, No. 1, pp. 96-99 (4 pages)
Funding
Natural Science Foundation of Zhejiang Province (Y106176)
Keywords
Crawler; script-parsing engine; directed state graph