
Design and Implementation of a Crawler Model Based on Ajax (cited by: 3)
Abstract: Ajax poses a major challenge to traditional crawlers. To extract the Web resources hidden inside Ajax applications, two key problems must be solved: parsing JavaScript, and saving the page state information produced after each script event is triggered. This paper proposes a crawler model suited to Ajax. The model incorporates an embedded script-parsing engine and a directed state graph, which together address these two problems. Experimental results show that the crawler model can effectively extract the hidden resources in Ajax applications.
Source: Computer Applications and Software (《计算机应用与软件》), CSCD, 2010, No. 1, pp. 96-99 (4 pages)
Funding: Natural Science Foundation of Zhejiang Province (Y106176)
Keywords: crawler; script-parsing engine; directed state graph
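
The abstract names a directed state graph as the structure that records the page state after each script event. The following is only a minimal sketch of that idea, not the paper's implementation: states are identified by a hash of the serialized DOM (an assumption), and the callbacks `list_events` and `fire_event` are hypothetical stand-ins for the embedded script-parsing engine. All class, function, and variable names are invented for illustration.

```python
import hashlib
from collections import deque


class StateGraph:
    """Directed graph: nodes are page states, edges are script events."""

    def __init__(self):
        self.states = {}  # state id -> DOM snapshot (a string here)
        self.edges = {}   # state id -> list of (event name, target state id)

    @staticmethod
    def state_id(dom):
        # Assumption: two pages are the same state if their DOM text matches.
        # A real crawler would probably normalize the DOM before hashing.
        return hashlib.sha1(dom.encode("utf-8")).hexdigest()[:12]

    def add_state(self, dom):
        """Store a snapshot; return (state id, whether it was new)."""
        sid = self.state_id(dom)
        is_new = sid not in self.states
        if is_new:
            self.states[sid] = dom
            self.edges[sid] = []
        return sid, is_new

    def add_transition(self, source, event, target):
        if (event, target) not in self.edges[source]:
            self.edges[source].append((event, target))


def crawl(initial_dom, list_events, fire_event):
    """Breadth-first exploration of the states reachable through events.

    list_events(dom) returns the event names that can be triggered, and
    fire_event(dom, event) returns the DOM after the event. Both are
    hypothetical hooks standing in for a script engine.
    """
    graph = StateGraph()
    start, _ = graph.add_state(initial_dom)
    queue = deque([start])
    while queue:
        sid = queue.popleft()
        dom = graph.states[sid]
        for event in list_events(dom):
            target, is_new = graph.add_state(fire_event(dom, event))
            graph.add_transition(sid, event, target)
            if is_new:  # visit each distinct state only once
                queue.append(target)
    return graph


# Toy "Ajax application": each state maps event names to the next state.
TOY_APP = {
    "home": {"show_news": "news", "show_about": "about"},
    "news": {"back": "home", "next_page": "news2"},
    "news2": {"back": "home"},
    "about": {"back": "home"},
}


def toy_events(dom):
    return sorted(TOY_APP[dom])


def toy_fire(dom, event):
    return TOY_APP[dom][event]
```

Calling `crawl("home", toy_events, toy_fire)` on the toy application visits every reachable state once and records which event leads from one state to another. Because each snapshot is stored with its incoming events, a state reached only through script events remains addressable afterwards.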

References (8)

  • [1] Raghavan S, Garcia-Molina H. Crawling the hidden web [C]// Roma, Italy: Proc. of the 27th International Conference on Very Large Data Bases (VLDB), 2001: 129-139.
  • [2] Barbosa L, Freire J. An adaptive crawler for locating hidden-web entry points [C]// Alberta, Canada: Proc. of the 16th International Conference on World Wide Web, 2007: 441-450.
  • [3] Ntoulas A, Zerfos P, Cho J. Downloading textual hidden web content through keyword queries [C]// North California, USA: Proc. of the 5th ACM/IEEE-CS Joint Conference on Digital Libraries, 2005: 100-109.
  • [4] Alvarez M, Raposo J, Pan A, et al. Crawling the Content Hidden Behind Web Forms [J]. Lecture Notes in Computer Science, 2007, 4702: 322-333.
  • [5] Alvarez M, Pan A, Raposo J, et al. Crawling Web Pages with Support for Client-Side Dynamism [C]// Hong Kong, China: Proc. of the 7th International Conference on Web Age Information Management (WAIM06), 2006: 252-262.
  • [6] Wang Ying, Yu Manquan, Li Shengtao, Wang Bin, Yu Zhihua. Application of the JavaScript engine in dynamic web page collection [J]. Journal of Computer Applications (计算机应用), 2004, 24(2): 33-36. (cited by: 36)
  • [7] Mozilla. Tutorial: Embedding Rhino [EB/OL]. 2006-11-14. http://www.mozilla.org/rhino/tutorial.html.
  • [8] Mozilla. Rhino documentation [EB/OL]. 2008-4-14. http://developer.mozilla.org/en/docs/Rhino_documentation.

