Abstract
With the rapid development of the Internet, network resources are becoming increasingly abundant, and how to obtain information from the Web, especially the Deep Web, has become a focus of attention. Information extraction from the new generation of Ajax-based web pages has likewise become a research hotspot. By analyzing the key technologies of an Ajax-supporting Deep Web crawler, this paper proposes an architecture for such a crawler and describes an algorithm for automatically crawling Ajax websites, laying the foundation for the design of the crawler's overall framework.
Source
Computer Systems & Applications (《计算机系统应用》)
2012, Issue 2, pp. 167-171 (5 pages)