Abstract
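Traditional search engines cannot carry out real-time monitoring on behalf of users. To address this problem, a directed search and monitoring technique is proposed: users can customize tasks according to their own needs, including specifying the search scope and the search topic, and the system monitors at the user-defined period and proactively returns the results to the user in a timely manner. Google's cloud platform, Google App Engine, is used as the development platform, and several of the cloud services it provides are used to effectively solve problems such as scheduled task management, multi-task triggering, and high concurrency. The general-purpose web crawler is rewritten, and a directed web crawler model is proposed through algorithmic improvements. Combining the directed web crawler with powerful cloud servers greatly shortens crawling time and improves search and monitoring efficiency. The combination of the cloud platform and search and monitoring technology is a successful experiment in the Platform as a Service idea.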
Traditional search engines cannot perform real-time monitoring on behalf of users. To solve this problem, this paper proposes an active directed searching and monitoring technology. Users can customize their own tasks, including the websites to search and the search theme. The system monitors at the user-defined period and returns the results to the user promptly. Google App Engine (GAE) is used as the development platform, and several of its cloud computing services are used to solve problems such as scheduled task management, multitasking, and high concurrency. We rewrite the web crawler and propose a directed web crawler. Combining the directed crawler with the cloud server shortens the crawling time and increases monitoring efficiency. Combining the cloud platform with the searching and monitoring technology is a successful experiment in Platform as a Service (PaaS).
Source
《计算机工程与科学》
CSCD
PKU Core Journal (北大核心)
2013, Issue 1, pp. 82-87 (6 pages)
Computer Engineering & Science