Abstract
Motivated by the low utilization of educational resources in China, and as part of the development of a mobile learning (M-Learning) project, this paper extends Heritrix and Lucene to integrate educational resources, and designs and implements a vertical search engine for educational video resources. A serial combination of Heritrix and Lucene cannot easily crawl, analyze, and index at the same time. To address this, the paper proposes a tightly coupled combination with an optimized workflow, in which web page crawling, page content analysis and filtering, and index building proceed together, reducing system I/O overhead and disk space usage. Experiments show that embedding index building in the running Heritrix process has only a small effect on system running efficiency and meets the needs of practical application.
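The tightly coupled flow described in the abstract can be illustrated with a minimal sketch. This is not the paper's implementation and it does not use the Heritrix or Lucene APIs: the crawler is a stand-in generator, the relevance filter is a simple keyword match, and the index is an in-memory inverted index. All names (`fake_crawler`, `is_relevant`, `crawl_and_index`) and the example URLs are hypothetical. The point is only that each fetched page is filtered and indexed in the same pass, with no intermediate store of raw pages on disk.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, Set, Tuple

Page = Tuple[str, str]  # (url, extracted text); hypothetical page representation


def is_relevant(text: str, keywords: Iterable[str]) -> bool:
    """Stand-in topical filter: keep a page if it mentions any keyword."""
    lowered = text.lower()
    return any(keyword in lowered for keyword in keywords)


def crawl_and_index(pages: Iterable[Page], keywords: Iterable[str]) -> Dict[str, Set[str]]:
    """Filter and index each page as soon as the crawler yields it.

    A serial scheme would first write every page to disk and index in a
    second pass; here nothing is stored between fetching and indexing.
    """
    keywords = tuple(keywords)
    index: Dict[str, Set[str]] = defaultdict(set)
    for url, text in pages:
        if not is_relevant(text, keywords):
            continue  # discard off-topic pages before they cost any storage
        for term in text.lower().split():
            index[term].add(url)
    return dict(index)


def fake_crawler() -> Iterator[Page]:
    """Hypothetical crawler that yields pages one at a time."""
    yield ("http://example.org/a", "Calculus lecture video")
    yield ("http://example.org/b", "Used car listings")
    yield ("http://example.org/c", "Physics lecture notes")


index = crawl_and_index(fake_crawler(), keywords=("lecture", "course"))
```

In this sketch the off-topic page never reaches the index, and the crawler's output is consumed lazily, which is the property the tightly coupled scheme relies on to reduce I/O and disk usage.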
Source
《计算机工程与应用》
CSCD
2014, Issue 15, pp. 113-116, 135 (5 pages in total)
Computer Engineering and Applications
Funding
National Natural Science Foundation of China, General Program (No. 61173190)