Abstract
With the rapid development of network technology, general-purpose search engines can no longer meet some user needs. In particular, when users need to search for information within a specific domain, a vertical search engine fits that need. Taking mobile phone resources as the application domain, a vertical search engine with fairly precise results was built by extending Heritrix and using Lucene. The work has four parts. First, Heritrix was customized and extended to crawl relevant information resources from the Internet. Second, the HtmlParser tool was used to analyze the crawled information and extract content from it. Third, Lucene was used to build a full-text index and provide retrieval services. Finally, an MVC-based query interface was designed. Tests of response time, recall, and precision show that the system meets its design goals.
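The recall and precision tests mentioned above rest on two standard ratios: precision is the share of retrieved documents that are relevant, and recall is the share of relevant documents that were retrieved. The following is a minimal sketch of that arithmetic only. The class name RetrievalMetrics, the method names, and the sample document IDs are hypothetical and are not taken from the paper, whose actual test procedure is not described in this abstract.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical helper, not from the paper: computes the standard
// precision and recall ratios for one query's result set.
public class RetrievalMetrics {

    // Number of documents that are both retrieved and relevant.
    static int hits(Set<String> retrieved, Set<String> relevant) {
        Set<String> common = new HashSet<>(retrieved);
        common.retainAll(relevant);
        return common.size();
    }

    // Precision = |retrieved AND relevant| / |retrieved|; 0.0 if nothing was retrieved.
    static double precision(Set<String> retrieved, Set<String> relevant) {
        if (retrieved.isEmpty()) {
            return 0.0;
        }
        return (double) hits(retrieved, relevant) / retrieved.size();
    }

    // Recall = |retrieved AND relevant| / |relevant|; 0.0 if nothing is relevant.
    static double recall(Set<String> retrieved, Set<String> relevant) {
        if (relevant.isEmpty()) {
            return 0.0;
        }
        return (double) hits(retrieved, relevant) / relevant.size();
    }

    public static void main(String[] args) {
        // Made-up document IDs, for illustration only.
        Set<String> retrieved = new HashSet<>(Arrays.asList("d1", "d2", "d3", "d4"));
        Set<String> relevant = new HashSet<>(Arrays.asList("d1", "d2", "d5"));

        System.out.println("precision = " + precision(retrieved, relevant));
        System.out.println("recall    = " + recall(retrieved, relevant));
    }
}
```

In this made-up case two of the four retrieved documents are relevant, and two of the three relevant documents were retrieved, so the two ratios differ. Reporting both is what lets a test distinguish a system that returns too much from one that returns too little.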
Source
Computer Science (《计算机科学》)
CSCD
Peking University Core Journal (北大核心)
2014, Issue B11, pp. 455-460 (6 pages)
Funding
Supported by the college-level scientific research fund of Huaxia College, Wuhan University of Technology (No. 11030)