Abstract
Lucene is a full-text indexing engine toolkit written in Java. It offers fast index access, supports multi-user access, and can be used across platforms. Heritrix is an open-source web crawler developed in Java that users can employ to fetch the resources they want from the web. This paper examines the application of Lucene and Heritrix in building a vertical search engine.
Source
《计算机应用与软件》
CSCD
2009, No. 1, pp. 212-215, 247 (5 pages in total)
Computer Applications and Software