Abstract
Lucene is a full-text indexing engine toolkit written in Java. It offers fast index access, supports multi-user access, and runs across platforms. Heritrix is an open-source web crawler framework developed in Java, which users can employ to fetch the resources they want to search from the web. This paper analyzes the indexing mechanism of Lucene, discusses the architecture of Heritrix, and finally works through an example application to study full-text retrieval based on Lucene and Heritrix.
Author
卿秀华
QING Xiu-hua (School of Electronic & Electrical Engineering, Wuhan Textile University, Wuhan 430073, China)
Source
《电脑知识与技术》
2012, Issue 5, pp. 2962-2964, 2993 (4 pages in total)
Computer Knowledge and Technology
Funding
Project of the Hubei Provincial Department of Education (No. D200717005)