
Research and Application of a Full-Text Search Engine Based on Lucene and Heritrix
Abstract: Lucene is a full-text indexing engine toolkit written in Java. It offers fast index access, supports concurrent access by multiple users, and runs across platforms. Heritrix is an open-source web crawler framework developed in Java that lets users fetch the resources they want to search from the web. This paper analyzes Lucene's indexing mechanism, examines the architecture of Heritrix, and then uses a worked example to study in depth the application of full-text retrieval based on Lucene and Heritrix.
Author: QING Xiu-hua (卿秀华), School of Electronic & Electrical Engineering, Wuhan Textile University, Wuhan 430073, China
Source: Computer Knowledge and Technology (《电脑知识与技术》), 2012, No. 5, pp. 2962-2964, 2993 (4 pages)
Funding: Hubei Provincial Department of Education project (No. D200717005)
Keywords: Lucene; full-text search engine; Heritrix


