
A Solution for Lucene-based Full-text Search under Massive Data Environment (cited by: 1)
Abstract: This paper analyzes three bottlenecks of full-text retrieval over massive data: index building takes too long, I/O load stays high for extended periods, and search response time is too long. It proposes a solution that combines decomposing indexes by data type, splitting indexes into blocks, joint searching across multiple indexes, and using RMI to provide a remote search service. Experiments and use in a production environment show that the solution addresses these problems and provides a stable, efficient search service.
Source: Computer Development & Applications (《电脑开发与应用》), 2011, No. 4, pp. 32-35 (4 pages)
Keywords: massive data; Lucene; RMI; full-text information retrieval; search engine
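
The abstract names three index-side ideas: decomposing indexes by data type, splitting each index into blocks, and searching several indexes jointly. The sketch below illustrates only that general pattern in plain Python and is not the paper's implementation. The names `BLOCK_SIZE`, `build_blocks`, `search_block`, and `joint_search` are hypothetical, the score is a raw term frequency rather than Lucene's scoring, and the RMI remote-service layer is left out entirely.

```python
from collections import defaultdict

BLOCK_SIZE = 2  # hypothetical cap on documents per index block


def build_blocks(docs):
    """Group documents by type, then split each group into small inverted-index blocks."""
    by_type = defaultdict(list)
    for doc in docs:
        by_type[doc["type"]].append(doc)

    blocks = []
    for doc_type, group in by_type.items():
        for start in range(0, len(group), BLOCK_SIZE):
            index = defaultdict(dict)  # term -> {doc_id: term frequency}
            for doc in group[start:start + BLOCK_SIZE]:
                for term in doc["text"].lower().split():
                    index[term][doc["id"]] = index[term].get(doc["id"], 0) + 1
            blocks.append({"type": doc_type, "index": index})
    return blocks


def search_block(block, term):
    """Return (score, doc_id) pairs from one block; the score is the term frequency."""
    postings = block["index"].get(term.lower(), {})
    return [(tf, doc_id) for doc_id, tf in postings.items()]


def joint_search(blocks, term, doc_type=None, top_k=3):
    """Query every relevant block and merge the partial hit lists into one ranking."""
    hits = []
    for block in blocks:
        if doc_type is None or block["type"] == doc_type:
            hits.extend(search_block(block, term))
    # Highest score first; ties are broken by the smaller doc_id.
    ranked = sorted(hits, key=lambda hit: (-hit[0], hit[1]))
    return [doc_id for _, doc_id in ranked[:top_k]]
```

Because each block is small and self-contained, a block can be rebuilt without touching the others, and passing `doc_type` lets a query skip blocks that cannot match. Whether this corresponds to how the paper reduces indexing time and I/O is an assumption; the abstract does not give those details.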
