Abstract
[Purpose/significance] This paper discusses the design of distributed information retrieval systems, aiming to solve the problem that, in the context of big data, traditional information retrieval systems are too inefficient to meet retrieval needs. [Method/process] Starting from the Hadoop framework, the paper examines a Hadoop-based distributed information retrieval system, proposes an improvement, and verifies its feasibility by experiment. [Result/conclusion] The paper proposes preprocessing the input data stream instead of batch processing it, and the experiment verifies the feasibility of this idea. The Hadoop framework has been successfully applied in many fields, but its MapReduce algorithm and the optimization of that algorithm's efficiency remain to be studied.
Source
《情报探索》 (Information Research)
2016, No. 8, pp. 125-130 (6 pages)