Abstract
File retrieval is currently implemented mainly through database queries. When the volume of data to be searched becomes very large, however, this approach concentrates a large number of retrieval operations on a single host, which lowers retrieval efficiency. To address this, a distributed system is proposed. Resource retrieval in the distributed system is implemented on the MapReduce architecture, so that the retrieval load is spread across the nodes of the system. This effectively reduces the load on any single machine and greatly improves retrieval efficiency. Retrieving 1 million records takes 500 s with the traditional approach but only 40 s with the MapReduce-based distributed approach, a speed-up of roughly 12.5 times (500 s / 40 s) over traditional retrieval.
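The idea of spreading a search across partitions and then merging the partial results can be illustrated with a minimal single-process sketch. This is not the authors' implementation and does not use any real MapReduce framework. The record layout, the function names (`map_phase`, `shuffle`, `reduce_phase`, `search`) and the sample data are all hypothetical, and the partitions are processed in a plain loop where a real cluster would assign each one to a separate node.

```python
from collections import defaultdict

# Hypothetical record layout: (record_id, text). Not taken from the paper.
Record = tuple[int, str]


def map_phase(partition: list[Record], keyword: str):
    """Map step: emit (keyword, record_id) for each matching record in one partition."""
    for record_id, text in partition:
        if keyword in text:
            yield keyword, record_id


def shuffle(mapped_outputs) -> dict[str, list[int]]:
    """Group the emitted values by key, as the framework would do between map and reduce."""
    grouped: dict[str, list[int]] = defaultdict(list)
    for pairs in mapped_outputs:
        for key, value in pairs:
            grouped[key].append(value)
    return grouped


def reduce_phase(key: str, values: list[int]) -> tuple[str, list[int]]:
    """Reduce step: merge the partial hit lists for one key into a sorted result."""
    return key, sorted(values)


def search(partitions: list[list[Record]], keyword: str) -> list[int]:
    """Run map over every partition, shuffle, then reduce; return matching record ids."""
    # Assumption: in a real deployment each map_phase call would run on its own node.
    mapped = [map_phase(partition, keyword) for partition in partitions]
    grouped = shuffle(mapped)
    results = dict(reduce_phase(k, v) for k, v in grouped.items())
    return results.get(keyword, [])


# Illustrative data only: two partitions standing in for two nodes.
partitions = [
    [(1, "hadoop cluster"), (2, "library catalog")],
    [(3, "hadoop index")],
]
hits = search(partitions, "hadoop")  # record ids 1 and 3 contain "hadoop"
```

Because each partition is scanned independently in the map step, the work per node shrinks as nodes are added, which is the effect the abstract attributes to the distributed approach.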
Source
Journal of Hechi University (《河池学院学报》)
2016, No. 2, pp. 101-105 (5 pages)
Funding
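Guangxi Higher Education Institutions Science and Technology Research Project (LX2014320)
CALIS Guangxi Zhuang Autonomous Region Document and Information Service Center Pre-research Project (LALISGX2014006)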