Abstract
MapReduce is a parallel distributed computing model developed by Google that is widely used for search and for processing massive data sets. The model is suited only to programs whose data are weakly correlated and can be highly parallelized; it does not handle strongly correlated data, such as tree structures. This paper discusses the implementation mechanism of MapReduce in detail and proposes a MapReduce model based on tree structures. The model relies on a repeated polling process of clustering and aggregation, and during aggregation it uses <k1,k2,…,kn,value> in place of the traditional <k,value>, which makes the model more general. Finally, a Hadoop platform is set up to process massive XML-structured data and to compare the efficiency of the new and old models. Experimental results show that the new model executes significantly faster than the traditional model.
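The abstract does not give the paper's algorithm, so the following is only a minimal single-process Python sketch of the general idea: records are keyed by a tuple (k1, ..., kn) rather than a single k, and aggregation is repeated level by level up the tree. The function names `map_phase`, `reduce_round`, and `aggregate_tree`, the use of summation as the aggregate, and the sample XML-like paths are all assumptions made for illustration; they are not taken from the paper and do not use the Hadoop API.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Path = Tuple[str, ...]  # composite key (k1, k2, ..., kn): a path from the root


def map_phase(records: Iterable[Tuple[Iterable[str], int]]) -> List[Tuple[Path, int]]:
    """Emit <k1,...,kn,value> pairs; the key is the full path to a node."""
    return [(tuple(path), value) for path, value in records]


def reduce_round(pairs: Iterable[Tuple[Path, int]]) -> Dict[Path, int]:
    """One aggregation round: group pairs by composite key and sum the values."""
    grouped: Dict[Path, int] = defaultdict(int)
    for key, value in pairs:
        grouped[key] += value
    return dict(grouped)


def aggregate_tree(records: Iterable[Tuple[Iterable[str], int]]) -> Dict[Path, int]:
    """Aggregate repeatedly, deepest level first, so each node's total
    includes its own value plus the totals of all its descendants."""
    pending = reduce_round(map_phase(records))
    totals: Dict[Path, int] = {}
    depth = max((len(key) for key in pending), default=0)
    while depth > 0:
        # Nodes at the current depth are final once all deeper levels are done.
        level = {k: v for k, v in pending.items() if len(k) == depth}
        totals.update(level)
        # Shallower nodes wait; finished nodes are re-keyed by their parent path.
        rest = [(k, v) for k, v in pending.items() if len(k) < depth]
        promoted = [(k[:-1], v) for k, v in level.items() if depth > 1]
        pending = reduce_round(rest + promoted)
        depth -= 1
    return totals


if __name__ == "__main__":
    # Hypothetical XML-like paths with numeric values.
    sample = [
        (("catalog", "book", "price"), 10),
        (("catalog", "book", "price"), 5),
        (("catalog", "cd", "price"), 7),
    ]
    for path, total in sorted(aggregate_tree(sample).items()):
        print("/".join(path), total)
```

Keeping the whole path in the key lets each round regroup records under their parent simply by dropping the last key component, which is one plausible reading of how a composite key generalizes <k,value> for tree-shaped data.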
Source
《计算机技术与发展》
2011, No. 8, pp. 149-152 (4 pages)
Computer Technology and Development
Funding
Natural Science Foundation of Yunnan Province (2007F174M)
Yunnan University Graduate Research Project Fund (ynny200928)