Abstract
Processing efficiency is an important indicator for data processing workflows, and a simple single-server architecture can no longer cope with massive data processing tasks. To handle such tasks, this paper briefly introduces Google's MapReduce architecture and the MapReduce Query Language (MRQL) proposed by Fegaras et al., and then proposes a distributed data aggregation method based on the MapReduce programming model and MRQL. The method uses MapReduce to execute the data processing workflow and MRQL to control MapReduce. Building on the XBus data aggregation platform, the MRXBus (MapReduce XBus) distributed data aggregation platform was implemented with MapReduce and MRQL to verify the feasibility of the method. Experimental results show that the method reduces the processing time for large data volumes and improves processing efficiency.
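The MapReduce programming model that the abstract relies on can be illustrated with a minimal, single-process sketch. This is not the MRXBus implementation and does not use MRQL or a real cluster; the record fields `region` and `amount`, and the helper names `map_record`, `reduce_values`, and `run_mapreduce`, are hypothetical and chosen only for illustration.

```python
from collections import defaultdict

# Hypothetical mapper: emits one (key, value) pair per input record.
# The field names "region" and "amount" are assumptions for this sketch.
def map_record(record):
    yield record["region"], record["amount"]

# Hypothetical reducer: combines all values that share one key.
def reduce_values(key, values):
    return key, sum(values)

def run_mapreduce(records, mapper, reducer):
    # Map phase plus shuffle: group every emitted value under its key.
    groups = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            groups[key].append(value)
    # Reduce phase: one reducer call per distinct key.
    return dict(reducer(key, values) for key, values in sorted(groups.items()))

# Toy input standing in for records collected from several data sources.
records = [
    {"region": "east", "amount": 10},
    {"region": "west", "amount": 7},
    {"region": "east", "amount": 5},
]

totals = run_mapreduce(records, map_record, reduce_values)
print(totals)
```

In a real deployment the map and reduce calls would run in parallel on different nodes, which is where the reduction in processing time reported above would come from; this sketch only shows the data flow.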
Source
Journal of Computer Applications (《计算机应用》)
CSCD
Peking University Core Journal (北大核心)
2013, Issue A02, pp. 57-59, 127 (4 pages)
Funding
Natural Science Foundation of Shandong Province (ZR2011FQ028)
Shandong Provincial Key Statistical Research Program, General Project (KT12067)