Abstract
The traditional data analysis platform Pig uses MapReduce as its execution engine. Because of the limitations of MapReduce, Pig suffers from shortcomings such as high latency and large memory overhead during data processing. To overcome these shortcomings, this paper builds on Spark, currently the most popular in-memory computing framework, and develops a new data analysis and processing platform that retains the language features and infrastructure of Pig. The performance of the two platforms is then compared through experiments. The experimental results show that the Spark-based data analysis platform processes data far faster than the traditional data analysis platform Pig.
Authors
CHEN Xiao (陈晓); YU Jin-liang (于金良); ZHU Zhi-xiang (朱志祥)
Xi'an University of Posts and Telecommunications, Xi'an 710061, China
Source
Information Technology (《信息技术》), 2017, No. 7, pp. 45-48, 55 (5 pages in total)
Funding
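2015 Shaanxi Province Informatization Technology Research Project (2015-002)
2015 Ministry of Industry and Information Technology Communication Soft Science Research Project (2015-R-19)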