Distributed data analysis and processing platform based on Pig__Spark (cited by: 1)
Abstract: The traditional data analysis platform Pig uses MapReduce as its execution engine. Because of the limitations of MapReduce, Pig suffers from high latency and large memory overhead during data processing. To overcome these shortcomings, this paper builds on Spark, currently the most popular in-memory computing framework, and develops a new data analysis and processing platform that retains Pig's language features and infrastructure. Experiments compare the performance of the two platforms. The results show that the Spark-based platform processes data far faster than the traditional Pig platform.
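Authors: CHEN Xiao; YU Jin-liang; ZHU Zhi-xiang (Xi'an University of Posts and Telecommunications, Xi'an 710061, China)
Affiliation: Xi'an University of Posts and Telecommunications
Source: Information Technology (《信息技术》), 2017, No. 7, pp. 45-48, 55 (5 pages)
Funding: 2015 Shaanxi Province Informatization Technology Research Project (2015-002); 2015 Ministry of Industry and Information Technology Communication Soft Science Research Project (2015-R-19)
Keywords: Spark; Pig; big data; memory computing framework; data analysis and processing platform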