
Big Data Analysis—Competition and Symbiosis of RDBMS and MapReduce (cited by: 386)
Abstract: In scientific research, computer simulation, Internet applications, e-commerce, and many other application areas, data volumes are growing extremely fast, and analyzing and exploiting these massive data resources requires effective data analysis techniques. Traditional relational data management technology (parallel databases) has been developed for nearly 40 years, yet it has run into a major obstacle in scalability and is not up to the task of big data analysis. Meanwhile, non-relational data management and analysis techniques, represented by MapReduce, have emerged as a new force. With good scalability, fault tolerance, and massively parallel processing capability, they started in Web search and went on to compete with relational data management technology in many areas of data analysis. After losing the Web search market, the relational camp began to examine its own limitations and to remake itself by borrowing ideas from MapReduce. The non-relational camp, in turn, draws on the techniques and methods accumulated by the relational community to address its performance problems. New architectural patterns are emerging to meet the demand for deep analysis of big data. Through continued competition, relational and non-relational data management techniques learn from each other's strengths, and each will find its place in the new big data analysis ecosystem.
Source: Journal of Software (《软件学报》), indexed in EI, CSCD, and the Peking University Core Journals list; 2012, No. 1, pp. 32-45 (14 pages)
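Funding: National Natural Science Foundation of China (61070054, 60873017, 61170013); National Science and Technology Major Project for Core Electronic Devices, High-end Generic Chips, and Basic Software (2010ZX01042-001-002, 2010ZX01042-002-002-03); Fundamental Research Funds for the Central Universities (10XNI018)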
Keywords: big data; deep analysis; relational data management technique; MapReduce

