
One-size-fits-all OLAP Technique for Big Data Analysis (cited by: 32)
Abstract: Rapidly expanding data volumes have pushed traditional OLAP into the era of large-scale data analysis, which is characterized by high storage density and heavy computation and therefore requires large-scale parallel storage and processing capacity. Both traditional parallel database techniques and the currently popular MapReduce technique must deal with the performance and parallel-processing efficiency of big data in large-scale parallel processing environments. For OLAP based on complex multi-table joins over a star schema, the complexity of the algorithms and the network transmission cost incurred during parallel processing are both major factors limiting performance. Through an in-depth analysis of the OLAP storage model and query workload characteristics, this paper proposes optimization strategies and implementation techniques for SPJGA-OLAP, the most fundamental subset of OLAP queries, covering storage, query processing, data distribution, network transmission, and distributed caching, and targeting large-scale parallel processing frameworks for big data. The feasibility of the techniques is evaluated through an analysis of TPC-H and SSB, two benchmarks widely accepted in industry and academia. The paper proposes a parallel in-memory OLAP architecture centered on the in-memory predicate-vector DDTA-JOIN algorithm, which replaces diverse join execution plans with normalized predicate-vector operations on dimension tables. A single query processing model thereby serves both centralized processing and large-scale parallel OLAP processing, takes full advantage of modern hardware, and minimizes network transmission cost and OLAP query processing cost. The experiments analyze the storage cost and transmission cost of data distribution strategies on 1 TB and 100 TB datasets, and the feasibility and parallel processing efficiency of the techniques are verified with a parallel OLAP cost model and tests on real data.
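The predicate-vector idea in the abstract can be illustrated with a small sketch. This is not the paper's DDTA-JOIN implementation. It assumes that a predicate vector holds one boolean per dimension row, and that each fact-table foreign key is the array position of its dimension row. The toy tables and the names `predicate_vector` and `scan_with_predicate_vectors` are hypothetical.

```python
from collections import defaultdict

# Hypothetical toy star schema. Dimension rows live in arrays, and each fact-table
# foreign key is assumed to be the array position of the dimension row it refers to.
date_dim = [{"year": 1997}, {"year": 1998}, {"year": 1998}]
customer_dim = [{"region": "ASIA"}, {"region": "EUROPE"}, {"region": "ASIA"}]
fact = [
    {"date_fk": 0, "cust_fk": 0, "revenue": 10},
    {"date_fk": 1, "cust_fk": 0, "revenue": 20},
    {"date_fk": 1, "cust_fk": 1, "revenue": 30},
    {"date_fk": 2, "cust_fk": 2, "revenue": 40},
]


def predicate_vector(dim_rows, predicate):
    """Evaluate the predicate once per dimension row; entry i answers for key i."""
    return [bool(predicate(row)) for row in dim_rows]


def scan_with_predicate_vectors(fact_rows, filters, group_key, measure):
    """Single pass over the fact table: probe each vector by foreign key, then aggregate."""
    totals = defaultdict(int)
    for row in fact_rows:
        # A fact row qualifies only if every dimension's vector is true at its key.
        if all(vector[row[fk]] for fk, vector in filters.items()):
            totals[group_key(row)] += row[measure]
    return dict(totals)


filters = {
    "date_fk": predicate_vector(date_dim, lambda d: d["year"] == 1998),
    "cust_fk": predicate_vector(customer_dim, lambda c: c["region"] == "ASIA"),
}
result = scan_with_predicate_vectors(
    fact,
    filters,
    group_key=lambda r: customer_dim[r["cust_fk"]]["region"],
    measure="revenue",
)
print(result)  # {'ASIA': 60}
```

Under these assumptions every star query takes the same shape: build one vector per filtered dimension, then scan the fact table once. In a distributed setting only the compact vectors would need to be shipped to the nodes holding fact-table partitions, which is one plausible reading of how the approach keeps network transmission low.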
Source: Chinese Journal of Computers (《计算机学报》), indexed in EI, CSCD, and the Peking University Core Journals list; 2011, No. 10, pp. 1936-1946 (11 pages)
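Funding: Supported by the National Science and Technology Major Project of China (Core Electronic Devices, High-end Generic Chips and Basic Software project 2010ZX01042-001-002), the National Natural Science Foundation of China (61070054), the Research Funds of Renmin University of China (Fundamental Research Funds for the Central Universities, 10XNI018), and a Renmin University of China graduate student grant (11XNH120).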
Keywords: OLAP; big data analytical processing; predicate-vector; star schema
