Abstract
As data warehouses grow in size, ensuring the performance of ad hoc analytical queries over massive data becomes a major challenge. To address this problem, this paper proposes HDW, a parallel data warehouse architecture built on a PC cluster. HDW employs Google's GFS and Bigtable techniques for distributed storage management, uses MapReduce to parallelize Online Analytical Processing (OLAP) computation, and provides front-end applications with a unified interface conforming to the XMLA specification. Experimental results on an 18-node cluster show that HDW scales well and can quickly process data sets of at least 10 million tuples.
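To illustrate the computation model the abstract refers to, the following is a minimal sketch, not the paper's actual code, of how an OLAP group-by aggregation (here, summing a sales measure per region) decomposes into MapReduce's map, shuffle, and reduce phases; the schema and function names are hypothetical:

```python
from collections import defaultdict

def map_phase(rows):
    # Map: emit one (group-by key, measure) pair per input tuple.
    # A real MapReduce framework runs many mappers in parallel,
    # each over a split of the input stored in GFS.
    for region, product, amount in rows:
        yield (region, amount)

def shuffle(pairs):
    # Shuffle: group intermediate values by key. In a real run the
    # framework performs this across nodes; here it is a local dict.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    # Reduce: aggregate each key's values; reducers for different
    # keys can run on different nodes in parallel.
    return {key: sum(values) for key, values in groups.items()}

rows = [("north", "tv", 100), ("south", "tv", 80), ("north", "pc", 50)]
result = reduce_phase(shuffle(map_phase(rows)))
# result == {"north": 150, "south": 80}
```

The same three-phase shape covers other distributive aggregates (COUNT, MAX, MIN), which is why MapReduce fits cube-style OLAP queries over partitioned data.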
Source
Computer Engineering (《计算机工程》)
Indexed in: CAS, CSCD, Peking University Core Journals (北大核心)
2009, Issue 20, pp. 73-75 (3 pages)
Funding
Guangdong Province International Science and Technology Cooperation Program (2007A050100026)
Guangdong Province Science and Technology Plan Project (2006B11301001)
Guangdong Province Industrial Science and Technology Research Program (2006B80407001)