Bitmap-Index-Based Multi-Table Star Join Algorithm in a Parallel Framework

English title as published: Bitmap index based multi-table star schema join technology algorithm in parallel framework

Abstract: This paper analyzes the MapReduce distributed programming model used on big-data platforms and the join algorithms used to answer data queries. For the Star Schema Benchmark (SSB) data model, it proposes a distributed-cache-based optimization technique for multi-table star joins. A predicate-vector technique converts the data dependencies created by intermediate joins with dimension tables into bitmap-index filtering on the tables, reducing the heavy network overhead those dependencies would otherwise cause. Distributed caching makes full use of the processing nodes' memory, which optimizes network transmission and reduces query cost.
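The abstract only names the technique, so the Python sketch below illustrates the general idea of predicate-vector (bitmap) star-join filtering under stated assumptions. It is not the authors' implementation. It assumes dimension surrogate keys are dense integers starting at 0 (SSB's real `d_datekey` values such as 19920101 would first need remapping). Each dimension predicate is encoded as a bitmap held in a Python `int`, and MapReduce map tasks over fact-table splits are simulated in a single process. The `vectors` and `year_of` objects stand in for what Hadoop's distributed cache would ship to every node as a read-only copy. All function names and the toy data are hypothetical.

```python
# Sketch only: predicate-vector star-join filtering with simulated map tasks.
# Assumption: dimension surrogate keys are dense integers starting at 0.
# All names and data here are hypothetical, not taken from the paper.
from collections import defaultdict
from typing import Callable, Dict, Iterable, List

Row = Dict[str, int]


def build_predicate_vector(dim_rows: Iterable[Row], key: str,
                           pred: Callable[[Row], bool]) -> int:
    """Encode a dimension filter as a bitmap (a Python int): bit k is set
    when the dimension row whose surrogate key is k satisfies `pred`."""
    bits = 0
    for row in dim_rows:
        if pred(row):
            bits |= 1 << row[key]
    return bits


def map_task(fact_rows: Iterable[Row], vectors: Dict[str, int],
             group_col: List[int], group_fk: str, measure: str) -> Dict[int, int]:
    """Simulated mapper over one fact-table split. `vectors` and `group_col`
    play the role of data cached once on every node, so no dimension rows
    are shuffled across the network per fact row."""
    partial: Dict[int, int] = defaultdict(int)
    for row in fact_rows:
        # A fact row survives only if every foreign key hits a set bit.
        if all((bits >> row[fk]) & 1 for fk, bits in vectors.items()):
            # Group-by value fetched positionally through the surrogate key.
            partial[group_col[row[group_fk]]] += row[measure]
    return partial


def reduce_partials(partials: Iterable[Dict[int, int]]) -> Dict[int, int]:
    """Merge per-split partial aggregates into the final result."""
    total: Dict[int, int] = defaultdict(int)
    for part in partials:
        for k, v in part.items():
            total[k] += v
    return dict(total)


# Toy SSB-like data (hypothetical; region codes stand in for strings).
date_dim = [{"d_datekey": 0, "d_year": 1993},
            {"d_datekey": 1, "d_year": 1994},
            {"d_datekey": 2, "d_year": 1994}]
supplier_dim = [{"s_suppkey": 0, "s_region": 1},
                {"s_suppkey": 1, "s_region": 2}]
lineorder = [
    {"lo_orderdate": 0, "lo_suppkey": 0, "lo_revenue": 10},
    {"lo_orderdate": 1, "lo_suppkey": 0, "lo_revenue": 20},
    {"lo_orderdate": 2, "lo_suppkey": 1, "lo_revenue": 30},
    {"lo_orderdate": 2, "lo_suppkey": 0, "lo_revenue": 40},
]

# Predicate vectors keyed by the fact-table foreign key they filter.
vectors = {
    "lo_orderdate": build_predicate_vector(date_dim, "d_datekey",
                                           lambda r: r["d_year"] >= 1994),
    "lo_suppkey": build_predicate_vector(supplier_dim, "s_suppkey",
                                         lambda r: r["s_region"] == 1),
}
# Group-by column indexed by date surrogate key.
year_of = [r["d_year"] for r in sorted(date_dim, key=lambda r: r["d_datekey"])]

splits = [lineorder[:2], lineorder[2:]]  # two simulated input splits
result = reduce_partials(
    map_task(s, vectors, year_of, "lo_orderdate", "lo_revenue") for s in splits)
print(result)  # prints {1994: 60}
```

In this sketch only the compact bitmaps (one bit per dimension row) and the group-by column reach each node, and fact rows are filtered locally by bit tests. Under these assumptions, that is where the network saving described in the abstract would come from.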

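Authors: 解晨光 (Xie Chenguang), 刘明刚 (Liu Minggang)

Affiliations: Office of Scientific Research, Harbin Finance University; Department of Computer Science, Harbin Finance University

Source: Computer Engineering and Design (《计算机工程与设计》), CSCD, Peking University Core Journal; 2014, No. 9, pp. 3107-3112 (6 pages)

Funding: 2012 Heilongjiang Province Science and Technology Research Project (GC12A307)

Keywords: parallel framework; star schema; distributed cache; bitmap index; join

CLC number: TP274 [Automation and Computer Technology: Detection Technology and Automatic Devices]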

References (10)

1. Huang Shan, Wang Botao, Wang Guoren, Yu Ge, Li Jiajia. A survey of MapReduce optimization techniques[J]. Journal of Frontiers of Computer Science and Technology, 2013, 7(10): 865-885. (in Chinese)
2. Yang H, Dasdan A, Hsiao R L, et al. Map-reduce-merge: Simplified relational data processing on large clusters[C]//Proceedings of the ACM SIGMOD International Conference on Management of Data. ACM, 2007: 1029-1040.
3. O'Neil P, O'Neil E, Chen X. Star schema benchmark, revision 3[R/OL]. USA: University of Massachusetts Boston, 2009. http://www.cs.umb.edu/~poneil/StarSchemaB.PDF
4. Vernica R, Carey M J, Li Chen. Efficient parallel set-similarity joins using MapReduce[C]//Proceedings of the ACM SIGMOD International Conference on Management of Data. NY, USA: ACM, 2010: 495-506.
5. Sun Dalie, Li Jianzhong. Skyline-join query algorithm based on MapReduce[J]. Journal of Harbin Institute of Technology, 2012, 44(1): 103-106. (in Chinese)
6. Zhao Baoxue, Li Zhanhuai, Chen Qun, Pan Wei, Jiang Tao, Jin Jian. Sharing-based multi-query optimization technique for MapReduce[J]. Application Research of Computers, 2013, 30(5): 1405-1409. (in Chinese)
7. Blanas S, Patel J M, Ercegovac V, et al. A comparison of join algorithms for log processing in MapReduce[C]//Proceedings of the ACM SIGMOD International Conference on Management of Data. ACM, 2010: 975-986.
8. Okcan A, Riedewald M. Processing theta-joins using MapReduce[C]//Proceedings of the ACM SIGMOD International Conference on Management of Data. NY, USA: ACM, 2011: 949-960.
9. Afrati F N, Ullman J D. Optimizing multiway joins in a MapReduce environment[J]. IEEE Transactions on Knowledge and Data Engineering, 2011, 23(9): 1282-1298.
10. Zhang Yansong, Jiao Min, Wang Zhanwei, Wang Shan, Zhou Xuan. One-size-fits-all OLAP technique for big data analysis[J]. Chinese Journal of Computers, 2011, 34(10): 1936-1946. (in Chinese)

