A GPU-Accelerated Highly Compact and Encoding Based Database System (基于GPU加速的超精简型编码数据库系统)

Cited by: 7

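Abstract (translated from the Chinese): With today's explosive data growth, the large-scale data generated in fields such as telecommunications, finance, and the Internet puts unprecedented pressure on storage and querying. Against this background, current database and data warehouse systems compress and encode their data, which saves space and reduces the I/O needed for table queries, improving performance. When faced with real large-scale enterprise workloads, however, most systems still cannot fully meet enterprise requirements for compression ratio, import time, or query performance. By re-encoding and compacting data according to a set of rules, this work implements HEGA-STORE, a new highly compact encoding-based database system. The system adopts a hybrid row-column storage architecture, proposes a data import and storage plan based on mining and encoding intra-column and inter-column rules, and uses the GPU as a coprocessor to run the rule-mining and encoding algorithms in parallel for efficiency. A prototype encoding and decoding system was developed and tested on the import and querying of large-scale NetEase Yixin communication records and NetEase back-end log data, and compared with other compression and encoding algorithms and with database and data warehouse products. The comparison shows that, relative to comparable database and data warehouse products, the prototype achieves a very high compression ratio and also leads in import speed and full-table-scan query speed. Cooperative GPU and CPU processing further improves system performance, which supports the practical value of the proposed system.

Abstract (authors' English version): In the big data era, business applications generate huge volumes of data, making it extremely challenging to store and manage those data. One possible solution adopted in previous database systems is to employ encoding techniques, which can effectively reduce the size of the data and consequently improve query performance. However, existing encoding approaches still cannot strike a good trade-off between compression ratio, import time, and query performance. To address this problem, we propose a new encoding-based database system, HEGA-STORE, which adopts a hybrid row-oriented and column-oriented storage model. In HEGA-STORE, we design a GPU-assisted encoding scheme that combines rule-based encoding with conventional compression algorithms. By exploiting the computing power of the GPU, we improve the performance of the encoding and decoding algorithms. To evaluate HEGA-STORE, it is deployed at NetEase to support log analysis. We compare HEGA-STORE with other database systems, and the results show that HEGA-STORE provides better performance for data import and query processing. It is a highly compact encoding database for big data applications.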

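Authors: Luo Xinyuan (骆歆远), Chen Gang (陈刚), Wu Sai (伍赛)

Affiliation: College of Computer Science, Zhejiang University

Source: Journal of Computer Research and Development (《计算机研究与发展》), 2015, No. 2, pp. 362-376 (15 pages). Indexed in EI, CSCD, and the Peking University Core Journals list.

Funding: National Key Technology R&D Program of China (2013BAG06B01); National High Technology Research and Development Program of China (863 Program) (SS2013AA040601); National Natural Science Foundation of China (61472348)

Keywords: database system; hybrid row-column storage; encoding; rule mining; GPU; CUDA

Classification: TP311.13 [Automation and Computer Technology: Computer Software and Theory]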
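
The abstract describes encoding built on rules mined within a column and across columns. As a rough illustration only, the Python sketch below assumes two simple kinds of rule, neither of which the abstract confirms as HEGA-STORE's actual rules: a per-column dictionary (intra-column) and a functional dependency between two columns (inter-column). The function names and sample data are hypothetical, and the sketch runs on the CPU, not on a GPU.

```python
def dictionary_encode(column):
    """Assumed intra-column rule: map each distinct value to a small integer code."""
    codes = {}
    encoded = []
    for value in column:
        if value not in codes:
            codes[value] = len(codes)  # next unused code
        encoded.append(codes[value])
    dictionary = list(codes)  # position i holds the value for code i
    return dictionary, encoded


def mine_dependency(src, dst):
    """Assumed inter-column rule: return a src -> dst mapping if every src
    value always appears with the same dst value, otherwise None."""
    mapping = {}
    for s, d in zip(src, dst):
        if mapping.setdefault(s, d) != d:
            return None  # the same src value maps to two dst values
    return mapping


# Hypothetical sample table with two columns.
city = ["Hangzhou", "Beijing", "Hangzhou", "Beijing"]
province = ["Zhejiang", "Beijing", "Zhejiang", "Beijing"]

dictionary, encoded = dictionary_encode(city)
rule = mine_dependency(city, province)

# Decoding: the city column comes back from its codes, and the province
# column is rebuilt from the rule instead of being stored per row.
decoded_city = [dictionary[code] for code in encoded]
decoded_province = [rule[value] for value in decoded_city]
```

In this toy case only the integer codes, the dictionary, and the rule would need to be stored. Whether such a rule pays off on real data depends on how many distinct values each column holds.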


