
A GPU-Accelerated Highly Compact and Encoding Based Database System

Cited by: 7
Abstract: In the big data era, business applications in fields such as telecommunications, finance, and the Internet generate huge volumes of data, which puts unprecedented pressure on storage and querying. Current database and data warehouse systems respond by compressing and encoding data, which saves space and reduces the I/O needed for table queries, and consequently improves query performance. However, when faced with real large-scale enterprise data, most of these systems still cannot make a good tradeoff between compression ratio, import time, and query performance.

To address this problem, the paper proposes HEGA-STORE, a new highly compact encoding-based database system that re-encodes and condenses data according to rules. HEGA-STORE adopts a hybrid row-oriented and column-oriented storage model. It proposes a data import and storage plan based on mining and encoding intra-column and inter-column rules, combining rule-based encoding with conventional compression algorithms. During rule mining and encoding, it uses the GPU as a coprocessor to run the algorithms in parallel, which improves the efficiency of encoding and decoding.

To evaluate HEGA-STORE, a prototype encoding and decoding system was developed and deployed in Netease to support log analysis. Import and query tests were run on large-scale Netease Yixin (易信) communication records and Netease back-end log data, and the prototype was compared with other compression and encoding algorithms and with other database and data warehouse products. The results show that, compared with those products, the prototype achieves a very high compression ratio and leads in both import speed and full-table-scan query speed, and that cooperative processing on GPU and CPU further improves system performance. These results support the practical value of the proposed highly compact encoding database system for big data applications.
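Source: Journal of Computer Research and Development (《计算机研究与发展》), indexed in EI, CSCD, and the Peking University Core list, 2015, No. 2, pp. 362-376 (15 pages)
Funding: National Key Technology R&D Program of China (2013BAG06B01); National High Technology Research and Development Program of China ("863" Program) (SS2013AA040601); National Natural Science Foundation of China (61472348)
Keywords: database system; hybrid row-column storage; encoding; rule mining; GPU; CUDA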
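
The abstract names intra-column and inter-column rule mining and encoding without giving the rules themselves. As a rough illustration only, the sketch below assumes two simple stand-in rules: dictionary encoding within a column, and a functional dependency between columns (one column's value fully determines another's, so the dependent column is replaced by a lookup table). The function names, the table layout, and the sample data are all hypothetical and are not taken from HEGA-STORE; the GPU-parallel part of the system is not shown.

```python
def dictionary_encode(column):
    """Assumed intra-column rule: map each distinct value to a small integer code."""
    codes = {}
    encoded = []
    for value in column:
        if value not in codes:
            codes[value] = len(codes)
        encoded.append(codes[value])
    # dicts keep insertion order, so list position equals the assigned code
    return list(codes), encoded


def mine_dependency(src, dst):
    """Assumed inter-column rule: return a src -> dst mapping if src determines dst."""
    mapping = {}
    for s, d in zip(src, dst):
        if mapping.setdefault(s, d) != d:
            return None  # same src value seen with two dst values: no rule
    return mapping


def encode_table(columns):
    """Encode a table given as {column name: list of values}."""
    names = list(columns)
    derived = {}     # dependent column -> (source column, mapping)
    sources = set()  # columns other columns depend on; these must stay stored
    for dst in names:
        if dst in sources:
            continue
        for src in names:
            if src == dst or src in derived:
                continue
            mapping = mine_dependency(columns[src], columns[dst])
            if mapping is not None:
                derived[dst] = (src, mapping)
                sources.add(src)
                break
    stored = {n: dictionary_encode(columns[n]) for n in names if n not in derived}
    return {"order": names, "stored": stored, "derived": derived}


def decode_table(encoded):
    """Rebuild the original columns from the encoded form."""
    columns = {}
    for name, (dictionary, codes) in encoded["stored"].items():
        columns[name] = [dictionary[c] for c in codes]
    for name, (src, mapping) in encoded["derived"].items():
        columns[name] = [mapping[v] for v in columns[src]]
    return {name: columns[name] for name in encoded["order"]}


# Hypothetical log table: "region" happens to be determined by "user".
table = {
    "user": ["u1", "u2", "u1", "u3", "u2"],
    "region": ["east", "west", "east", "east", "west"],
    "status": ["ok", "ok", "fail", "ok", "fail"],
}
encoded = encode_table(table)
restored = decode_table(encoded)
```

In this toy table the `region` column is not stored per row at all; only the three-entry `user` to `region` mapping is kept, while `user` and `status` are stored as integer codes. Checking every column pair is quadratic in the number of columns, which hints at why a real system might offload rule mining to a coprocessor.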

