
Research on Data Reading Techniques Based on Big Data Environment (cited by: 4)
Abstract: To address the main challenges of reading data in a big data environment, this paper studies the key data reading techniques used in distributed file systems. Based on the data placement structure, it compares row storage, column storage, and hybrid row-column storage in terms of data loading, query processing, and storage space utilization, and discusses the strengths, weaknesses, and open challenges of each. It then focuses on the compression and materialization techniques used in column storage, analyzing run-length encoding, dictionary encoding, and bit-vector encoding as commonly used compression schemes, and late materialization as applied to tuple reconstruction. Finally, it examines the problems with existing techniques, discusses possible solutions, and outlines directions for future research.
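Source: Computer Technology and Development (《计算机技术与发展》), 2015, No. 2, pp. 113-116 (4 pages)
Funding: National Natural Science Foundation of China (60973140, 61170276, 61373135); Jiangsu Province Industry-University-Research Project (BY2013011); Jiangsu Province Innovation Fund for Technology-Based Enterprises (BC2013027); Major Natural Science Research Project of Jiangsu Higher Education Institutions (12KJA520003)
Keywords: big data; column storage; compression; materialization techniques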
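The abstract names three column compression schemes and late materialization without showing them. The following is a minimal Python sketch of each, not the paper's implementation: the function names, the `city` and `sales` columns, and the in-memory list representation are all assumptions made for illustration.

```python
from itertools import groupby


def run_length_encode(column):
    """Collapse each run of equal adjacent values into a (value, run_length) pair."""
    return [(value, len(list(run))) for value, run in groupby(column)]


def dictionary_encode(column):
    """Replace each value with a small integer code; return (dictionary, codes)."""
    dictionary = {}
    codes = []
    for value in column:
        # A new value gets the next unused code; a known value keeps its code.
        codes.append(dictionary.setdefault(value, len(dictionary)))
    return dictionary, codes


def bit_vector_encode(column):
    """Build one bit string per distinct value; bit i is 1 where row i holds it."""
    return {
        value: "".join("1" if v == value else "0" for v in column)
        for value in dict.fromkeys(column)  # distinct values, first-seen order
    }


def late_materialize(columns, predicate_column, predicate, output_columns):
    """Filter on one column first, then build tuples only for matching positions."""
    positions = [i for i, v in enumerate(columns[predicate_column]) if predicate(v)]
    return [tuple(columns[name][i] for name in output_columns) for i in positions]


# Hypothetical two-column table stored column by column.
city = ["NJ", "NJ", "NJ", "SH", "SH", "NJ"]
sales = [10, 20, 30, 40, 50, 60]
table = {"city": city, "sales": sales}

rle = run_length_encode(city)
dictionary, codes = dictionary_encode(city)
bit_vectors = bit_vector_encode(city)
rows = late_materialize(table, "city", lambda v: v == "SH", ["city", "sales"])
```

Run-length encoding pays off when a column is sorted or has long runs, dictionary encoding when it has few distinct values, and bit vectors when predicates can be answered with bitwise operations. In the late-materialization sketch, tuples are assembled only for the positions that pass the filter, so non-matching rows are never reconstructed.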

References (17)

  • 1 Wang Yijie, Sun Weidong, Zhou Song, Pei Xiaoqiang, Li Xiaoyong. Key technologies of distributed storage for cloud computing [J]. Journal of Software, 2012, 23(4): 962-986. Cited by: 280
  • 2 Qin Xiongpai, Wang Huiju, Du Xiaoyong, Wang Shan. Big data analysis: competition and symbiosis of RDBMS and MapReduce [J]. Journal of Software, 2012, 23(1): 32-45. Cited by: 386
  • 3 Gong Xueqing, Jin Cheqing, Wang Xiaoling, Zhang Rong, Zhou Aoying. Data-intensive science and engineering: requirements and challenges [J]. Chinese Journal of Computers, 2012, 35(8): 1563-1578. Cited by: 79
  • 4 He Yongqiang, Lee Rubao, Huai Yin, et al. RCFile: a fast and space-efficient data placement structure in MapReduce-based warehouse systems [C]//Proc of 2011 IEEE 27th international conference on data engineering. Hannover: IEEE, 2011: 1199-1208.
  • 5 Yao Leiyue, Chen Yong. An optimizing strategy for massive data management system based on SQLSERVER 2000 [C]//Proc of Asia-Pacific conference on information processing. [S.l.]: [s.n.], 2009: 18-19.
  • 6 Abadi D J, Madden S R, Hachem N. Column-stores vs. row-stores: how different are they really? [C]//Proceedings of the 2008 ACM SIGMOD international conference on management of data. Vancouver, Canada: ACM, 2008: 967-980.
  • 7 Ding Xiangwu, Yu Wenbing, Le Jiajin. An adaptive projection strategy and its implementation in column stores [C]//Proc of 6th IEEE joint international conference on information technology and artificial intelligence. [S.l.]: IEEE, 2011: 20-22.
  • 8 Amin A, Qureshi H A, Junaid M, et al. Modified run length encoding scheme with introduction of bit stuffing for efficient data compression [C]//Proc of 6th international conference on Internet technology and secured transactions. [S.l.]: [s.n.], 2011: 668-672.
  • 9 Stabno M, Wrembel R. RLH: bitmap compression technique based on run-length and Huffman encoding [J]. Information Systems, 2009, 34(4-5): 400-414.
  • 10 Urbani J, Maassen J, Drost N, et al. Scalable RDF data compression with MapReduce [J]. Concurrency and Computation: Practice & Experience, 2013, 25(1): 24-39.

