Research on optimizing big data storage structure in distributed system

Cited by: 7
Abstract: In a distributed system, the data storage structure directly affects how efficiently big data is stored and processed. With a row-store structure, data is read from the local node and loads quickly, but queries also read columns they do not need, compression efficiency is low, and data is stored redundantly. With a column-store structure, compression efficiency is high, but accessing the columns of a record across nodes adds network transfer overhead. To overcome the shortcomings of both structures, this paper proposes a storage structure that combines rows and columns. Experimental results show that the improved structure loads data slightly more slowly than row storage, and compresses data more efficiently than both row storage and column storage. The row-column structure avoids the extra disk I/O of row storage and also cuts the unnecessary network transfer of column storage, substantially improving big data storage efficiency and processing performance in distributed systems.
Source: Journal of Hebei University of Engineering: Natural Science Edition (indexed in CAS), 2014, No. 4, pp. 69-73 (5 pages)
Keywords: big data; distributed system; row-column store
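The abstract does not spell out the exact on-disk format, so the following is only a minimal sketch of the general row-column idea, similar in spirit to row-group layouts such as RCFile, under stated assumptions. Rows are first split horizontally into row groups, and each group is then stored column by column. All names (`RowGroup`, `build_row_groups`, `read_column`, `read_rows`, `place_row_groups`), the per-column zlib compression, the newline-joined string encoding, and the round-robin node placement are hypothetical choices for illustration, not the authors' method.

```python
import zlib
from dataclasses import dataclass, field
from typing import Dict, List, Sequence


@dataclass
class RowGroup:
    """A horizontal slice of the table; all of its column chunks live on one node."""
    num_rows: int
    # Column name -> zlib-compressed, newline-joined values of that column (assumed encoding).
    column_chunks: Dict[str, bytes] = field(default_factory=dict)


def build_row_groups(columns: Sequence[str],
                     rows: Sequence[Sequence[str]],
                     group_size: int) -> List[RowGroup]:
    """Split rows into row groups, then store each group column by column.

    Assumes string values without embedded newlines (newline is the separator).
    """
    if group_size <= 0:
        raise ValueError("group_size must be positive")
    groups: List[RowGroup] = []
    for start in range(0, len(rows), group_size):
        chunk = rows[start:start + group_size]
        group = RowGroup(num_rows=len(chunk))
        for idx, name in enumerate(columns):
            # Values of one column are contiguous, so similar values are compressed together.
            joined = "\n".join(row[idx] for row in chunk)
            group.column_chunks[name] = zlib.compress(joined.encode("utf-8"))
        groups.append(group)
    return groups


def read_column(group: RowGroup, name: str) -> List[str]:
    """Decompress only the requested column; the group's other column chunks are skipped."""
    return zlib.decompress(group.column_chunks[name]).decode("utf-8").split("\n")


def read_rows(group: RowGroup, columns: Sequence[str]) -> List[List[str]]:
    """Rebuild full rows from a single row group, i.e. from data held on one node."""
    per_column = [read_column(group, name) for name in columns]
    return [list(values) for values in zip(*per_column)]


def place_row_groups(groups: Sequence[RowGroup], nodes: Sequence[str]) -> Dict[int, str]:
    """Hypothetical round-robin placement: each row group goes to one node as a whole."""
    return {i: nodes[i % len(nodes)] for i in range(len(groups))}
```

For example, ten rows with `group_size=4` produce three row groups of 4, 4, and 2 rows. A query that needs only one column calls `read_column` and decompresses just that chunk (the column-store benefit), while a query that needs whole records calls `read_rows` on a group whose columns all sit on the same node, so no cross-node transfer is needed to reassemble a row (the row-store benefit).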