Abstract
In HBase, a column-oriented database, every operation is written by appending data, so its compaction mechanism consumes substantial system resources and degrades read performance. To address this problem, a compaction mechanism based on data redundancy is proposed: files within a column family whose proportion of deleted data reaches a set threshold are compacted, reducing the space that useless data occupies in the system. Experimental results show that, compared with the original HBase compaction mechanism, which considers only file size, file count, and time interval, the improved mechanism raises HBase query efficiency and Major compaction performance.
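The selection rule described in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation and does not use any HBase API: the `StoreFile` class, its cell counters, the `select_for_compaction` function, and the threshold value 0.3 are all hypothetical names and numbers chosen for illustration. The only assumption carried over from the abstract is that a file becomes a compaction candidate once its share of deleted data reaches a configured threshold.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class StoreFile:
    """Hypothetical stand-in for one file in a column family."""
    name: str
    total_cells: int    # all cells written to the file
    deleted_cells: int  # cells covered by delete markers

    @property
    def delete_ratio(self) -> float:
        # An empty file has no redundant data, so its ratio is 0.
        if self.total_cells == 0:
            return 0.0
        return self.deleted_cells / self.total_cells


def select_for_compaction(files: List[StoreFile], threshold: float) -> List[StoreFile]:
    """Return files whose deleted-data ratio has reached the threshold."""
    return [f for f in files if f.delete_ratio >= threshold]


# Illustrative data only; the figures are made up.
files = [
    StoreFile("hfile-a", total_cells=1000, deleted_cells=50),
    StoreFile("hfile-b", total_cells=1000, deleted_cells=400),
    StoreFile("hfile-c", total_cells=200, deleted_cells=120),
]

selected = select_for_compaction(files, threshold=0.3)
print([f.name for f in selected])  # ['hfile-b', 'hfile-c']
```

Files below the threshold are left alone, so a compaction pass rewrites only the files where the most space can be reclaimed.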
Source
《计算机工程》
CAS
CSCD
PKU Core (北大核心)
2017, No. 2, pp. 63-67 (5 pages)
Computer Engineering
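Funding
Scientific and Technological Research Program of Chongqing Municipal Education Commission (KJ1400414)
Doctoral Start-up Foundation of Chongqing University of Posts and Telecommunications (A2015-17)
Natural Science Foundation of Chongqing University of Posts and Telecommunications (A2011-29)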