Abstract
To improve the query performance of Hive, this paper optimizes the HDFS data block placement strategy and proposes a placement strategy based on correlation analysis. The strategy builds a concurrency relationship matrix and an intersection relationship matrix to evaluate the correlation between the data block to be placed and the data blocks already stored on each node. It then combines this correlation with the access frequency of the target data block to select a suitable node to store it.
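The abstract does not give the exact definitions of the two matrices or the selection formula, so the following Python sketch only illustrates the general shape of correlation-based node selection under loudly stated assumptions. Hypothetically, `conc[i][j]` counts how often blocks i and j are read by the same query, `inter[i][j]` marks some overlap between the two blocks, both are combined with weights `w_conc` and `w_inter`, and the combined score is scaled by the block's access frequency. The rule that a block should avoid nodes holding strongly related blocks, the tie-breaking, and the names `correlation`, `choose_node`, `dn1` and `dn2` are all illustrative assumptions, not the authors' method and not an HDFS API.

```python
from typing import Dict, Sequence, Set, Tuple

Matrix = Sequence[Sequence[float]]


def correlation(block: int, placed: Set[int], conc: Matrix, inter: Matrix,
                w_conc: float, w_inter: float) -> float:
    """Weighted correlation between `block` and the blocks already on one node."""
    return sum(w_conc * conc[block][p] + w_inter * inter[block][p] for p in placed)


def choose_node(block: int, nodes: Dict[str, Set[int]], conc: Matrix, inter: Matrix,
                freq: Sequence[float], w_conc: float = 1.0, w_inter: float = 1.0) -> str:
    """Return the node with the lowest frequency-weighted correlation cost.

    Assumed rule (hypothetical): a hot block is kept away from nodes that hold
    blocks it is strongly related to, so related reads are spread across nodes.
    """
    def cost(name: str) -> Tuple[float, int, str]:
        placed = nodes[name]
        score = freq[block] * correlation(block, placed, conc, inter, w_conc, w_inter)
        # Ties: prefer the node storing fewer blocks, then the smaller name.
        return (score, len(placed), name)

    return min(nodes, key=cost)


# Toy data for 4 blocks (indices 0..3); all values are illustrative only.
conc = [[0, 3, 0, 1],   # conc[i][j]: times blocks i and j were read by the same query
        [3, 0, 0, 0],
        [0, 0, 0, 2],
        [1, 0, 2, 0]]
inter = [[0, 1, 0, 0],  # inter[i][j]: 1 if blocks i and j are assumed to overlap
         [1, 0, 1, 0],
         [0, 1, 0, 0],
         [0, 0, 0, 0]]
freq = [5, 2, 1, 4]     # access frequency of each block

nodes = {"dn1": {1}, "dn2": {2}}
for b in (0, 3):
    target = choose_node(b, nodes, conc, inter, freq)
    nodes[target].add(b)
    print(f"block {b} -> {target}")
# block 0 -> dn2  (dn1 costs 5 * (3 + 1) = 20, dn2 costs 0)
# block 3 -> dn1  (dn1 costs 0, dn2 costs 4 * (2 + 1) = 12)
```

Making `w_inter` negative would instead pull overlapping blocks onto the same node, which may be closer to what an intersection relation is meant to capture; the abstract does not say which direction the authors use.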
Authors
JING Zhonghang (荆忠航), ZHANG Wei (张伟), WANG Jiahui (王佳慧), MA Limin (马利民), XU Tao (徐涛)
Affiliations:
Computer School, Beijing Information Science & Technology University, Beijing 100101, China
Beijing Advanced Innovation Center for Materials Genome Engineering, Beijing Information Science & Technology University, Beijing 100101, China
Information and Network Security Department, National Information Center, Beijing 100045, China
Research Center for Microprocessor and System-on-Chip Technology, Tsinghua University, Beijing 100084, China
Source
Journal of Beijing Information Science and Technology University (Natural Science Edition), 2021, No. 6, pp. 93-100 (8 pages)
Funding
Project of the Beijing Advanced Innovation Center for Materials Genome Engineering.