Abstract
This paper studies techniques for organizing, storing, and querying product-related big data resources, and proposes a block-based storage layout for product big data on the Hadoop platform. Building on the MapReduce parallel framework, it proposes a multi-replica consistent-hashing data storage algorithm that takes into account both the correlation between data items and their spatial and temporal attributes, and it optimizes Hadoop's data distribution strategy and block-size adjustment. On top of this optimized storage layout, a multi-data-source parallel map-join retrieval algorithm and a multi-channel data-fusion feature extraction algorithm are designed for retrieving product big data, improving the efficiency of data resource management. Experiments show that the execution time of multi-data-source parallel retrieval is only 31.9% of that of the standard Hadoop scheme.
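The abstract names a multi-replica consistent-hashing storage algorithm but does not give its details. The following is a minimal generic sketch of consistent hashing with virtual nodes and multi-replica placement, not the authors' algorithm. To mimic "considering data correlation and spatial-temporal attributes", it assumes a hypothetical placement key that groups records of the same product within the same time window, so correlated records land on the same replica set. The names `ConsistentHashRing`, `placement_key`, the node names `dn1`..`dn4`, `vnodes=64`, and the one-hour window are all illustrative assumptions.

```python
import bisect
import hashlib


def _hash(key: str) -> int:
    # Stable 64-bit ring position from MD5 (built-in hash() is salted per process).
    return int.from_bytes(hashlib.md5(key.encode("utf-8")).digest()[:8], "big")


def placement_key(product_id: str, epoch_seconds: int, window_seconds: int = 3600) -> str:
    # Hypothetical locality key: same product + same time window -> same ring position.
    return f"{product_id}:{epoch_seconds // window_seconds}"


class ConsistentHashRing:
    def __init__(self, nodes, vnodes=64):
        self.vnodes = vnodes
        self._points = {}  # ring position -> physical node
        self._sorted = []  # ring positions in ascending order
        for node in nodes:
            self.add_node(node)

    def add_node(self, node):
        # Each physical node owns several virtual points to even out the load.
        for i in range(self.vnodes):
            self._points[_hash(f"{node}#vn{i}")] = node
        self._sorted = sorted(self._points)

    def remove_node(self, node):
        self._points = {h: n for h, n in self._points.items() if n != node}
        self._sorted = sorted(self._points)

    def replicas(self, key, n=3):
        """Walk clockwise from the key's position and collect n distinct physical nodes."""
        if not self._sorted:
            return []
        n = min(n, len(set(self._points.values())))
        idx = bisect.bisect_left(self._sorted, _hash(key)) % len(self._sorted)
        chosen = []
        while len(chosen) < n:
            node = self._points[self._sorted[idx]]
            if node not in chosen:
                chosen.append(node)
            idx = (idx + 1) % len(self._sorted)
        return chosen


ring = ConsistentHashRing(["dn1", "dn2", "dn3", "dn4"])
print(ring.replicas(placement_key("P1001", 1_600_000_000)))
```

The usual reason to prefer consistent hashing over modulo placement is that removing or adding a node only moves the keys whose replica walk touched that node; every other key keeps its replica set, which limits data migration when the cluster changes.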
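The abstract also refers to a multi-data-source map-join retrieval algorithm without describing it. As a rough illustration of the general map-side (replicated) join idea, not the paper's method, the sketch below loads each small source into an in-memory hash table (in Hadoop this is what a mapper would typically read from the distributed cache) and streams splits of the large source through parallel "mappers" with no reduce phase. The function names, the `pid` join key, the inner-join semantics, and the sample data are hypothetical; threads only simulate parallel map tasks.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import product


def build_lookup(rows, key):
    """Hash one small source by its join key."""
    table = {}
    for row in rows:
        table.setdefault(row[key], []).append(row)
    return table


def map_join_split(split, lookups, key):
    """Map-only inner join: stream one split of the large source and probe every lookup."""
    out = []
    for rec in split:
        match_lists = [lk.get(rec[key]) for lk in lookups]
        if not all(match_lists):
            continue  # drop records that have no match in some source
        for combo in product(*match_lists):
            merged = dict(rec)
            for part in combo:
                merged.update(part)  # later sources overwrite same-named fields
            out.append(merged)
    return out


def parallel_map_join(large_rows, small_sources, key, num_splits=4):
    lookups = [build_lookup(rows, key) for rows in small_sources]
    splits = [large_rows[i::num_splits] for i in range(num_splits)]
    with ThreadPoolExecutor(max_workers=num_splits) as pool:
        results = pool.map(lambda s: map_join_split(s, lookups, key), splits)
    return [row for part in results for row in part]


orders = [{"pid": "P1", "qty": 2}, {"pid": "P2", "qty": 5}, {"pid": "P3", "qty": 1}]
products = [{"pid": "P1", "name": "bearing"}, {"pid": "P2", "name": "gear"}]
suppliers = [{"pid": "P1", "supplier": "S9"}, {"pid": "P2", "supplier": "S4"}]
print(parallel_map_join(orders, [products, suppliers], key="pid", num_splits=2))
```

Because every mapper already holds the small sources in memory, the join needs no shuffle or reduce step. This is the usual reason map-side joins outperform reduce-side joins when all but one input fits in memory.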
Source
Computer Science and Application (《计算机科学与应用》)
2021, No. 5, pp. 1503-1511 (9 pages)