Abstract
To handle high-dimensional big data sets with hierarchical dimensions, this paper proposes a parallel shell-fragment cube construction algorithm based on the Map/Reduce framework. The algorithm uses Map/Reduce to build and query shell-fragment cubes in parallel. In the Map phase, the construction algorithm computes, for each data partition, all possible data cells or hierarchical-dimension encoding prefixes; in the Reduce phase, it aggregates these into the final shell fragments and the measure index table. Experiments show that the parallel shell-fragment cube algorithm combines the parallelism and high scalability of the Map/Reduce framework with the compression strategy and inverted-index mechanism of the shell-fragment cube. It effectively avoids the explosive growth in data volume that occurs when high-dimensional data is materialized, and it supports fast construction and query operations.
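The Map and Reduce phases described in the abstract can be illustrated with a small single-process sketch. This is not the paper's implementation: the function names (`map_partition`, `reduce_all`, `query_sum`), the row layout `(tid, dims, measure)`, and the way dimensions are grouped into fragments are all assumptions made for illustration, and the hierarchical-dimension encoding prefixes are omitted. It only shows the general idea of materializing cells per fragment as inverted tuple-ID lists and answering a query by intersecting them.

```python
from collections import defaultdict
from itertools import combinations


def map_partition(rows, fragments):
    """Map step (assumed shape): for one data partition, emit every cell
    of every fragment together with the tuple ID that falls into it.

    rows      -- list of (tid, dims, measure); dims is a tuple of values
    fragments -- list of tuples of dimension indexes, e.g. [(0, 1), (2,)]
    """
    cells, measures = [], []
    for tid, dims, measure in rows:
        measures.append((tid, measure))
        for frag in fragments:
            # Every non-empty subset of the fragment's dimensions is a cell.
            for r in range(1, len(frag) + 1):
                for subset in combinations(frag, r):
                    key = tuple((d, dims[d]) for d in subset)
                    cells.append((key, tid))
    return cells, measures


def reduce_all(mapped):
    """Reduce step (assumed shape): merge the per-partition output into
    inverted indexes (cell -> set of tuple IDs) and a measure index table
    (tuple ID -> measure)."""
    shell = defaultdict(set)
    measure_index = {}
    for cells, measures in mapped:
        for key, tid in cells:
            shell[key].add(tid)
        measure_index.update(measures)
    return dict(shell), measure_index


def query_sum(shell, measure_index, conditions):
    """Answer a point query by intersecting the tuple-ID lists of the
    one-dimensional cells, then summing the matching measures.

    conditions -- dict mapping dimension index to required value
    """
    tid_sets = [shell.get(((d, v),), set()) for d, v in conditions.items()]
    tids = set.intersection(*tid_sets) if tid_sets else set(measure_index)
    return sum(measure_index[t] for t in tids)


if __name__ == "__main__":
    rows = [
        (1, ("a1", "b1", "c1"), 10.0),
        (2, ("a1", "b2", "c1"), 20.0),
        (3, ("a2", "b1", "c2"), 5.0),
    ]
    fragments = [(0, 1), (2,)]             # hypothetical fragment layout
    partitions = [rows[:2], rows[2:]]      # stand-in for distributed splits
    mapped = [map_partition(p, fragments) for p in partitions]
    shell, measure_index = reduce_all(mapped)
    print(query_sum(shell, measure_index, {0: "a1", 2: "c1"}))
```

In this sketch, cells that span two fragments (for example dimensions 0 and 2 together) are never stored; they are resolved at query time by intersecting tuple-ID sets, which is what keeps the materialized data from growing exponentially with the number of dimensions.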
Source
《计算机工程与应用》
CSCD
Peking University Core Journal
2015, Issue 22, pp. 124-129 (6 pages)
Computer Engineering and Applications
Funding
Special Fund for Public Welfare Industry Scientific Research of the Ministry of Water Resources (No. 201501022)