Abstract
The closed cube is an effective and important technique for compressing data cubes, yet research on parallel algorithms for computing it is still lacking. This paper proposes a new approach that parallelizes the C-Cubing method using the MapReduce programming model. In the Map phase, the representative tuple and the closed mask of each data cell are computed for every data block. In the Reduce phase, these partial results are aggregated to obtain the closed cells. Experimental results show that the proposed approach effectively increases the speed of computing closed cubes on large datasets.
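The Map/Reduce split described above can be illustrated with a minimal, single-process Python sketch. It is not the authors' implementation and makes several assumptions. The measure is COUNT. Each mapper naively enumerates all 2^d cells of every tuple in its block, rather than using C-Cubing's own aggregation-based traversal. The representative tuple is simply the first tuple seen. The shuffle is simulated with a dictionary, so no Hadoop is involved. The function names (`map_block`, `reduce_cell`, `closed_cube`) are hypothetical. The closed mask follows the C-Cubing notion: bit i is set when all tuples of a cell share one value on dimension i. A cell is closed when no `*` dimension has its bit set.

```python
from collections import defaultdict
from functools import reduce
from itertools import combinations

STAR = "*"  # stands for "all values" of a dimension in a cube cell


def cells_of(row):
    """Yield every cube cell (one per group-by) that the tuple `row` belongs to."""
    d = len(row)
    for k in range(d + 1):
        for dims in combinations(range(d), k):
            yield tuple(row[i] if i in dims else STAR for i in range(d))


def merge(a, b):
    """Aggregate two partial states (rep_tuple, closed_mask, count) of one cell.

    Bit i of closed_mask is 1 iff every tuple covered so far has the same value
    on dimension i. The merged group keeps bit i only if both groups have it and
    their representative tuples agree on dimension i.
    """
    rep_a, mask_a, cnt_a = a
    rep_b, mask_b, cnt_b = b
    agree = 0
    for i, (x, y) in enumerate(zip(rep_a, rep_b)):
        if x == y:
            agree |= 1 << i
    return rep_a, mask_a & mask_b & agree, cnt_a + cnt_b


def map_block(block):
    """Map phase: emit (cell, partial state) for every cell of one data block."""
    partial = {}
    for row in block:
        state = (row, (1 << len(row)) - 1, 1)  # one tuple is uniform on all dimensions
        for cell in cells_of(row):
            partial[cell] = merge(partial[cell], state) if cell in partial else state
    return partial.items()


def reduce_cell(cell, states):
    """Reduce phase: merge all partial states of one cell; return it only if closed."""
    _rep, mask, count = reduce(merge, states)
    all_mask = sum(1 << i for i, v in enumerate(cell) if v == STAR)
    # Closed iff no '*' dimension on which all of the cell's tuples share one value.
    if mask & all_mask == 0:
        return cell, count
    return None


def closed_cube(blocks):
    """Run map, shuffle and reduce in one process (a stand-in for a MapReduce job)."""
    shuffled = defaultdict(list)  # simulated shuffle: group partial states by cell
    for block in blocks:
        for cell, state in map_block(block):
            shuffled[cell].append(state)
    results = (reduce_cell(cell, states) for cell, states in shuffled.items())
    return dict(r for r in results if r is not None)
```

For example, take `blocks = [[("a1", "b1", "c1"), ("a1", "b2", "c1")], [("a2", "b1", "c1")]]`. Here `closed_cube(blocks)` keeps six closed cells. Among them are `("*", "*", "c1")` with count 3 and `("*", "b1", "c1")` with count 2, whose tuples come from different blocks and are therefore only resolved in the Reduce phase. The apex cell `("*", "*", "*")` is dropped because all three tuples share `c1`. Enumerating 2^d cells per tuple only scales to a few dimensions, so a real mapper would use C-Cubing's own cube traversal instead.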
Source
Journal of South China University of Technology (Natural Science Edition)
Indexed in: EI, CAS, CSCD, Peking University Core Journals
2009, No. 1, pp. 91-95, 112 (6 pages)
Funding
Guangdong Provincial Science and Technology Program (2004A10205003, 2006B11301001)
Guangzhou Municipal Science and Technology Program (2006Z3-D3081)