Abstract: On-Line Analytical Processing (OLAP) relies on pre-computed data cubes, which greatly reduce query response time and improve OLAP performance. The Frag-Shells algorithm is a common pre-computation method. However, it depends so heavily on data dispersion that it performs poorly when confronted with large amounts of highly dispersed data. As data volumes grow rapidly, the efficiency of data cube construction is becoming a significant bottleneck. Meanwhile, with the rise of cloud computing and big data, the MapReduce framework proposed by Google plays an increasingly prominent role in parallel processing, and it is natural to use MapReduce to improve the efficiency of parallel data cube construction. In this paper, we improve the Frag-Shells algorithm so that it no longer depends on data dispersion and exploit the high parallelism of the MapReduce framework, yielding an improved Frag-Shells algorithm based on MapReduce. Simulation results show that the proposed algorithm greatly improves the efficiency of cube construction.
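To make the idea concrete, here is a minimal single-process sketch of shell-fragment pre-computation written in a map/reduce style. It is an illustration under stated assumptions, not the algorithm proposed in the paper: the abstract gives no implementation details, so the function names (`map_phase`, `reduce_phase`, `build_shell_fragments`), the toy table, and the choice of fragments are all hypothetical. The sketch assumes the usual shell-fragment scheme, in which dimensions are split into disjoint fragments and each cell of a fragment's local cube stores the list of tuple IDs that match it.

```python
from collections import defaultdict
from itertools import combinations


def map_phase(tid, row, fragment):
    """Emit ((dims, values), tid) for every non-empty subset of the
    fragment's dimensions. Hypothetical mapper, for illustration only."""
    for size in range(1, len(fragment) + 1):
        for dims in combinations(fragment, size):
            values = tuple(row[d] for d in dims)
            yield (dims, values), tid


def reduce_phase(pairs):
    """Group tuple IDs by cell key, standing in for shuffle plus reduce."""
    cells = defaultdict(list)
    for key, tid in pairs:
        cells[key].append(tid)
    return {key: sorted(tids) for key, tids in cells.items()}


def build_shell_fragments(rows, fragments):
    """Run the mapper over every (fragment, row) pair, then reduce."""
    pairs = []
    for fragment in fragments:
        for tid, row in enumerate(rows):
            pairs.extend(map_phase(tid, row, fragment))
    return reduce_phase(pairs)


# Toy fact table with three dimensions (assumed data, not from the paper).
rows = [
    {"A": "a1", "B": "b1", "C": "c1"},
    {"A": "a1", "B": "b2", "C": "c1"},
    {"A": "a2", "B": "b1", "C": "c1"},
]
cube = build_shell_fragments(rows, fragments=[("A", "B"), ("C",)])
```

A query that spans fragments, such as A = a1 and C = c1, can then be answered by intersecting the tuple-ID lists stored in `cube[(("A",), ("a1",))]` and `cube[(("C",), ("c1",))]`. In a real MapReduce deployment the mapper calls would run in parallel across input splits and the grouping would be done by the framework's shuffle step.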