Abstract: Responding to complex analytical queries in a data warehouse (DW) is one of the most challenging tasks requiring prompt attention. The materialized view (MV) selection problem consists of selecting the optimal set of views that can answer as many queries as possible. This work introduces an approach that combines constraint handling with metaheuristics to select the optimal subset of views from a DW. The proposed work first refines candidate solutions toward feasible view selections using the ensemble constraint handling technique (ECHT). The constraint-handling techniques considered are a self-adaptive penalty, the epsilon (ε)-parameter method and stochastic ranking (SR). These techniques help the proposed model select the views that minimize the objective function. Further, a novel combination of the Ebola and Coot optimization algorithms, named hybrid Ebola with coot optimization (CHECO), is introduced to choose the optimal MVs. Ebola and Coot are recently introduced metaheuristics that search a given population for the globally optimal set of views. By combining these two algorithms, the proposed framework produces a highly optimized set of views with minimized costs. Several cost functions are defined so that the algorithm can choose the best solution in the problem space. Finally, extensive evaluations compare the proposed approach with existing algorithms. On the TPC-H benchmark dataset, the proposed framework achieved a view maintenance cost of 6,329,354,613,784, a query processing cost of 3,522,857,483,566 and an execution time of 226 s.
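The abstract names three constraint-handling rules (self-adaptive penalty, ε-parameter and stochastic ranking) without showing how they compare candidate view sets. The following is a minimal sketch under stated assumptions. It uses a hypothetical toy instance (`SIZES`, `MAINT`, `ANSWERS`, `BASE`, `BUDGET`). The objective is assumed to be query processing cost plus view maintenance cost, and a storage budget is the only constraint. A static penalty stands in for the self-adaptive one. The ECHT ensemble machinery and the CHECO search itself are not shown.

```python
import random
from typing import List, Optional, Sequence, Tuple

# Hypothetical toy instance (not taken from the paper): 3 candidate views, 3 queries.
SIZES = [50, 30, 20]             # storage cost of each view if materialized
MAINT = [10, 6, 4]               # maintenance cost of each view if materialized
ANSWERS = [[0, 1], [0, 2], [0]]  # indices of the views able to answer each query
BASE = 100                       # cost of answering a query from the base tables
BUDGET = 60                      # storage constraint on the selected views


def evaluate(bits: Sequence[int]) -> Tuple[int, int]:
    """Return (objective, violation) for a 0/1 view-selection vector.

    objective = query processing cost + view maintenance cost
    violation = storage used beyond BUDGET (0 means feasible)
    """
    query_cost = sum(
        min([SIZES[v] for v in views if bits[v]] + [BASE]) for views in ANSWERS
    )
    maint_cost = sum(m for m, b in zip(MAINT, bits) if b)
    used = sum(s for s, b in zip(SIZES, bits) if b)
    return query_cost + maint_cost, max(0, used - BUDGET)


def penalty_fitness(bits: Sequence[int], r: float = 5.0) -> float:
    """Static penalty (a simplification of a self-adaptive penalty); lower is better."""
    f, phi = evaluate(bits)
    return f + r * phi


def eps_better(a: Sequence[int], b: Sequence[int], eps: float = 0.0) -> bool:
    """Epsilon-constraint comparison: is a preferred over b?"""
    fa, pa = evaluate(a)
    fb, pb = evaluate(b)
    if (pa <= eps and pb <= eps) or pa == pb:
        return fa < fb  # both within tolerance: compare objectives
    return pa < pb      # otherwise the smaller violation wins


def stochastic_rank(pop: List[Tuple[int, ...]], pf: float = 0.45,
                    rng: Optional[random.Random] = None) -> List[Tuple[int, ...]]:
    """Stochastic ranking: a bubble sort that compares objectives when both individuals
    are feasible (or with probability pf) and constraint violations otherwise."""
    rng = rng or random.Random(0)
    ranked = [(bits, *evaluate(bits)) for bits in pop]
    for _ in range(len(ranked)):  # at most N sweeps
        swapped = False
        for j in range(len(ranked) - 1):
            (_, fa, pa), (_, fb, pb) = ranked[j], ranked[j + 1]
            if (pa == 0 and pb == 0) or rng.random() < pf:
                swap = fa > fb
            else:
                swap = pa > pb
            if swap:
                ranked[j], ranked[j + 1] = ranked[j + 1], ranked[j]
                swapped = True
        if not swapped:
            break
    return [bits for bits, _, _ in ranked]


pop = [(1, 1, 1), (0, 0, 0), (1, 0, 0), (1, 1, 0), (0, 0, 1)]
print(stochastic_rank(pop, pf=0.0))
# [(1, 0, 0), (0, 0, 1), (0, 0, 0), (1, 1, 0), (1, 1, 1)]
```

With `pf=0.0` the ranking is deterministic. Feasible selections come first, ordered by cost, followed by infeasible ones ordered by violation. A positive `pf` occasionally lets a cheaper infeasible selection rank higher, which keeps the search from discarding the region near the storage boundary.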
Abstract: The data warehouse is the most widely used database structure in decision support systems around the world, which is why a large body of research over the last two decades has addressed its design, refreshment and optimization. Manipulating hypercubes (cubes) of data is a frequent operation in the design of multidimensional data warehouses, because cubes are well suited to OLAP (On-Line Analytical Processing). However, updating these hypercubes is a complicated process, mainly because of the volume and complexity of the data involved. This paper presents the state of the art of work based on multidimensional modeling that uses the hypercube as the unit of data representation in data warehouses. It starts from the foundation of this process: the choice of the views (cubes) that make up the data warehouse. Specifically, the objective is to describe the state of the art of research on the selection of materialized views in decision support systems.
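The choice of views can be made concrete as follows. Ignoring dimension hierarchies, a cube over d dimensions has 2^d group-by views (cuboids). A query can be answered from any view that keeps all of its grouping attributes, provided the measure can be re-aggregated (for example SUM or COUNT). The minimal sketch below enumerates this lattice. The dimension names are illustrative only, borrowed from the TPC-H schema, and `cuboid_lattice`, `can_answer` and `answering_views` are hypothetical helpers, not taken from the surveyed works.

```python
from itertools import combinations
from typing import FrozenSet, Iterable, List

View = FrozenSet[str]


def cuboid_lattice(dims: Iterable[str]) -> List[View]:
    """Enumerate all 2^d group-by views, from the base cuboid down to the apex ()."""
    dims = list(dims)
    return [frozenset(c) for k in range(len(dims), -1, -1) for c in combinations(dims, k)]


def can_answer(view: View, query: View) -> bool:
    """A view can answer a group-by query if it retains every grouping attribute of the
    query (assuming a re-aggregatable measure such as SUM or COUNT)."""
    return query <= view


def answering_views(lattice: List[View], query: View) -> List[View]:
    """All views in the lattice from which the query can be computed."""
    return [v for v in lattice if can_answer(v, query)]


lattice = cuboid_lattice(["part", "supplier", "customer"])
print(len(lattice))                                           # 8
print(len(answering_views(lattice, frozenset({"part"}))))     # 4
```

The exponential size of the lattice is why only a subset of views can be materialized, and why selecting that subset is the central problem in the works surveyed.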
Funding: Supported by the National Natural Science Foundation of China under Grant No. 60473051 and by the China HP Co. and Peking University Joint Project.
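Abstract: Data mining on the logs of an active decision engine is used to discover the CUBE usage patterns of analysis rules, which provides an important basis for materialized view selection algorithms over multidimensional data. On this basis, a 3A probability model is designed. The paper then presents PGreedy (probability greedy), a greedy view selection algorithm that takes the access probability distribution of CUBEs into account, together with a dynamic view adjustment algorithm that incorporates a view retention principle. Experimental results show that, in a real-time active data warehouse environment, PGreedy outperforms the BPUS (benefit per unit space) algorithm.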
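The abstract compares PGreedy with BPUS but gives neither algorithm's formulas. The sketch below implements the classic benefit-per-unit-space greedy heuristic over a cube lattice. It uses a linear cost model: a query costs the row count of the smallest selected view that contains its attributes. It also adds an optional per-query probability weight as one plausible way of accounting for CUBE access probabilities. That weighting, the function names and the view sizes are assumptions. This is not the paper's PGreedy, and the 3A model and the view-retention adjustment are not shown.

```python
from typing import Dict, FrozenSet, Optional, Set

View = FrozenSet[str]


def query_cost(query: View, selected: Set[View], sizes: Dict[View, int]) -> int:
    """Linear cost model: answer a query from the smallest selected view containing it."""
    return min(sizes[v] for v in selected if query <= v)


def bpus_greedy(sizes: Dict[View, int], space: int,
                probs: Optional[Dict[View, float]] = None) -> Set[View]:
    """Greedy view selection by benefit per unit space.

    sizes: estimated row count of every view in the lattice (including the base cuboid).
    space: storage budget for the additional views (the base cuboid is not counted).
    probs: hypothetical per-query access probabilities; None weights every view equally.
    """
    top = max(sizes, key=len)          # base cuboid, always available
    selected = {top}
    if probs is None:
        probs = {v: 1.0 for v in sizes}
    used = 0
    while True:
        best, best_ratio = None, 0.0
        for v in sizes:
            if v in selected or used + sizes[v] > space:
                continue
            # Weighted cost reduction for every query that v could answer.
            benefit = sum(
                probs.get(w, 0.0) * max(0, query_cost(w, selected, sizes) - sizes[v])
                for w in sizes if w <= v
            )
            ratio = benefit / sizes[v]
            if ratio > best_ratio:
                best, best_ratio = v, ratio
        if best is None:               # nothing fits or nothing helps
            return selected
        selected.add(best)
        used += sizes[best]


# Hypothetical sizes; p, s, c abbreviate part, supplier, customer.
sizes = {frozenset(k): n for k, n in [
    ("psc", 1000), ("ps", 500), ("pc", 900), ("sc", 800),
    ("p", 100), ("s", 50), ("c", 80), ("", 1),
]}
print(sorted("".join(sorted(v)) for v in bpus_greedy(sizes, space=600)))
# ['', 'c', 'cps', 'p', 's']
```

With uniform weights the heuristic favors small views with high benefit per row. If `probs={frozenset("ps"): 1.0}`, only the view serving that query is chosen, which illustrates how access statistics can redirect the same greedy procedure.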
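Abstract: Traditional materialized view selection (MVS) algorithms rely on a single evaluation metric: they evaluate only materialization time and overemphasize the query hit rate of materialized views. This leads to the curse of dimensionality at very high dimensionality and to frequent thrashing of the materialized view set. To address these problems, this paper proposes a weighted-graph-based multi-dimensional big data model optimization algorithm (MMO), which introduces average query latency and expansion rate as evaluation metrics and finds the optimal materialized view set on a weighted graph model. Experimental results show that the proposed algorithm outperforms particle swarm optimization (PSO) in overall score, average query latency and expansion rate, overcomes the curse of dimensionality for ultra-high-dimensional data, and converges quickly.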
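The abstract introduces average query latency and expansion rate as evaluation metrics but does not define them. The following is a minimal sketch under assumed definitions:

- Average latency is the arithmetic mean over a query workload.
- Expansion rate is the total storage of the materialized views divided by the size of the source data.
- A hypothetical weighted composite combines the two metrics, each normalized by a reference result.

The weights, normalization and function names are assumptions, and the weighted-graph search of MMO itself is not shown.

```python
from typing import Sequence


def average_query_latency(latencies_ms: Sequence[float]) -> float:
    """Mean latency over a query workload."""
    return sum(latencies_ms) / len(latencies_ms)


def expansion_rate(view_sizes: Sequence[float], source_size: float) -> float:
    """Assumed definition: materialized-view storage relative to the source data size."""
    return sum(view_sizes) / source_size


def composite_cost(avg_latency: float, expansion: float,
                   latency_ref: float, expansion_ref: float, w: float = 0.5) -> float:
    """Hypothetical composite (lower is better): weighted sum of both metrics,
    each normalized by a reference value such as a baseline's result."""
    return w * avg_latency / latency_ref + (1 - w) * expansion / expansion_ref


# Two hypothetical view sets measured on the same workload and 1000 GB of source data.
a = (average_query_latency([120, 80, 100]), expansion_rate([300, 200], 1000))
b = (average_query_latency([60, 40, 50]), expansion_rate([900, 600, 500], 1000))
print(a, b)                                                       # (100.0, 0.5) (50.0, 2.0)
print(composite_cost(*b, latency_ref=a[0], expansion_ref=a[1]))  # 2.25
```

Under these assumptions, the second view set halves latency but quadruples storage. Its composite score is therefore 2.25, against 1.0 for the first set as its own reference. A metric based only on query time or hit rate would not reveal this trade-off.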