Responding to complex analytical queries in a data warehouse (DW) is one of the most challenging tasks and requires prompt attention. The materialized view (MV) selection problem consists of selecting the views that can answer the most queries simultaneously. This work introduces a combined approach in which constraint handling is combined with metaheuristics to select the best subset of DW views. The proposed work first refines the solution to enable a feasible selection of views using the ensemble constraint handling technique (ECHT). Self-adaptive penalty, epsilon (ε)-parameter and stochastic ranking (SR) are considered for constraint handling. These constraint-handling techniques helped the proposed model select the views that minimize the objective function. Further, a novel combination of the Ebola and coot optimization algorithms, named hybrid Ebola with coot optimization (CHECO), is introduced to choose the optimal MVs. Ebola and coot are recently introduced metaheuristics that search for the globally optimal set of views from the given population. By combining these two algorithms, the proposed framework produced a highly optimized set of views with minimized costs. Several cost functions are described to enable the algorithm to choose the best solution from the problem space. Finally, extensive evaluations are conducted to compare the performance of the proposed approach with existing algorithms. The proposed framework resulted in a view maintenance cost of 6,329,354,613,784, a query processing cost of 3,522,857,483,566 and an execution time of 226 s when analyzed using the TPC-H benchmark dataset.
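Of the constraint-handling techniques named in this abstract, stochastic ranking is the easiest to illustrate in a few lines. The following is a minimal sketch of the generic technique, not the paper's implementation: the function name `stochastic_rank`, the parameter `pf` (probability of comparing an infeasible pair by objective value alone) and the input lists are illustrative assumptions. Lower objective and lower violation are treated as better.

```python
import random
from typing import List, Optional


def stochastic_rank(objective: List[float],
                    violation: List[float],
                    pf: float = 0.45,
                    rng: Optional[random.Random] = None) -> List[int]:
    """Return candidate indices ordered best-first by stochastic ranking.

    objective[i] is the cost of candidate i (assumed: lower is better).
    violation[i] is its total constraint violation (0 means feasible).
    pf is an assumed tuning parameter, not a value taken from the abstract.
    """
    rng = rng or random.Random(0)
    n = len(objective)
    order = list(range(n))
    # Bubble-sort-like sweeps over adjacent pairs.
    for _ in range(n):
        swapped = False
        for j in range(n - 1):
            a, b = order[j], order[j + 1]
            both_feasible = violation[a] == 0 and violation[b] == 0
            if both_feasible or rng.random() < pf:
                # Compare by objective value only.
                out_of_order = objective[a] > objective[b]
            else:
                # Compare by constraint violation only.
                out_of_order = violation[a] > violation[b]
            if out_of_order:
                order[j], order[j + 1] = order[j + 1], order[j]
                swapped = True
        if not swapped:
            break
    return order
```

With `pf` close to 0 the ranking is driven almost entirely by feasibility, while values close to 1 let low-cost but infeasible view subsets rise in the ranking; in an ensemble such as the one described, that ordering would be only one of several used to pick survivors.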
To efficiently solve the materialized view selection problem, an optimal genetic algorithm for selecting the set of views to materialize is proposed, aiming at both good query performance and low view maintenance cost under a storage space constraint. First, a pre-processing algorithm based on the maximum benefit per unit space is used to generate initial solutions. Then, the initial solutions are improved by a genetic algorithm that combines several optimal strategies. Furthermore, infeasible solutions generated during the evolution process are repaired by a loss function. The experimental results show that the proposed algorithm outperforms the heuristic algorithm and the canonical genetic algorithm in finding optimal solutions.
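The pre-processing and repair steps described here can be sketched as follows. This is a simplified illustration under stated assumptions, not the authors' algorithm: each view carries a fixed `(benefit, size)` pair, whereas a full benefit-per-unit-space heuristic would recompute benefits relative to the views already chosen. The abstract does not define its loss function, so the repair below simply drops the view with the lowest benefit per unit space until the selection fits. The names `greedy_seed` and `repair` are hypothetical.

```python
from typing import Dict, List, Tuple

# Assumed representation: view name -> (benefit, size).
Views = Dict[str, Tuple[float, float]]


def ratio(views: Views, name: str) -> float:
    """Benefit per unit of storage space for one view."""
    benefit, size = views[name]
    return benefit / size


def greedy_seed(views: Views, capacity: float) -> List[str]:
    """Build an initial solution by descending benefit per unit space."""
    chosen: List[str] = []
    used = 0.0
    for name in sorted(views, key=lambda v: ratio(views, v), reverse=True):
        size = views[name][1]
        if used + size <= capacity:
            chosen.append(name)
            used += size
    return chosen


def repair(selection: List[str], views: Views, capacity: float) -> List[str]:
    """Make an over-capacity selection feasible (stand-in for the loss function)."""
    kept = sorted(selection, key=lambda v: ratio(views, v), reverse=True)
    while kept and sum(views[v][1] for v in kept) > capacity:
        kept.pop()  # discard the view with the lowest benefit per unit space
    return kept
```

In a genetic algorithm, `greedy_seed` would supply part of the initial population and `repair` would be applied to offspring that exceed the storage budget after crossover or mutation.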
The data warehouse is the most widely used database structure in many decision support systems around the world. For this reason, a great deal of research has been conducted over the last two decades on data warehouse design, refreshment and optimization. The manipulation of hypercubes (cubes) of data is a frequently used operation in the design of multidimensional data warehouses, because cubes are well adapted to OLAP (On-Line Analytical Processing). However, updating these hypercubes is a very complicated process, mainly because of the volume and complexity of the data. The purpose of this paper is to present the state of the art of works based on multidimensional modeling that use the hypercube as the unit of presentation of data stores. It starts with the basis of this process, which is the choice of the views (cubes) that form the data warehouse. The objective of this work is to describe the state of the art of research works dealing with the selection of materialized views in decision support systems.
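By mining the logs of the active decision engine, the CUBE usage patterns of analysis rules are discovered, which provides an important basis for materialized view selection algorithms over multidimensional data. On this basis, a 3A probability model is designed, and a greedy view selection algorithm, PGreedy (probability greedy), which takes the probability distribution of CUBE accesses into account, is presented together with a dynamic view adjustment algorithm that incorporates a view retention principle. Experimental results show that, in a real-time active data warehouse environment, the PGreedy algorithm performs better than the BPUS (benefit per unit space) algorithm.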
Funding: Supported by the National Natural Science Foundation of China under Grant No.60473051 and the China HP Co. and Peking University Joint Project.
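The idea behind PGreedy, weighting each cube's benefit by how likely it is to be accessed, can be sketched as follows. This is an illustration only: the 3A probability model is not specified in the abstract, so access probabilities are estimated here as plain relative frequencies from a log, and the names `access_probabilities`, `pgreedy_sketch`, `saving` and `size` are hypothetical.

```python
from collections import Counter
from typing import Dict, List


def access_probabilities(log: List[str]) -> Dict[str, float]:
    """Estimate each cube's access probability as its share of log entries."""
    counts = Counter(log)
    total = sum(counts.values())
    return {cube: n / total for cube, n in counts.items()}


def pgreedy_sketch(log: List[str],
                   saving: Dict[str, float],
                   size: Dict[str, float],
                   capacity: float) -> List[str]:
    """Greedy selection by probability-weighted benefit per unit space.

    saving[c] is an assumed query-cost saving if cube c is materialized;
    size[c] is its assumed storage footprint.
    """
    prob = access_probabilities(log)

    def score(cube: str) -> float:
        return prob.get(cube, 0.0) * saving[cube] / size[cube]

    chosen: List[str] = []
    used = 0.0
    for cube in sorted(size, key=score, reverse=True):
        # Cubes never seen in the log have score 0 and are skipped.
        if score(cube) > 0 and used + size[cube] <= capacity:
            chosen.append(cube)
            used += size[cube]
    return chosen
```

Setting every probability to the same value reduces the score to plain benefit per unit space, which is the BPUS baseline the abstract compares against.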