Abstract
Data in a data warehouse is physically organized in one of two main ways: as relational tables or as multi-dimensional arrays. Each approach has its own advantages and disadvantages, but multi-dimensional arrays are better suited to improving the performance of online analytical processing (OLAP) queries. To address problems in current multi-dimensional array storage, this paper proposes an improved multi-dimensional storage structure for data warehouses and introduces the concepts of logical storage and physical storage for a data cube. The original multi-dimensional data space is first divided into logical blocks, and each logical block is then divided into multiple physical blocks. The physical storage design takes full account of the large volume and high sparsity of multi-dimensional arrays, and adopts a new distribution and compression method for them. These concepts and methods effectively address the efficiency of aggregation along the hierarchy levels within a dimension and of cube operations, markedly shorten the response time of aggregation queries that involve intra-dimension hierarchies, and also improve the efficiency of incremental maintenance of the multi-dimensional array.
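The abstract does not spell out the block layout or the compression scheme, so the following is only a minimal sketch of the general idea of two-level blocking over a sparse array. It assumes fixed block shapes (`LOGICAL_SHAPE`, `PHYSICAL_SHAPE`) and stores only non-empty cells keyed by their offset inside a physical block, which is a generic offset-based compression and not necessarily the paper's method. The names `locate`, `build_store`, and `sum_logical_block` are hypothetical.

```python
from collections import defaultdict

# Assumed block sizes (cells per dimension); not taken from the paper.
LOGICAL_SHAPE = (4, 4)
PHYSICAL_SHAPE = (2, 2)


def locate(coords):
    """Map cell coordinates to (logical block, physical block, offset)."""
    logical = tuple(c // s for c, s in zip(coords, LOGICAL_SHAPE))
    within = tuple(c % s for c, s in zip(coords, LOGICAL_SHAPE))
    physical = tuple(w // p for w, p in zip(within, PHYSICAL_SHAPE))
    local = tuple(w % p for w, p in zip(within, PHYSICAL_SHAPE))
    # Row-major linearization of the position inside the physical block.
    offset = 0
    for l, p in zip(local, PHYSICAL_SHAPE):
        offset = offset * p + l
    return logical, physical, offset


def build_store(cells):
    """Store only non-empty cells, grouped by logical then physical block."""
    store = defaultdict(lambda: defaultdict(dict))
    for coords, value in cells.items():
        logical, physical, offset = locate(coords)
        store[logical][physical][offset] = value
    return store


def sum_logical_block(store, logical):
    """Aggregate one logical block without touching any other block."""
    blocks = store.get(logical, {})
    return sum(v for phys in blocks.values() for v in phys.values())


# A sparse 2-D cube with three non-empty cells.
cells = {(0, 0): 5, (1, 3): 7, (5, 2): 11}
store = build_store(cells)
```

If logical block boundaries are chosen to line up with members of a higher hierarchy level (for example, one logical block per quarter along a time dimension), an aggregation at that level only needs to read the blocks it covers, as `sum_logical_block(store, (0, 0))` does here.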
Source
《软件学报》
EI
CSCD
PKU Core Journals (北大核心)
2002, No. 8, pp. 1423-1429 (7 pages)
Journal of Software
Funding
Supported by the National Key Basic Research Development Program of China (973 Program) under Grant No. G1998030414