Abstract
Online analytical processing (OLAP) systems often run complex queries over very large databases. To shorten query response time, some views are usually materialized (pre-aggregated) in advance. Before choosing which views to materialize, the size of each view must be estimated. Existing approaches to view size estimation fall into two categories: one relies on probabilistic counting and mathematical approximation; the other assumes that the data follow a particular distribution model, estimates the model parameters from a sample, and extrapolates them to the whole dataset. This paper presents a new view size estimation method, FSC (frequent sets counting), which borrows ideas from frequent itemset mining and obtains size estimates for all views in a cube after two scans of the database. Experiments show that FSC is considerably more accurate than comparable algorithms, especially on highly skewed datasets.
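To make the estimated quantity concrete, the following minimal sketch computes exact view sizes (the number of distinct group-by combinations) for every view of a small cube, next to the textbook estimate under a uniform-distribution assumption. It is not the FSC algorithm, whose details the abstract does not give; the function names `exact_view_sizes` and `uniform_estimate` and the sample rows are hypothetical and chosen only for illustration.

```python
from itertools import combinations
from math import prod


def exact_view_sizes(rows, n_dims):
    """Return {dims: size} for every group-by view over n_dims dimensions.

    The size of a view is the number of distinct value combinations that
    occur in `rows` for the chosen dimensions (hypothetical helper).
    """
    sizes = {}
    for k in range(n_dims + 1):
        for dims in combinations(range(n_dims), k):
            sizes[dims] = len({tuple(row[d] for d in dims) for row in rows})
    return sizes


def uniform_estimate(n_rows, cardinalities):
    """Expected number of non-empty cells if rows fall uniformly at random
    into prod(cardinalities) possible cells (uniform-distribution baseline,
    an assumption that skewed data violates)."""
    cells = prod(cardinalities)
    return cells * (1.0 - (1.0 - 1.0 / cells) ** n_rows)


# Illustrative fact table with two dimensions.
rows = [("a", "x"), ("a", "y"), ("b", "x"), ("a", "x")]
sizes = exact_view_sizes(rows, 2)
estimate = uniform_estimate(len(rows), [2, 2])
```

On skewed data the uniform estimate tends to overstate view sizes, because many rows collapse into the same few cells; this is the regime in which the abstract reports the largest accuracy gain for FSC.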
Source
Journal of Computer Research and Development (《计算机研究与发展》)
EI
CSCD
Peking University Core Journals (北大核心)
2004, No. 10, pp. 1670-1676 (7 pages)
Funding
Key Program of the National Natural Science Foundation of China (69933010, 60303008)
National High Technology Research and Development Program of China (863 Program) (2002AA4Z3430, 2002AA231041)
Keywords
view size estimation, frequent set, uniform distribution, data skewness