摘要
在属性分布估计中,给定关系属性的类型为N个,使用B个(BN)数值近似其频度分布.基于直方图和小波的概要数据结构得到深入的研究,然而Chebyshev多项式也适合于近似算法.首先构造基于Chebyshev多项式的概要,再在其上估计原始属性分布,和以前的方法相比,算法的优势在于:L1、L2、L∞等误差度量下更高的精度;构造概要的时间复杂度只有O(NB);更易于动态维护.Chebyshev概要的有效性在模拟数据序列和实际数据序列上得到验证.
Suppose we are given N values of an attribute in a relation. B (B〈〈N) numerical values are used to approximate frequency distributions of attribute. To date, two major high-accuracy synopses construction techniques have been proposed in the database literature, namely histograms and Haar wavelets. Still, these approximation techniques have not been previously compared to Chebyshev polynomials. In this paper, we conduct such a comparison the accuracy achieved with synopses based on Chebyshev polynomials to that of other techniques, when estimating frequency distributions. We demonstrate that Chebyshev synopses have three main strong points in comparision to other methods : they aehieve higher accuracy as measured by the L1,L2 and L∞ error metrics; they can be built in only O(NB) time complexity; and Chebyshev synopses can be dynamically maintained. Experimental results on synthetic and real-life data clearly demonstrate the effectiveness of our method.
出处
《小型微型计算机系统》
CSCD
北大核心
2009年第1期31-36,共6页
Journal of Chinese Computer Systems