Abstract
Maintaining data stream synopses is essential in a DSMS: stream data is real-time, continuous, and ordered (that is, it ages), so the query engine must adaptively adjust its execution plan according to an up-to-date synopsis in order to keep processing efficient. In this paper, we propose a new synopsis structure called ETHs, which partitions the time dimension of a data stream into exponential intervals using the EH partitioning technique. Within each interval, a tiny histogram with low space and time complexity records the summary information. As a result, ETHs can both reflect the staleness (decay) of data elements and share computation under the n-of-N model. With a guaranteed error bound of εN, it continuously maintains summary information for the most recent N elements of the stream at low time and space cost. Experiments show that ETHs is a well-suited synopsis structure for data streams.
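The abstract does not spell out the ETHs algorithm. As a rough illustration of the general idea it describes (exponentially growing time intervals, each summarized by a small histogram), the following Python sketch uses a standard exponential-histogram style merge rule with fixed equi-width bins. The class name `ExpBucketSynopsis`, the `max_per_size` parameter, the known value domain `[lo, hi)`, and the proportional scaling in `query` are all assumptions made for illustration, not details taken from the paper.

```python
class ExpBucketSynopsis:
    """Hypothetical sketch: exponentially sized time buckets,
    each holding a tiny equi-width histogram (not the paper's exact method)."""

    def __init__(self, window, lo, hi, bins=4, max_per_size=2):
        self.window = window              # N: most recent elements to cover
        self.lo, self.hi = lo, hi         # assumed known value domain [lo, hi)
        self.bins = bins                  # bins per tiny histogram (assumption)
        self.max_per_size = max_per_size  # buckets allowed per size (assumption)
        self.t = 0                        # arrival counter used as timestamp
        self.buckets = []                 # newest first: [newest_ts, size, hist]

    def _bin(self, x):
        i = int((x - self.lo) * self.bins / (self.hi - self.lo))
        return min(max(i, 0), self.bins - 1)

    def insert(self, x):
        self.t += 1
        hist = [0] * self.bins
        hist[self._bin(x)] = 1
        self.buckets.insert(0, [self.t, 1, hist])
        # Drop buckets whose newest element has left the window.
        while self.buckets and self.buckets[-1][0] <= self.t - self.window:
            self.buckets.pop()
        self._merge()

    def _merge(self):
        size = 1
        while True:
            idx = [i for i, b in enumerate(self.buckets) if b[1] == size]
            if len(idx) <= self.max_per_size:
                break
            # Merge the two oldest buckets of this size into one of double size.
            newer, older = idx[-2], idx[-1]
            n, o = self.buckets[newer], self.buckets[older]
            merged_hist = [a + b for a, b in zip(n[2], o[2])]
            self.buckets[newer] = [n[0], size * 2, merged_hist]
            del self.buckets[older]
            size *= 2

    def query(self, n):
        """Approximate histogram of the most recent n elements (n <= window)."""
        n = min(n, self.window, self.t)
        cutoff = self.t - n               # keep elements with timestamp > cutoff
        total = [0.0] * self.bins
        for newest_ts, size, hist in self.buckets:
            if newest_ts <= cutoff:
                break
            # A bucket straddling the cutoff is scaled proportionally.
            frac = min(1.0, (newest_ts - cutoff) / size)
            for i, h in enumerate(hist):
                total[i] += h * frac
        return total


if __name__ == "__main__":
    s = ExpBucketSynopsis(window=16, lo=0, hi=16, bins=4)
    for x in range(16):
        s.insert(x)
    print([b[1] for b in s.buckets])  # bucket sizes, newest first
    print(s.query(4))                 # summary of the 4 most recent elements
```

In this sketch a single set of buckets answers queries for any n up to N, which is one way to read the shared computation under the n-of-N model; older data sits in larger, coarser buckets, so its summary loses resolution as it ages.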
Source
《计算机科学》
CSCD
PKU Core (Peking University Core Journals)
2005, No. 11, pp. 81-84 (4 pages)
Computer Science