期刊文献+

基于近似等深柱状图的数据流并行聚集算法 被引量:1

Parallel aggregation algorithm over data streams based on approximate equal depth histogram
下载PDF
导出
摘要 针对数据流并行聚集问题,提出了一种不同于关系数据和时间序列数据处理的并行聚集方法。为解决已经划分出的数据流元组无法再现的特点,提出能够感知数据流变化的采样算法对数据流采样。利用近似等深柱状图技术描述采样数据的分布特征,平均分配数据流量。使用时间聚集森林结构计算时间窗聚集。通过验证采样个数对并行聚集的影响,数据分布对近似划分向量算法性能的影响,测试数据流量与并行聚集加速比的关系,证明本算法能够高效地计算数据流聚集查询。 In light of parallel aggregating algorithm for data stream, a new parallel aggregating strategy with relational data and temporal sequence data was put forward.To tackle the problem that partitioned tuples were unable to recur over data streams,a change-aware sampling algorithm was used to sample data streams first,and an approximate equal depth histogram algorithm was used to describe data distribution in query window to partition data streams averagely,and then a novel m-order Temporal Aggregate forest data structure was applied to computing aggregate in time window.It verified the effect of numbers of sampling on parallel aggregation,the effect of data distribution on the performance of approximate partition vector algorithm,and the relation between quantity of data streams and parallel aggregation speed ratio.Experiments prove that the proposed algorithm can provide accurate answer to parallel aggregation over data streams efficiently.
作者 侯燕 王永利
出处 《解放军理工大学学报(自然科学版)》 EI 2008年第1期29-33,共5页 Journal of PLA University of Science and Technology(Natural Science Edition)
关键词 数据流 并行处理 近似技术 柱状图 聚集 data streams parallel processing approximate technique histogram aggregation
  • 相关文献

参考文献9

  • 1SHAH M,HELLERSTEIN J,cHANDRAsEKARAN S,et al.Flux:an adaptive partitioning operator for continuous query system[R].Berkely:Report No.UCB/CSD-2-1205,University of California Berkeley,2002.
  • 2MOON B,FERNANDO I,LOPEZ V.Efficient algorithms for large-scale temporal aggregation[J].IEEE Transactions on Knowledge & Data Engineering,2003,15(3):44-759.
  • 3ARASU A,MANKU G.Approximate counts and quantiles over sliding windows[C].Proe of the 23rd ACM SIGACT-SIGMOD-SIGART Symp on Principles of Database Systems,Paris:ACM Press,2004.
  • 4DATAR M,GIONIS A,INDYK P.et al.Maintaining stream statistics over sliding windows[C].Proc of the 13th Annual ACM-SIAM Syrup.on Discrete Algorithms,San Francisco:ACM Press,2002.
  • 5VITTER J S.Random sampling with a reservoir[J].ACM Transactions on Mathematical Software,1985,11(1):37-57.
  • 6WANG Yongli,XU Hongbin,DONG Yisheng,et al.Data partitioning over data streams based on changeaware sampling[C].Beijing:IEEE Press,2005.
  • 7GRAY J,ADAM B,ANDREW L,et al.Data cube:a relational aggregation operator generalizing group-by,cross-tab,and sub-total[C].New Orleans;IEEE Computer Society,1996.
  • 8KNUTH D E.The Art of computer programming[M].Mass:Addison-Wesley,1973.
  • 9SRIVASTAVA J,LUM V Y.A tree based access method(TBSAM)for fast processing of aggregate queries[C].Los Angeles:IEEE Computer Society,1998.

同被引文献11

引证文献1

二级引证文献5

相关作者

内容加载中请稍等...

相关机构

内容加载中请稍等...

相关主题

内容加载中请稍等...

浏览历史

内容加载中请稍等...
;
使用帮助 返回顶部