Abstract
A novel data stream partitioning method is proposed to support range-aggregation continuous queries over parallel data streams in the power industry. The method works in two steps. First, the streams are sampled in parallel with an extended reservoir-sampling algorithm adapted to stream processing. A skip factor derived from the change ratio of the stream values reflects how the load data vary, so the sampling adapts to the distribution of the data. Second, two approximate equal-depth histogram generation algorithms are proposed so that the stream volume can be divided evenly among partitions. They suit different situations: a heuristic method with incremental maintenance and a fast method with periodic updates. Either one produces an approximate partition vector from the samples. Experiments on a real-world data set show that the method is efficient and practical, and that it is suitable for processing high-speed, time-varying data streams.
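The two steps above can be illustrated with a minimal Python sketch. It is not the authors' implementation. The abstract does not give the skip-factor formula, so the linear rule in `_skip_factor` (with the hypothetical parameters `max_skip` and `ref_ratio`) is an assumption, as are all class and function names. Each stream gets its own sampler, which could run on a separate processor. The pooled reservoirs then yield an equal-depth partition vector of `parts - 1` boundaries, and `route` sends each value to the partition whose range contains it. Only the periodic-update style is shown, in which the vector is simply recomputed from the current samples. The incremental heuristic is not sketched.

```python
import random
from bisect import bisect_right
from typing import List, Optional


class AdaptiveReservoirSampler:
    """Reservoir sampling (Algorithm R) plus a HYPOTHETICAL skip factor.

    Assumed rule (not from the paper): the relative change between
    consecutive values decides how many following items are skipped.
    Flat data -> up to `max_skip` items skipped; a change ratio of
    `ref_ratio` or more -> no skipping.
    """

    def __init__(self, capacity: int, max_skip: int = 4,
                 ref_ratio: float = 0.05, seed: Optional[int] = None):
        self.capacity = capacity
        self.max_skip = max_skip      # hypothetical tuning parameter
        self.ref_ratio = ref_ratio    # hypothetical tuning parameter
        self.rng = random.Random(seed)
        self.reservoir: List[float] = []
        self.considered = 0           # items that passed the skip filter
        self.prev: Optional[float] = None
        self.to_skip = 0              # items still to be skipped

    def _skip_factor(self, value: float) -> int:
        if self.prev is None:
            return 0
        ratio = abs(value - self.prev) / max(abs(self.prev), 1e-9)
        # Linear decay: ratio 0 -> max_skip, ratio >= ref_ratio -> 0.
        return int(self.max_skip * max(0.0, 1.0 - ratio / self.ref_ratio))

    def offer(self, value: float) -> None:
        factor = self._skip_factor(value)
        self.prev = value
        if factor == 0:
            self.to_skip = 0          # sharp change: stop skipping at once
        if self.to_skip > 0:
            self.to_skip -= 1
            return
        self.to_skip = factor
        self.considered += 1
        if len(self.reservoir) < self.capacity:
            self.reservoir.append(value)
        else:
            # Classic Algorithm R replacement over the considered items.
            j = self.rng.randrange(self.considered)
            if j < self.capacity:
                self.reservoir[j] = value


def partition_vector(samples: List[float], parts: int) -> List[float]:
    """Equal-depth histogram boundaries: `parts - 1` cut points."""
    data = sorted(samples)
    n = len(data)
    if parts < 2 or n == 0:
        return []
    return [data[(i * n) // parts] for i in range(1, parts)]


def route(value: float, vector: List[float]) -> int:
    """Index of the partition whose range contains `value`.

    Each boundary is the inclusive lower bound of the next partition.
    """
    return bisect_right(vector, value)


if __name__ == "__main__":
    gen = random.Random(42)
    # Three hypothetical load streams: random walks starting at 100.
    streams = []
    for _ in range(3):
        level, values = 100.0, []
        for _ in range(5000):
            level += gen.gauss(0.0, 0.5)
            values.append(level)
        streams.append(values)

    # One sampler per stream; these could run on separate processors.
    samplers = [AdaptiveReservoirSampler(200, seed=k) for k in range(3)]
    for sampler, values in zip(samplers, streams):
        for v in values:
            sampler.offer(v)

    # Pooling equal-size reservoirs assumes the streams arrive at equal rates.
    pooled = [v for s in samplers for v in s.reservoir]
    vector = partition_vector(pooled, parts=4)  # periodic recomputation
    counts = [0] * 4
    for values in streams:
        for v in values:
            counts[route(v, vector)] += 1
    print("partition vector:", [round(b, 2) for b in vector])
    print("items per partition:", counts)
```

In this sketch, skipping makes the reservoir uniform only over the items that pass the skip filter, not over the whole stream. Flat stretches of load data are therefore sampled more sparsely. That trade-off comes from the assumed rule and may differ from the paper's actual skip factor. If the streams arrive at different rates, the pooled samples would need to be weighted by each stream's item count before the boundaries are computed.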
Funding
The High Technology Research Plan of Jiangsu Province (No. BG2004034);
the Foundation of Graduate Creative Program of Jiangsu Province (No. xm04-36).