摘要
通过改进加权抽样算法,结合基本窗口技术,提出了两种面向带权值数据流上连续更新滑动窗口的随机抽样算法:WRSB算法和IWRSB算法。当新的数据元组到达时,根据数据元组的权值计算出该元组的键值,根据元组键值的大小决定其是否进入样本集以及样本集中被替换的数据元组,同时设置一个系统缓冲区来保存最近到达的键值较大的部分数据元组,作为过期数据元组的后备,使算法能够有效地处理过期数据元组问题。理论分析和实验结果表明,两种算法都能有效地处理带权值数据流上连续更新滑动窗口的随机抽样问题,相比较而言,IWRSB算法具有更好的性能。
The main focus in algorithms for data streams has been on efficient construction of summary structures(synopses) for sliding windows.This paper introduces the problem of construction of summary structures(synopses) from continuous updated sliding windows over weighted streaming data and presents two improved random sampling algorithms for this problem.The presented al gorithms are basic window-based random sampling algorithms,which extends the Weighted Random Sampling(WRS) algorithm to deal with the expiration of data items from sliding windows.For every tuple,a key is calculated by its weight and a randomlygenerated value.When a tuple is added to the sample or selected to delete from the sample,not only the key but also the arrival time is considered.The experimental results show that the algorithms are effective and efficient for continuous weighted streaming data processing over sliding window model.
出处
《计算机工程与应用》
CSCD
北大核心
2007年第25期18-20,共3页
Computer Engineering and Applications
基金
国家自然科学基金(the National Natural Science Foundation of China under Grant No.60573096)
山东理工大学科研基金(the ScienceFoundation of Shandong University of Technology under Grant No.2006KJM15) 。
关键词
数据流
滑动窗口
概要数据结构
随机抽样算法
data stream
sliding window
summary structure
random sampling algorithm