期刊文献+

带权值数据流滑动窗口随机抽样算法的改进 被引量:3

Improved random sampling algorithms for sliding windows over weighted streaming data
下载PDF
导出
摘要 通过改进加权抽样算法,结合基本窗口技术,提出了两种面向带权值数据流上连续更新滑动窗口的随机抽样算法:WRSB算法和IWRSB算法。当新的数据元组到达时,根据数据元组的权值计算出该元组的键值,根据元组键值的大小决定其是否进入样本集以及样本集中被替换的数据元组,同时设置一个系统缓冲区来保存最近到达的键值较大的部分数据元组,作为过期数据元组的后备,使算法能够有效地处理过期数据元组问题。理论分析和实验结果表明,两种算法都能有效地处理带权值数据流上连续更新滑动窗口的随机抽样问题,相比较而言,IWRSB算法具有更好的性能。 The main focus in algorithms for data streams has been on efficient construction of summary structures(synopses) for sliding windows.This paper introduces the problem of construction of summary structures(synopses) from continuous updated sliding windows over weighted streaming data and presents two improved random sampling algorithms for this problem.The presented al gorithms are basic window-based random sampling algorithms,which extends the Weighted Random Sampling(WRS) algorithm to deal with the expiration of data items from sliding windows.For every tuple,a key is calculated by its weight and a randomlygenerated value.When a tuple is added to the sample or selected to delete from the sample,not only the key but also the arrival time is considered.The experimental results show that the algorithms are effective and efficient for continuous weighted streaming data processing over sliding window model.
出处 《计算机工程与应用》 CSCD 北大核心 2007年第25期18-20,共3页 Computer Engineering and Applications
基金 国家自然科学基金(the National Natural Science Foundation of China under Grant No.60573096) 山东理工大学科研基金(the ScienceFoundation of Shandong University of Technology under Grant No.2006KJM15) 。
关键词 数据流 滑动窗口 概要数据结构 随机抽样算法 data stream sliding window summary structure random sampling algorithm
  • 相关文献

参考文献8

  • 1Babcock B,Babu S,Datar M,et al.Models and issues in data stream systems[C]//Proceedings of 21st ACM SIGACT-SIGMODSIGART Symp on Principles of Database Systems,Madison,Wisconsin,May 2002:1-16.
  • 2金澈清,钱卫宁,周傲英.流数据分析与管理综述[J].软件学报,2004,15(8):1172-1181. 被引量:161
  • 3王伟平,李建中,张冬冬,郭龙江.数据流上周期更新滑动窗口的连接算法[J].哈尔滨工业大学学报,2005,37(6):756-759. 被引量:6
  • 4Vitter J S.Random sampling with a reservoir[J].ACM Trans on Mathematical Software,1985,11(1):37-57.
  • 5Johnson T,Muthukrishnan S,Rozenbaum I.Sampling algorithms in a stream operator[C]//Proceedings of the 2005 ACM SIGMOD international conference on Management of data.Baltimore,Maryland.2005:1-12.
  • 6Babcock B,Datar M,Motwani R.Sampling from a moving window over streaming data[C]//Proeeedings of the 13th Annual ACM-SIAM Symp.on Discrete Algorithms.San Francisco,ACM/SIAM,2002:633-634.
  • 7Efraimidis P S,Spirakis P G.Weighted random sampling with a reservoir[J].Information Processing Letters,2006,97:181-185.
  • 8Zhang Longbo,Li Zhanhuai,Yu Min,et al.Construction of summary structures from sliding windows over data streams[J].Journal of Computational Information Systems,3(3).

二级参考文献57

  • 1Babcock B, Babu S, Datar M, Motwani R, Widom J. Models and issues in data streams. In: Popa L, ed. Proc. of the 21st ACM SIGACT-SIGMOD-SIGART Symp. on Principles of Database Systems. Madison: ACM Press, 2002. 1~16.
  • 2Terry D, Goldberg D, Nichols D, Oki B. Continuous queries over append-only databases. SIGMOD Record, 1992,21(2):321-330.
  • 3Avnur R, Hellerstein J. Eddies: Continuously adaptive query processing. In: Chen W, Naughton JF, Bernstein PA, eds. Proc. of the 2000 ACM SIGMOD Int'l Conf. on Management of Data. Dallas: ACM Press, 2000. 261~272.
  • 4Hellerstein J, Franklin M, Chandrasekaran S, Deshpande A, Hildrum K, Madden S, Raman V, Shah MA. Adaptive query processing: Technology in evolution. IEEE Data Engineering Bulletin, 2000,23(2):7-18.
  • 5Carney D, Cetinternel U, Cherniack M, Convey C, Lee S, Seidman G, Stonebraker M, Tatbul N, Zdonik S. Monitoring streams?A new class of DBMS applications. Technical Report, CS-02-01, Providence: Department of Computer Science, Brown University, 2002.
  • 6Guha S, Mishra N, Motwani R, O'Callaghan L. Clustering data streams. In: Blum A, ed. The 41st Annual Symp. on Foundations of Computer Science, FOCS 2000. Redondo Beach: IEEE Computer Society, 2000. 359-366.
  • 7Domingos P, Hulten G. Mining high-speed data streams. In: Ramakrishnan R, Stolfo S, Pregibon D, eds. Proc. of the 6th ACM SIGKDD Int'l Conf. on Knowledge Discovery and Data Mining. Boston: ACM Press, 2000. 71-80.
  • 8Domingos P, Hulten G, Spencer L. Mining time-changing data streams. In: Provost F, Srikant R, eds. Proc. of the 7th ACM SIGKDD Int'l Conf. on Knowledge Discovery and Data Mining. San Francisco: ACM Press, 2001. 97~106.
  • 9Zhou A, Cai Z, Wei L, Qian W. M-Kernel merging: Towards density estimation over data streams. In: Cha SK, Yoshikawa M, eds. The 8th Int'l Conf. on Database Systems for Advanced Applications (DASFAA 2003). Kyoto: IEEE Computer Society, 2003. 285~292.
  • 10Gibbons PB, Matias Y. Synopsis data structures for massive data sets. In: Tarjan RE, Warnow T, eds. Proc. of the 10th Annual ACM-SIAM Symp. on Discrete Algorithms. Baltimore: ACM/SIAM, 1999. 909-910.

共引文献164

同被引文献20

引证文献3

二级引证文献13

相关作者

内容加载中请稍等...

相关机构

内容加载中请稍等...

相关主题

内容加载中请稍等...

浏览历史

内容加载中请稍等...
;
使用帮助 返回顶部