Flexible Windowing in Data Stream Algorithms by Maximization (in English)
Abstract Many data stream mining algorithms are based on fixed-length sliding windows. The drawback of a fixed-length window is that its size is hard to set, and no single window size is optimal across different types of data stream distributions, so algorithm performance suffers. This paper proposes a flexible sliding window algorithm, realized by efficiently maintaining a statistic, the maximum normalized mean. Because this statistic is maximized over all time windows, it uses variable-length windows. Other algorithms can be restated in terms of this method. Experiments demonstrate the effectiveness of the normalized mean.

Data streams are data sources that deliver new data indefinitely and at a high rate, which causes computational and storage problems in the analysis of these sources. An inherent assumption in the analysis of a data stream is that the data stream distribution changes over time, which causes old data to be non-representative of the current data distribution. If the objective is to obtain knowledge of the current data properties, analysis based on old and non-representative data will result in incorrect or incomplete knowledge of the current state. To cope with this problem, many data stream analyses are built on a fixed sliding window of the freshest data points received. In this way, old data is forgotten and the space and time complexity of the analysis algorithm is bounded. Because no prior knowledge of the eventual changes is available, there exists no optimal window size, and the quality of the analysis degrades. To increase the quality of analysis, this paper proposes some initial steps towards efficiently equipping sliding window algorithms with flexible windowing. Flexible windowing is obtained by the efficient maintenance of a statistic called the maximum normalized mean. This statistic is used as a building block for algorithms in change detection and frequent itemset mining. The paper shows that the maximum normalized mean can be maintained efficiently in time and space. Furthermore, it restates several algorithms in change detection and frequent itemset mining such that their execution depends only on the maintenance of the maximum normalized mean. Most importantly, it investigates the gain in performance due to flexible windowing over a fixed-size window.
Source: Journal of Frontiers of Computer Science and Technology (《计算机科学与探索》), CSCD, 2009, No. 5, pp. 519-538 (20 pages)
Keywords: normalized mean; sliding window; data stream
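
The abstract describes a statistic that is maximized over all time windows rather than computed on one fixed window, but it does not give the definition. As a purely illustrative sketch, the Python below assumes the simplest reading: take the plain mean over every window that ends at the newest data point and has at least `min_len` points, and return the largest one. The function name `max_suffix_mean`, the `min_len` parameter, and the use of an unweighted mean are all assumptions made for this example; the paper's actual normalization and its efficient maintenance scheme are not reproduced here, and this version simply rescans the stored points.

```python
from typing import Sequence, Tuple


def max_suffix_mean(stream: Sequence[float], min_len: int = 1) -> Tuple[float, int]:
    """Return (best_mean, best_length) over all windows ending at the newest point.

    Hypothetical helper: assumes an unweighted mean and a minimum window
    length; this is not the paper's definition of the normalized mean.
    """
    if min_len < 1 or len(stream) < min_len:
        raise ValueError("need at least min_len >= 1 data points")

    best_mean = float("-inf")
    best_len = 0
    running_sum = 0.0
    # Walk backwards from the newest point, growing the window by one each step.
    for length, value in enumerate(reversed(stream), start=1):
        running_sum += value
        if length >= min_len:
            mean = running_sum / length
            if mean > best_mean:  # strict '>' keeps the shortest window on ties
                best_mean, best_len = mean, length
    return best_mean, best_len


if __name__ == "__main__":
    # 1 = item present in the transaction, 0 = absent (toy data).
    data = [0, 0, 1, 0, 1, 1]
    print(max_suffix_mean(data, min_len=3))
```

With the toy data above and `min_len=3`, the candidate windows have lengths 3 to 6, and the window of the last four points gives the highest mean. A fixed window of length 6 would report only the mean of the whole sequence, which is the kind of difference the abstract attributes to flexible windowing.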