Abstract
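Many algorithms in data stream mining are based on fixed-length sliding windows. The drawback of a fixed-length sliding window is that its size is hard to set, and no single window size is optimal across different types of data stream distribution, so the algorithms perform poorly. This paper proposes a variable sliding window algorithm, realized by efficiently maintaining a statistic called the maximum normalized mean. This statistic is maximized over all time windows, hence the use of variable-length windows. Other algorithms can be restated in terms of this method. Experiments demonstrate the effectiveness of the normalized mean.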
Data streams are data sources that deliver new data indefinitely and at a high rate, which causes computational and storage problems in the analysis of these sources. An inherent assumption in the analysis of a data stream is that the data stream distribution changes over time, which causes old data to be non-representative of the current data distribution. If the objective is to obtain knowledge of the current data properties, analysis based on old and non-representative data will result in incorrect or incomplete knowledge of the current state. To cope with this problem, many data stream analyses are built on a fixed sliding window of the freshest data points received. In this way, old data is forgotten and the space and time complexity of the analysis algorithm is bounded. Because no prior knowledge of the eventual changes is available, there exists no optimal window size, and the quality of the analysis degrades. To increase the quality of analysis, this paper proposes some initial steps towards efficiently equipping sliding window algorithms with flexible windowing. Flexible windowing is obtained by the efficient maintenance of a statistic called the maximum normalized mean. This statistic is used as a building block for algorithms in change detection and frequent item set mining. This paper shows that the maximum normalized mean can be maintained efficiently in time and space. Furthermore, it restates several algorithms in change detection and frequent item set mining such that their execution depends only on the maintenance of the maximum normalized mean. Most importantly, it investigates the gain in performance due to flexible windowing over a fixed-size window.
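The abstract does not define the maximum normalized mean, so the following is only an illustrative sketch under a stated assumption: the statistic for the n most recent points is taken to be their sum divided by n**alpha, and it is maximized over all windows that end at the newest point. The function name `max_normalized_mean` and the parameters `alpha` and `min_window` are hypothetical, and the linear scan shown here is a naive baseline, not the efficient maintenance procedure the paper describes.

```python
from typing import Sequence, Tuple


def max_normalized_mean(
    stream: Sequence[float], alpha: float = 1.0, min_window: int = 1
) -> Tuple[float, int]:
    """Return (best value, best window length) over all suffix windows.

    Assumed definition (not taken from the paper): for the n most recent
    points, the statistic is sum(points) / n**alpha.
    """
    if min_window < 1 or len(stream) < min_window:
        raise ValueError("need at least min_window points")
    best_value = float("-inf")
    best_length = 0
    running_sum = 0.0
    # Walk backwards from the newest point, growing the window by one each step.
    for n, x in enumerate(reversed(stream), start=1):
        running_sum += x
        if n < min_window:
            continue
        value = running_sum / n ** alpha
        # Strict comparison keeps the shortest window among ties.
        if value > best_value:
            best_value, best_length = value, n
    return best_value, best_length
```

Under this assumed definition, calling `max_normalized_mean([0, 0, 1, 1, 1], alpha=0.5)` selects a window of length 3, which is exactly the stretch after the values switch from 0 to 1. This illustrates how maximizing over window lengths lets the window adapt to a change instead of being fixed in advance.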
Source
《计算机科学与探索》
CSCD
2009, No. 5, pp. 519-538 (20 pages)
Journal of Frontiers of Computer Science and Technology
Keywords
normalized mean (泛化均值)
sliding window (滑动窗口)
data stream (数据流)