Abstract
Sliding windows are an approximation technique for data streams that focuses on the most recent data. This paper proposes SWStream, an optimized algorithm that processes stream data over sliding windows. In the online phase, a sliding window tree stores synopsis structures holding the key statistical information of the stream, and the window sizes are adjusted dynamically. The optimized algorithm promptly eliminates expired tuples while continuously processing newly arriving tuples in real time, which yields more accurate analysis results. In the offline phase, the results of the online phase are macro-clustered, using the mean values of the macro-clusters, to produce the final clustering. Compared with the clustering algorithm CluStream, SWStream processes data more efficiently and uses relatively less memory.
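The abstract does not specify SWStream's data structures (in particular the sliding window tree), so the following Python is only a simplified sketch of the two-phase idea under stated assumptions, not the authors' implementation. It assumes a count-based window over the most recent tuples; CluStream-style micro-clusters that store (count, linear sum, squared sum), which can be updated by subtraction when a tuple expires; a hypothetical distance threshold `max_dist` for absorbing a tuple into an existing micro-cluster; and weighted k-means as the offline macro-clustering step. All class and function names are hypothetical.

```python
import math
from collections import deque


def dist(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


class MicroCluster:
    """Additive/subtractive cluster feature: count, linear sum, squared sum.

    ss is kept so a radius could be derived; it is unused in this sketch.
    """

    def __init__(self, dim):
        self.n = 0
        self.ls = [0.0] * dim
        self.ss = [0.0] * dim

    def update(self, x, sign):
        # sign=+1 absorbs a new tuple, sign=-1 removes an expired one
        self.n += sign
        for i, v in enumerate(x):
            self.ls[i] += sign * v
            self.ss[i] += sign * v * v

    def centroid(self):
        return [s / self.n for s in self.ls]


class SlidingWindowClusterer:
    """Online phase (assumed): micro-clusters summarize only the last window_size tuples."""

    def __init__(self, window_size, max_dist):
        self.window_size = window_size
        self.max_dist = max_dist    # hypothetical absorption threshold
        self.window = deque()       # (tuple, micro-cluster id), oldest first
        self.clusters = {}          # id -> MicroCluster
        self.next_id = 0

    def insert(self, x):
        # Evict the oldest tuple once the window is full and subtract it
        # from the micro-cluster that absorbed it.
        if len(self.window) == self.window_size:
            old_x, old_id = self.window.popleft()
            mc = self.clusters[old_id]
            mc.update(old_x, -1)
            if mc.n == 0:
                del self.clusters[old_id]
        # Find the micro-cluster with the nearest centroid.
        best_id, best_d = None, float("inf")
        for cid, mc in self.clusters.items():
            d = dist(x, mc.centroid())
            if d < best_d:
                best_id, best_d = cid, d
        # Open a new micro-cluster if nothing is close enough.
        if best_id is None or best_d > self.max_dist:
            best_id = self.next_id
            self.next_id += 1
            self.clusters[best_id] = MicroCluster(len(x))
        self.clusters[best_id].update(x, +1)
        self.window.append((x, best_id))


def macro_cluster(clusters, k, iters=20):
    """Offline phase (assumed): weighted k-means over micro-cluster centroids."""
    points = [(mc.centroid(), mc.n) for mc in clusters.values()]
    centers = [p for p, _ in points[:k]]  # simple deterministic initialization
    for _ in range(iters):
        groups = [[] for _ in centers]
        for p, w in points:
            j = min(range(len(centers)), key=lambda c: dist(p, centers[c]))
            groups[j].append((p, w))
        for j, g in enumerate(groups):
            total = sum(w for _, w in g)
            if total:
                # Each center becomes the weighted mean of its assigned centroids.
                centers[j] = [sum(p[i] * w for p, w in g) / total
                              for i in range(len(centers[j]))]
    return centers
```

For example, with `SlidingWindowClusterer(window_size=4, max_dist=1.0)` and the stream `(0, 0), (0.5, 0), (10, 10), (10.5, 10), (20, 20), (20.2, 20)`, the first micro-cluster disappears once both of its tuples have expired, leaving two micro-clusters that summarize only the four most recent tuples. Keeping subtractive statistics is what lets expired tuples be removed without rescanning the window.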
Source
Journal of Shaanxi University of Technology: Natural Science Edition (《陕西理工学院学报(自然科学版)》)
2014, No. 1, pp. 42-46 (5 pages)
Funding
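Research Program of the Henan Provincial Department of Science and Technology (132300410395)
Research Program of the Henan Provincial Department of Science and Technology (122300410395)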
Keywords
data streams
sliding windows
clustering
data mining