Abstract
Data stream clustering is an important branch of current research on data streams, and the sliding window is an approximation method for data streams that focuses on recent data. This paper proposes SWStream, an optimized algorithm that processes data over sliding windows. The algorithm adopts a two-layer architecture. In the online phase, a sliding window tree stores summary structures that capture the important statistical information of the stream, and the window size is adjusted dynamically. In the offline phase, the output of the online phase (the mean values of the micro-clusters) is macro-clustered to obtain the final clustering result. Experiments show that the algorithm achieves higher processing efficiency and is relatively memory-efficient.
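The abstract does not specify how the sliding window tree is organized or how window sizes are adjusted, so the sketch below does not reproduce SWStream. It is a minimal illustration of the generic two-phase (online/offline) scheme the abstract describes, under stated assumptions: CluStream-style micro-clusters (count, linear sum, squared sum), a fixed count-based window that expires whole micro-clusters, and weighted k-means with farthest-first initialization as the offline macro-clustering step. The names `MicroCluster`, `OnlineSummary`, `macro_cluster`, `window`, and `max_dist` are hypothetical.

```python
import math
import random
from dataclasses import dataclass


@dataclass
class MicroCluster:
    """Additive summary of nearby points (assumed CluStream-style cluster feature)."""
    n: int
    ls: list[float]     # per-dimension linear sum
    ss: list[float]     # per-dimension squared sum (unused here; allows a radius check)
    last_seen: int      # arrival index of the most recently absorbed point

    @classmethod
    def from_point(cls, x, t):
        return cls(1, list(x), [v * v for v in x], t)

    def mean(self):
        return [s / self.n for s in self.ls]

    def absorb(self, x, t):
        self.n += 1
        for i, v in enumerate(x):
            self.ls[i] += v
            self.ss[i] += v * v
        self.last_seen = t


class OnlineSummary:
    """Online phase: keep only micro-clusters touched within the last `window` arrivals."""

    def __init__(self, window, max_dist):
        self.window = window        # hypothetical fixed window length (count-based)
        self.max_dist = max_dist    # hypothetical absorption distance threshold
        self.clusters = []
        self.t = 0

    def insert(self, x):
        self.t += 1
        # Coarse window: expire whole micro-clusters that received no point in the window.
        self.clusters = [mc for mc in self.clusters
                         if self.t - mc.last_seen < self.window]
        if self.clusters:
            nearest = min(self.clusters, key=lambda mc: math.dist(mc.mean(), x))
            if math.dist(nearest.mean(), x) <= self.max_dist:
                nearest.absorb(x, self.t)
                return
        self.clusters.append(MicroCluster.from_point(x, self.t))


def macro_cluster(clusters, k, iters=20):
    """Offline phase: weighted k-means over micro-cluster means (weight = point count)."""
    points = [mc.mean() for mc in clusters]
    weights = [mc.n for mc in clusters]
    dim = len(points[0])
    # Deterministic farthest-first initialization, seeded by the heaviest micro-cluster.
    centers = [max(clusters, key=lambda mc: mc.n).mean()]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(math.dist(p, c) for c in centers)))
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p, w in zip(points, weights):
            j = min(range(k), key=lambda c: math.dist(p, centers[c]))
            groups[j].append((p, w))
        for j, group in enumerate(groups):
            if group:  # an empty group keeps its previous center
                total = sum(w for _, w in group)
                centers[j] = [sum(p[d] * w for p, w in group) / total for d in range(dim)]
    return centers


# Usage: a synthetic stream alternating between two blobs around (0, 0) and (10, 10).
rng = random.Random(1)
summary = OnlineSummary(window=100, max_dist=1.0)
for i in range(400):
    cx = 0.0 if i % 2 == 0 else 10.0
    summary.insert([cx + rng.uniform(-1.5, 1.5), cx + rng.uniform(-1.5, 1.5)])
centers = sorted(macro_cluster(summary.clusters, k=2))
print(centers)
```

On this stream the two macro-cluster centers fall near (0, 0) and (10, 10). Expiring whole micro-clusters only approximates a true sliding window, because a still-active micro-cluster retains statistics from points older than the window; the paper's sliding window tree and dynamic window sizing are not modeled here.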
Source
Henan Science (《河南科学》)
2014, No. 5, pp. 777-780 (4 pages)
Funding
Research Program of the Department of Science and Technology of Henan Province (132300410395; 122300410395)
Keywords
data streams; sliding windows; clustering; data mining