Abstract
This paper proposes PDStream, a real-time data stream clustering algorithm based on a damped window. PDStream first partitions the data space into grids and uses an improved dimension tree structure to maintain and update summary information about the data stream. A periodic pruning strategy removes sparse grids from the dimension tree at regular intervals, and a depth-first search (DFS) handles clustering requests online. Experiments on synthetic and real datasets show that PDStream effectively discovers clusters of arbitrary shape in data streams, consumes little memory, and achieves good accuracy.
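The pipeline summarized in the abstract (grid partitioning, damped-window densities, periodic pruning of sparse grids, DFS over dense grids) can be illustrated with a minimal sketch. This is not the PDStream implementation: the abstract gives no parameters or data-structure details, so the class name `GridStreamSketch`, the per-step exponential decay factor, both thresholds, the pruning period, and the face-adjacency rule are all assumptions, and a plain dictionary stands in for the paper's improved dimension tree.

```python
import math


class GridStreamSketch:
    """Illustrative decayed-grid stream clustering (hypothetical, not PDStream itself)."""

    def __init__(self, cell_size=1.0, decay=0.99, dense_threshold=2.0,
                 sparse_threshold=0.1, prune_period=100):
        # All parameter names and defaults are assumptions for illustration.
        self.cell_size = cell_size
        self.decay = decay                      # per-time-step damping factor
        self.dense_threshold = dense_threshold  # minimum density of a "dense" grid
        self.sparse_threshold = sparse_threshold
        self.prune_period = prune_period
        self.cells = {}  # grid coordinates -> (density, time of last update)
        self.now = 0

    def _decayed(self, density, last_time):
        # Damped window: older contributions fade exponentially with age.
        return density * self.decay ** (self.now - last_time)

    def insert(self, point):
        self.now += 1
        key = tuple(math.floor(x / self.cell_size) for x in point)
        density, last = self.cells.get(key, (0.0, self.now))
        self.cells[key] = (self._decayed(density, last) + 1.0, self.now)
        if self.now % self.prune_period == 0:
            self.prune()

    def prune(self):
        # Periodic pruning: drop grids whose decayed density has become sparse.
        self.cells = {k: v for k, v in self.cells.items()
                      if self._decayed(*v) >= self.sparse_threshold}

    def clusters(self):
        # Depth-first search over dense grids that share a face.
        dense = {k for k, v in self.cells.items()
                 if self._decayed(*v) >= self.dense_threshold}
        seen, result = set(), []
        for start in dense:
            if start in seen:
                continue
            seen.add(start)
            stack, component = [start], set()
            while stack:
                cell = stack.pop()
                component.add(cell)
                for i in range(len(cell)):
                    for step in (-1, 1):
                        nb = cell[:i] + (cell[i] + step,) + cell[i + 1:]
                        if nb in dense and nb not in seen:
                            seen.add(nb)
                            stack.append(nb)
            result.append(component)
        return result
```

In this sketch, each call to `insert` advances a logical clock by one step, and `clusters()` returns a list of sets of grid coordinates. Because clusters are unions of adjacent dense grids rather than points around a centroid, they can take arbitrary shapes, which is the property the abstract attributes to grid-based clustering.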
Source
Application Research of Computers (《计算机应用研究》)
CSCD
Peking University Core Journals
2009, No. 4, pp. 1331-1334, 1341 (5 pages)
Funding
Supported by the National Natural Science Foundation of China (60674115)
Keywords
data stream
grid clustering
damped window
dimension tree
pruning strategy