Abstract
Existing data stream clustering algorithms suffer from low clustering accuracy, poor handling of outliers, and an inability to detect changes in the stream in real time. To address these shortcomings, this paper proposes a data stream clustering algorithm that combines density with affinity propagation. The algorithm adopts an online/offline two-stage processing framework. It introduces a micro-cluster decay density to reflect the evolution of the data stream accurately, and it maintains and prunes micro-clusters dynamically online, so that the model stays consistent with the intrinsic characteristics of the original data stream. When a new class pattern is detected in the model, an improved weighted and hierarchical affinity propagation (WAP) algorithm rebuilds the model; the algorithm can therefore detect changes in the data stream in real time and report clustering results at any time. Experiments on real and synthetic data sets show that the algorithm has good applicability, effectiveness, and scalability, and achieves good clustering results.
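The online stage described in the abstract can be illustrated with a minimal sketch. The abstract does not give the decay formula or any thresholds, so everything concrete here is an assumption made for illustration: the exponential decay `2 ** (-lam * dt)` (a form common in density-based stream clustering), the names `MicroCluster` and `update`, and the parameters `lam`, `radius`, and `min_density`. It is not the paper's actual method, and the offline WAP rebuild is not shown.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class MicroCluster:
    """Hypothetical micro-cluster summary: a center plus a time-decayed density."""
    center: List[float]
    density: float
    last_update: float

    def decayed_density(self, now: float, lam: float) -> float:
        # Assumed decay law: the density halves every 1/lam time units.
        return self.density * 2.0 ** (-lam * (now - self.last_update))

    def absorb(self, point: Sequence[float], now: float, lam: float) -> None:
        # Decay the old density first, then add the new point with weight 1.
        old = self.decayed_density(now, lam)
        new = old + 1.0
        self.center = [(c * old + x) / new for c, x in zip(self.center, point)]
        self.density = new
        self.last_update = now


def update(clusters: List[MicroCluster], point: Sequence[float], now: float,
           lam: float = 0.1, radius: float = 1.0,
           min_density: float = 0.05) -> bool:
    """One online step. Returns True if the point opened a new micro-cluster."""
    nearest = min(clusters, key=lambda mc: math.dist(mc.center, point),
                  default=None)
    if nearest is not None and math.dist(nearest.center, point) <= radius:
        nearest.absorb(point, now, lam)
        opened_new = False
    else:
        clusters.append(MicroCluster(list(point), 1.0, now))
        opened_new = True
    # Prune micro-clusters whose decayed density fell below the threshold.
    clusters[:] = [mc for mc in clusters
                   if mc.decayed_density(now, lam) >= min_density]
    return opened_new
```

In a sketch like this, a `True` return value is one possible signal that a new pattern may be emerging, at which point an offline step could re-cluster the surviving micro-cluster centers, weighted by their decayed densities.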
Source
Acta Automatica Sinica (《自动化学报》)
Indexed in: EI, CSCD, Peking University Core Journals
2014, No. 2, pp. 277-288 (12 pages)
Funding
Supported by the National High Technology Research and Development Program of China (863 Program) (2011AA010603, 2011AA010605)
Keywords
Data stream mining, affinity propagation, density-based clustering, change detection