Data Stream Clustering Algorithm Based on Density and Affinity Propagation Techniques (cited by: 27)
Abstract: Existing data stream clustering algorithms suffer from low clustering accuracy, poor handling of outliers, and an inability to detect changes in a data stream in real time. To address these shortcomings, a data stream clustering algorithm that combines density-based clustering with affinity propagation is proposed. The algorithm adopts an online/offline two-stage processing framework. It introduces a micro-cluster decay density to reflect the evolution of the data stream accurately, and it maintains and prunes micro-clusters dynamically online, so that the model better matches the intrinsic characteristics of the original data stream. When a new class pattern is detected in the model, an improved weighted affinity propagation clustering algorithm (weighted and hierarchical affinity propagation, WAP) is used to rebuild the model. The algorithm can therefore detect changes in the data stream in real time and return clustering results at any time. Experiments on real and synthetic data sets show that the algorithm has good applicability, effectiveness, and scalability, and that it achieves good clustering results.
Source: Acta Automatica Sinica (《自动化学报》), indexed in EI, CSCD, and the Peking University Core Journals list; 2014, No. 2, pp. 277-288 (12 pages)
Funding: Supported by the National High Technology Research and Development Program of China (863 Program) (2011AA010603, 2011AA010605)
Keywords: data stream mining, affinity propagation, density-based clustering, change detection
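
The online stage described in the abstract (micro-clusters with a decaying density, maintained and pruned as points arrive) can be illustrated with a minimal sketch. The abstract does not give the paper's formulas, so everything below is an assumption: the exponential fading factor 2^(-lam * dt) is a convention common in density-based stream clustering, and the names `MicroCluster`, `OnlineMicroClusters`, `radius`, `lam`, and `min_weight` are hypothetical. The offline WAP rebuilding step is not sketched; `insert` only reports when a new micro-cluster had to be created.

```python
import math
from dataclasses import dataclass


@dataclass
class MicroCluster:
    """Hypothetical micro-cluster summary; not the paper's exact definition."""
    linear_sum: list    # decayed per-dimension sum of absorbed points
    weight: float       # decayed density (decayed count of absorbed points)
    last_update: float  # timestamp at which the decay was last applied

    def decay_to(self, now, lam):
        # Assumed fading function: contributions halve every 1/lam time units.
        factor = 2.0 ** (-lam * (now - self.last_update))
        self.linear_sum = [s * factor for s in self.linear_sum]
        self.weight *= factor
        self.last_update = now

    def center(self):
        return [s / self.weight for s in self.linear_sum]

    def absorb(self, point):
        self.linear_sum = [s + p for s, p in zip(self.linear_sum, point)]
        self.weight += 1.0


class OnlineMicroClusters:
    """Online maintenance and pruning of micro-clusters (illustrative only)."""

    def __init__(self, radius, lam, min_weight):
        self.radius = radius          # assumed absorption radius
        self.lam = lam                # assumed decay rate
        self.min_weight = min_weight  # assumed pruning threshold, must be > 0
        self.clusters = []

    def insert(self, point, now):
        """Process one point; return True if a new micro-cluster was created."""
        # Age every micro-cluster to the current time, then drop faded ones.
        for mc in self.clusters:
            mc.decay_to(now, self.lam)
        self.clusters = [mc for mc in self.clusters
                         if mc.weight >= self.min_weight]

        # Absorb the point into the nearest micro-cluster if it is close enough.
        nearest = min(self.clusters,
                      key=lambda mc: math.dist(mc.center(), point),
                      default=None)
        if nearest is not None and math.dist(nearest.center(), point) <= self.radius:
            nearest.absorb(point)
            return False

        # Otherwise start a new micro-cluster: a candidate outlier or new pattern.
        self.clusters.append(MicroCluster(list(point), 1.0, now))
        return True


if __name__ == "__main__":
    model = OnlineMicroClusters(radius=1.0, lam=1.0, min_weight=0.3)
    for t, p in enumerate([[0.0, 0.0], [0.5, 0.0], [10.0, 10.0]]):
        created = model.insert(p, now=float(t))
        print(t, p, "new micro-cluster" if created else "absorbed")
```

In a fuller system, a `True` return value would be one possible trigger for rebuilding the cluster model offline; how the paper decides that a new class pattern has appeared is not stated in the abstract.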
