Abstract
To improve the accuracy and timeliness of data stream clustering, an efficient data stream clustering algorithm that combines temporal characteristics with affinity propagation, TCAPStream, is proposed. The algorithm uses an improved WAP algorithm to merge newly detected class patterns into the clustering model. It uses the temporal density of micro-clusters to capture the temporal evolution of the data stream, and it maintains the micro-clusters through a proposed online dynamic deletion mechanism. As a result, the model reflects both the temporal characteristics and the distribution of the data stream and yields more accurate clustering results. Experimental results show that the algorithm achieves good clustering quality on several synthetic and real-world datasets and has good scalability and extensibility.
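The abstract does not give the formula for micro-cluster temporal density or the deletion rule, so the following is only a minimal sketch of the general idea, not the TCAPStream definition. It assumes an exponential fading law `2 ** (-lam * elapsed)`, a one-dimensional center, and a fixed threshold `min_density`; the names `MicroCluster`, `decay`, `absorb`, `prune`, `lam`, and `min_density` are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class MicroCluster:
    center: float       # 1-D center, for illustration only
    density: float      # time-decayed weight of the micro-cluster
    last_update: float  # timestamp at which density was last decayed


def decay(mc: MicroCluster, now: float, lam: float) -> None:
    """Fade the density by 2^(-lam * elapsed). The decay law is an assumption."""
    mc.density *= 2.0 ** (-lam * (now - mc.last_update))
    mc.last_update = now


def absorb(mc: MicroCluster, x: float, now: float, lam: float) -> None:
    """Decay the micro-cluster to time `now`, then add point x with weight 1."""
    decay(mc, now, lam)
    mc.center = (mc.center * mc.density + x) / (mc.density + 1.0)
    mc.density += 1.0


def prune(clusters: list[MicroCluster], now: float, lam: float,
          min_density: float) -> list[MicroCluster]:
    """Online deletion: decay every micro-cluster and drop the faded ones."""
    for mc in clusters:
        decay(mc, now, lam)
    return [mc for mc in clusters if mc.density >= min_density]
```

Under these assumptions, a micro-cluster that stops receiving points loses half of its density every `1 / lam` time units and is eventually removed by `prune`, while a micro-cluster that keeps absorbing points stays above the threshold.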
Source
《模式识别与人工智能》
EI
CSCD
PKU Core Journal (北大核心)
2014, No. 5, pp. 443-451 (9 pages)
Pattern Recognition and Artificial Intelligence
Funding
Supported by the National High Technology Research and Development Program of China (863 Program) (No. 2011AA010603, 2011AA010605)
Keywords
Data Mining
Affinity Propagation Clustering
Temporal Density
Model Reconstruction
Data Stream