摘要
为解决从多数据流挖掘演化事件这一难题,提出了一种多数据流上的谱聚类算法SCAM(spectral clustering algorithm of multi-streams),其相似矩阵基于耦合度构造,而耦合度衡量了两个数据流的动态相似性.提出了算法EEMA(evolutionary events mining algorithm),该算法基于聚类模型的演变挖掘多数据流的演化事件.定义了聚类模型凝聚度,用以衡量聚类的紧凑程度,并证明了凝聚度的上界.基于到上界的距离和规范化相似矩阵的特征间隙,定义了聚类模型质量,并作为EEMA的优化目标自动地确定聚簇数k.设计了O-EEMA作为EEMA的优化实现,其时间复杂度为O(cn2/2).在合成和真实数据集上的实验结果表明,EEMA和O-EEMA是有效的、可行的.
To solve the problem of mining evolutionary events from multi-streams, this paper proposes a spectral clustering algorithm, SCAM (spectral clustering algorithm of multi-streams), to generate the clustering models of Multi-Streams. The similarity matrix in the clustering models of Multi-Streams are based on Coupling Degree, which measures the dynamic similarity between two streams. In addition, this paper also proposes an algorithm, EEMA (evolutionary events mining algorithm), to discover the evolutionary event points based on the drift of clustering models. EEMA takes the index of Clustering Model Quality as the optimization objective in determing the number of clusters automatically. The Clustering Model Quality combines the matrix perturbation theory and the Clustering Cohesion, which has a sound upper bound and is used to measure the compactness of a clustering model. Finally, this paper presents O-EEMA (optimized-EEMA) as the optimization of EEMA with the temporal complexity of O(cn^2/2), and the results of extensive experiments on the synthetic and real data set show that EEMA and O-EEMA are effective and practicable.
出处
《软件学报》
EI
CSCD
北大核心
2010年第10期2395-2409,共15页
Journal of Software
基金
国家自然科学基金No.600773169
国家"十一.五"科技支撑计划No.2006BAI05A01~~
关键词
多数据流
耦合聚类
演化事件
矩阵扰动
multi-streams
spectral clustering
evolutionary event
matrix perturbation