Abstract
Data streams are unbounded in volume and arrive at high speed, so traditional clustering algorithms cannot be applied to them directly. To address this problem, a probability-density-based data stream clustering algorithm is proposed. The method does not need to store the full history of the data: only newly arrived data are kept in memory, the EM algorithm is applied to them, and the probability density function is updated incrementally using a Gaussian mixture model. Experimental results show that the algorithm is effective for data stream clustering.
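The incremental idea described in the abstract can be illustrated with a minimal sketch. It is not the paper's algorithm: the abstract gives no update formulas, so everything below is an assumption made for illustration. That includes the 1-D data, the hypothetical names `em_fit`, `merge` and `StreamingDensity`, the count-based reweighting of old and new components, and the rule that merges two components when their means are closer than one standard deviation. Each arriving chunk is fitted with plain EM, then folded into a running mixture, and the chunk itself is discarded.

```python
import math
import random


def gauss_pdf(x, mean, var):
    """Density of a 1-D Gaussian with the given mean and variance."""
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)


def em_fit(data, k, iters=50, seed=0):
    """Fit a k-component 1-D Gaussian mixture to one chunk with plain EM."""
    rng = random.Random(seed)
    means = rng.sample(data, k)  # initialise means from the chunk itself
    mu = sum(data) / len(data)
    var0 = sum((x - mu) ** 2 for x in data) / len(data) + 1e-6
    variances = [var0] * k
    weights = [1.0 / k] * k
    for _ in range(iters):
        # E-step: responsibility of each component for each point
        resp = []
        for x in data:
            p = [w * gauss_pdf(x, m, v)
                 for w, m, v in zip(weights, means, variances)]
            s = sum(p) or 1e-300
            resp.append([pj / s for pj in p])
        # M-step: re-estimate weight, mean and variance of each component
        for j in range(k):
            nj = sum(r[j] for r in resp) + 1e-12
            means[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            variances[j] = sum(r[j] * (x - means[j]) ** 2
                               for r, x in zip(resp, data)) / nj + 1e-6
            weights[j] = nj / len(data)
    return list(zip(weights, means, variances))


def merge(c1, c2):
    """Moment-matching merge of two weighted Gaussians (an assumed rule)."""
    (w1, m1, v1), (w2, m2, v2) = c1, c2
    w = w1 + w2
    m = (w1 * m1 + w2 * m2) / w
    v = (w1 * (v1 + m1 ** 2) + w2 * (v2 + m2 ** 2)) / w - m ** 2
    return (w, m, max(v, 1e-6))


class StreamingDensity:
    """Running mixture updated chunk by chunk; raw chunks are not stored."""

    def __init__(self):
        self.components = []  # list of (weight, mean, variance)
        self.n_seen = 0

    def update(self, chunk, k=2):
        total = self.n_seen + len(chunk)
        # Shrink old weights in proportion to the data seen so far
        comps = [(w * self.n_seen / total, m, v)
                 for w, m, v in self.components]
        for w, m, v in em_fit(chunk, k):
            new = (w * len(chunk) / total, m, v)
            for i, (_, mo, vo) in enumerate(comps):
                # Assumed similarity test: means within one standard deviation
                if abs(m - mo) < math.sqrt(max(v, vo)):
                    comps[i] = merge(comps[i], new)
                    break
            else:
                comps.append(new)
        self.components = comps
        self.n_seen = total

    def pdf(self, x):
        return sum(w * gauss_pdf(x, m, v) for w, m, v in self.components)


# Usage: two synthetic chunks drawn from clusters near 0 and 10
rng = random.Random(42)
model = StreamingDensity()
for _ in range(2):
    chunk = ([rng.gauss(0.0, 1.0) for _ in range(100)]
             + [rng.gauss(10.0, 1.0) for _ in range(100)])
    model.update(chunk, k=2)
```

Memory use in this sketch depends on the number of mixture components rather than on the length of the stream, which is the property the abstract emphasises. A point could then be assigned to the component with the highest weighted density, although the abstract does not say how the original method derives clusters from the density.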
Source
Journal of Computer Applications (《计算机应用》)
CSCD
Peking University Core Journal (北大核心)
2007, Issue 4, pp. 881-883 (3 pages)
Keywords
data stream
clustering
Gaussian mixture model
probability density