Abstract
To improve the quality and efficiency of data stream clustering, this paper proposes a density-based data stream clustering algorithm. The algorithm uses a two-layer (double-layer) clustering framework. To handle the forgetting of historical data, it applies a fading strategy and a granularity (size) adjustment strategy. The fading strategy deals with noise and saves memory. The granularity adjustment strategy monitors current memory consumption and improves clustering quality. Experiments on standard and synthetic data sets show that the algorithm is feasible and effective. It is suitable for processing and analyzing large-scale, fast data streams.
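The abstract does not give the fading function, the micro-cluster summary, or the rule for adjusting granularity. The Python sketch below shows one plausible reading of the two-layer design, and it rests on several assumptions. The online layer keeps weighted micro-clusters, and fading multiplies each weight by 2^(-λΔt), the function used in DenStream. Micro-clusters whose weight falls below a threshold are dropped as noise. When the number of micro-clusters exceeds a memory budget, the radius is enlarged and nearby micro-clusters are merged. An offline layer then links micro-clusters whose centers are close. Every name here (`TwoLayerStreamClusterer`, `decay`, `min_weight`, `max_clusters`, `growth`) is hypothetical and not taken from the paper.

```python
import math
from dataclasses import dataclass


@dataclass
class MicroCluster:
    center: list[float]
    weight: float
    last_update: float


class TwoLayerStreamClusterer:
    """Hypothetical two-layer clusterer: online micro-clusters, offline macro-clusters.

    decay, min_weight, max_clusters and growth are illustrative parameters,
    not values taken from the paper.
    """

    def __init__(self, radius, decay=0.25, min_weight=0.5, max_clusters=50, growth=1.5):
        if max_clusters < 1 or growth <= 1:
            raise ValueError("need max_clusters >= 1 and growth > 1")
        self.radius = radius
        self.decay = decay
        self.min_weight = min_weight
        self.max_clusters = max_clusters
        self.growth = growth
        self.clusters: list[MicroCluster] = []

    def _fade(self, mc, now):
        # Assumed fading function f(dt) = 2 ** (-decay * dt), as used in DenStream.
        mc.weight *= 2 ** (-self.decay * (now - mc.last_update))
        mc.last_update = now

    def insert(self, point, now):
        """Online layer: absorb the point into the nearest micro-cluster or open a new one."""
        nearest = min(self.clusters, key=lambda mc: math.dist(mc.center, point), default=None)
        if nearest is not None and math.dist(nearest.center, point) <= self.radius:
            self._fade(nearest, now)
            w = nearest.weight
            # Incremental weighted mean of the center.
            nearest.center = [(w * c + x) / (w + 1) for c, x in zip(nearest.center, point)]
            nearest.weight = w + 1
        else:
            self.clusters.append(MicroCluster(list(point), 1.0, now))
        if len(self.clusters) > self.max_clusters:
            self.adjust_granularity(now)

    def prune(self, now):
        """Fading: decay all weights, then drop micro-clusters below min_weight (noise)."""
        for mc in self.clusters:
            self._fade(mc, now)
        self.clusters = [mc for mc in self.clusters if mc.weight >= self.min_weight]

    def adjust_granularity(self, now):
        """Granularity adjustment: enlarge the radius and merge until the budget is met."""
        self.prune(now)
        while len(self.clusters) > self.max_clusters:
            self.radius *= self.growth
            merged: list[MicroCluster] = []
            for mc in sorted(self.clusters, key=lambda m: -m.weight):
                host = next((h for h in merged
                             if math.dist(h.center, mc.center) <= self.radius), None)
                if host is None:
                    merged.append(mc)
                else:
                    total = host.weight + mc.weight
                    host.center = [(host.weight * a + mc.weight * b) / total
                                   for a, b in zip(host.center, mc.center)]
                    host.weight = total
            self.clusters = merged

    def macro_clusters(self):
        """Offline layer: label micro-clusters connected by centers within 2 * radius."""
        labels = [-1] * len(self.clusters)
        next_label = 0
        for start in range(len(self.clusters)):
            if labels[start] != -1:
                continue
            labels[start] = next_label
            stack = [start]
            while stack:
                i = stack.pop()
                for j, other in enumerate(self.clusters):
                    if labels[j] == -1 and math.dist(self.clusters[i].center, other.center) <= 2 * self.radius:
                        labels[j] = next_label
                        stack.append(j)
            next_label += 1
        return labels
```

Here is how the sketch behaves. Feeding it two well-separated streams of points keeps two micro-clusters, and `macro_clusters()` gives them different labels. A single isolated point fades out and is removed by a later `prune()` call. In this sketch, granularity is adjusted only by enlarging the radius, which trades resolution for memory. The paper's strategy may also shrink the radius when memory is free.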
Source
Journal of Nanyang Institute of Technology (南阳理工学院学报)
2012, No. 2, pp. 72-75 (4 pages)
Keywords
data stream
clustering
density