Abstract
To address the clustering of uncertain data streams, a clustering algorithm based on gravity similarity and relative density is proposed. The algorithm adopts an online/offline two-stage processing framework and jointly considers the similarity between tuples and the uncertainty of each tuple. For each incoming tuple, it uses gravity similarity to find the micro-cluster to which the tuple may belong. A new outlier-handling and online maintenance mechanism adapts to the evolution of the data stream. In the offline stage, a relative density algorithm clusters the micro-clusters; it does not require the number of clusters to be specified in advance and can handle micro-clusters of arbitrary shape. Experimental results show that, compared with existing clustering methods, the proposed algorithm achieves higher clustering quality and accuracy.
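The online step described in the abstract can be illustrated with a minimal sketch. The abstract does not give the paper's formulas, so everything here is an assumption made for illustration: each tuple carries an existence probability `prob`, a micro-cluster keeps a probability-weighted `center` and a total `weight`, and the "gravity" similarity takes a Newtonian form (product of masses over squared distance). The names `MicroCluster`, `gravity_similarity`, `assign`, and the `threshold` parameter are hypothetical and not taken from the paper.

```python
from dataclasses import dataclass


@dataclass
class MicroCluster:
    center: list    # probability-weighted mean of the absorbed tuples
    weight: float   # sum of existence probabilities, used as the "mass"

    def absorb(self, point, prob):
        # Update the weighted mean and the accumulated mass.
        total = self.weight + prob
        self.center = [(c * self.weight + x * prob) / total
                       for c, x in zip(self.center, point)]
        self.weight = total


def gravity_similarity(point, prob, mc, eps=1e-9):
    # Assumed form: (tuple mass * cluster mass) / squared Euclidean distance.
    d2 = sum((x - c) ** 2 for x, c in zip(point, mc.center))
    return prob * mc.weight / (d2 + eps)


def assign(point, prob, clusters, threshold):
    # Pick the micro-cluster with the strongest attraction, if any exists.
    best = max(clusters,
               key=lambda mc: gravity_similarity(point, prob, mc),
               default=None)
    if best is not None and gravity_similarity(point, prob, best) >= threshold:
        best.absorb(point, prob)
        return best
    # Otherwise treat the tuple as a potential outlier: start a new micro-cluster.
    new = MicroCluster(list(point), prob)
    clusters.append(new)
    return new
```

In this sketch a low-probability tuple exerts less pull than a confident one at the same distance, which is one simple way to fold similarity and uncertainty into a single score. The offline relative-density stage is not shown.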
Source
Journal of Shanghai Jiaotong University (《上海交通大学学报》)
EI
CAS
CSCD
Peking University Core Journals (北大核心)
2016, No. 6, pp. 873-878 (6 pages)
Funding
Supported by the Special Fund for Public Welfare Industry Research of the Ministry of Water Resources (No. 201401044)
Keywords
uncertain data stream
clustering
gravity
similarity
relative density
outlier