Abstract
To improve the clustering quality of evolving data streams, this paper proposes SAPStream, a data stream clustering algorithm based on semi-supervised affinity propagation. Borrowing the idea of semi-supervised clustering, the algorithm constructs a similarity matrix for the initial data stream, runs affinity propagation (AP) clustering on it, and builds an online clustering model. As the data stream evolves, a decay window technique adjusts the clustering model in a timely manner, and the resulting exemplars are clustered again together with newly arrived data points to obtain the clustering result for the stream. SAPStream can analyze and process large-scale evolving data streams. Its performance is tested on both real and synthetic datasets, and the experimental results on dynamic data stream clustering show that the algorithm is effective and achieves high clustering quality.
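The abstract names two mechanisms without giving formulas: a similarity matrix built with semi-supervised information, and a decay window that ages the online model. The sketch below illustrates one plausible reading of each and is not the paper's implementation. It assumes negative squared Euclidean distance as the base similarity (the usual choice for affinity propagation), pairwise must-link and cannot-link constraints as the supervision, and an exponential decay of the form 2^(-lam * elapsed). The names `similarity_matrix`, `decay_weight`, `update_model`, `penalty`, `lam`, and `min_weight` are all hypothetical.

```python
def similarity_matrix(points, must_link=(), cannot_link=(), penalty=-1e6):
    """Pairwise similarities adjusted by constraints (assumed scheme)."""
    n = len(points)
    # Base similarity: negative squared Euclidean distance.
    s = [[-sum((a - b) ** 2 for a, b in zip(points[i], points[j]))
          for j in range(n)] for i in range(n)]
    # Assumption: must-link pairs get the maximum similarity (0),
    # cannot-link pairs get a large negative penalty.
    for i, j in must_link:
        s[i][j] = s[j][i] = 0.0
    for i, j in cannot_link:
        s[i][j] = s[j][i] = penalty
    return s


def decay_weight(weight, elapsed, lam=0.1):
    """Exponentially decay a weight over `elapsed` time units (assumed form)."""
    return weight * 2.0 ** (-lam * elapsed)


def update_model(model, now, lam=0.1, min_weight=0.3):
    """Age every exemplar to time `now` and drop those that have faded.

    `model` is a list of (center, weight, last_update) tuples.
    """
    kept = []
    for center, weight, last_update in model:
        w = decay_weight(weight, now - last_update, lam)
        if w >= min_weight:
            kept.append((center, w, now))
    return kept
```

In a full pipeline, the adjusted matrix would be passed to an affinity propagation routine, and the surviving exemplars from `update_model` would be clustered again together with newly arrived points, as the abstract describes.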
Source
《计算机工程与应用》
CSCD
2013, No. 8, pp. 6-8, 47 (4 pages in total)
Computer Engineering and Applications
Funding
Key Program of the National Natural Science Foundation of China (No. 90912004)
Keywords
data stream
semi-supervised
affinity propagation clustering
decay windows