Abstract
To discover clusters of different shapes in distributed data streams, this paper proposes a clustering algorithm based on representative points. Building on the definition of representative points, the algorithm introduces the concept of circular points and an iterative procedure for finding density-connected circular points, from which each remote site generates its local model. The coordinator site then merges the local models to generate the global clusters. Experiments on real and synthetic datasets show that, by using representative points, the algorithm finds clusters of different shapes and significantly reduces the volume of data transmitted, while its test-update strategy for local models avoids sending data frequently.
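The coordinator-side merge step can be illustrated with a minimal sketch. The abstract does not give the merging rule, so everything here is an assumption made for illustration: each remote site is taken to send a list of 2-D representative points, two representatives are linked when they lie within a hypothetical threshold `eps`, and global clusters are the connected groups of linked points (computed with union-find). The names `merge_local_models`, `local_models`, and `eps` are hypothetical and do not come from the paper.

```python
from itertools import combinations
from math import dist


def merge_local_models(local_models, eps):
    """Group representative points from all sites into global clusters.

    local_models: dict mapping a site id to a list of (x, y) representatives
                  (assumed format, not taken from the paper).
    eps: hypothetical distance threshold for linking two representatives.
    """
    # Pool the representatives received from every remote site.
    points = [p for reps in local_models.values() for p in reps]
    parent = list(range(len(points)))

    def find(i):
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Link every pair of representatives closer than eps (assumed rule).
    for i, j in combinations(range(len(points)), 2):
        if dist(points[i], points[j]) <= eps:
            parent[find(i)] = find(j)

    # Collect points by their union-find root; each root is one cluster.
    clusters = {}
    for i, p in enumerate(points):
        clusters.setdefault(find(i), []).append(p)
    return sorted(sorted(c) for c in clusters.values())


local_models = {
    "site_a": [(0.0, 0.0), (1.0, 0.0)],
    "site_b": [(2.0, 0.0), (10.0, 10.0)],
}
global_clusters = merge_local_models(local_models, eps=1.5)
```

Because points are chained through their neighbours rather than compared to a single centre, a sketch of this kind can follow elongated or irregular shapes, and only the representatives, not the raw stream, cross the network.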
Source
Application Research of Computers (《计算机应用研究》)
CSCD
Peking University Core Journal (北大核心)
2012, Issue 8, pp. 2845-2848 (4 pages)
Funding
National Natural Science Foundation of China (61073043)
Natural Science Foundation of Heilongjiang Province (F201023)
Keywords
distributed data stream
data mining
clustering
cluster evolution
representative point