Abstract
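This paper improves the density-based distributed clustering algorithm DBDC (density based distributed clustering) and proposes a density-based distributed clustering algorithm, DBDC*. When selecting representative points locally, the algorithm incorporates the Bayesian information criterion (BIC) to obtain a small number of BIC core points that accurately reflect the data distribution at each local site. This effectively reduces the volume of data communicated during distributed clustering, and global clustering takes the data distribution of all sites into account. Experimental results show that DBDC* is more efficient than DBDC and produces good clustering results.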
With the spread of networked applications, large amounts of data are now distributed across sites. Distributed clustering is a challenging research topic because of real-life constraints such as bandwidth and the memory available at each site. An effective density-based distributed clustering algorithm (DBDC*) is proposed to improve the efficiency of the distributed clustering algorithm DBDC. DBDC* incorporates the Bayesian Information Criterion (BIC) and selects only a small number of BIC core points to represent each local site, which effectively reduces network load and improves the quality of the global clustering. DBDC* operates on two levels, the local level and the global level. On the local level, each site runs DBSCAN clustering independently of the others. Once clustering is complete, a local model consisting of BIC core points is determined. The local model is then transferred to a central site, where the local models are merged into a global model by analyzing the local BIC core points. Each local representative is assigned a global cluster identifier. The resulting global clustering is broadcast to all local sites, and all local models are then updated. Experimental results show that DBDC* is more efficient than DBDC.
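The two-level flow described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the paper's implementation: the abstract does not say how BIC core points are chosen, so the sketch simply sends every DBSCAN core point as a representative (the paper selects a smaller, BIC-filtered subset). The function names, the `eps_global` parameter, the use of `min_pts=1` at the central site, and the nearest-representative relabeling rule are all assumptions made for illustration.

```python
from math import dist

def dbscan(points, eps, min_pts):
    """Plain DBSCAN. Returns (labels, core_flags); label -1 marks noise."""
    n = len(points)
    neighbors = [[j for j in range(n) if dist(points[i], points[j]) <= eps]
                 for i in range(n)]
    core = [len(nb) >= min_pts for nb in neighbors]
    labels = [-1] * n
    cluster = 0
    for i in range(n):
        if labels[i] != -1 or not core[i]:
            continue
        labels[i] = cluster
        stack = [i]
        while stack:
            p = stack.pop()
            if not core[p]:          # border points join but do not expand
                continue
            for q in neighbors[p]:
                if labels[q] == -1:
                    labels[q] = cluster
                    stack.append(q)
        cluster += 1
    return labels, core

def local_representatives(points, eps, min_pts):
    """Local level. ASSUMPTION: all core points stand in for BIC core points."""
    _, core = dbscan(points, eps, min_pts)
    return [p for p, is_core in zip(points, core) if is_core]

def relabel(points, reps, rep_ids, eps):
    """Update step (assumed rule): adopt the id of the nearest representative
    if it lies within eps, otherwise mark the point as noise."""
    out = []
    for p in points:
        best = min(range(len(reps)), key=lambda k: dist(p, reps[k]), default=None)
        ok = best is not None and dist(p, reps[best]) <= eps
        out.append(rep_ids[best] if ok else -1)
    return out

def dbdc_sketch(sites, eps, min_pts, eps_global):
    """sites: list of point lists, one per site. Returns global labels per site."""
    # Local level: each site clusters independently and picks representatives.
    reps = [r for pts in sites for r in local_representatives(pts, eps, min_pts)]
    # Global level: the central site merges representatives into global clusters.
    rep_ids, _ = dbscan(reps, eps_global, 1)
    # Broadcast: every site relabels its own points with the global identifiers.
    return [relabel(pts, reps, rep_ids, eps) for pts in sites]

site_a = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10)]
site_b = [(2, 0), (2, 1), (3, 0), (3, 1), (20, 20)]
print(dbdc_sketch([site_a, site_b], eps=1.0, min_pts=3, eps_global=1.0))
```

Only the representatives cross the network in this sketch, which is where the reduction in communication claimed for DBDC* would come from; the fewer representatives each site sends, the smaller the transfer.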
Source
《南京大学学报(自然科学版)》
CAS
CSCD
PKU Core (Peking University Core Journals)
2008, Issue 5, pp. 536-543 (8 pages)
Journal of Nanjing University(Natural Science)
Funding
National Natural Science Foundation of China (40771163)
Keywords
Clustering
Distributed clustering
Density-based clustering algorithm (DBSCAN)
Distributed clustering algorithm (DBDC)
clustering, distributed clustering, density-based spatial clustering of applications with noise (DBSCAN), density-based distributed clustering (DBDC)