Abstract
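The classic competitive agglomeration (CA) algorithm determines the number of clusters automatically, which avoids the influence of preset parameters on the clustering result. During clustering, however, the algorithm makes no use of the small amount of known information that is commonly present in sample data, even though such information can often benefit the whole clustering process. In addition, the algorithm adopts the most common similarity measure, the Euclidean distance, which is suitable only for spherical clusters and tends toward equal partitioning; this limits the algorithm's scope of application. To address these problems, a semi-supervised term with learning ability is introduced to strengthen the partitioning ability of the membership matrix, and the dot-density information of the sample data is used to generate a distance correction factor that adjusts the Euclidean distance, yielding a semi-supervised CA algorithm based on dot density. Clustering segmentation results on synthetic and real images, together with performance comparisons against other algorithms, show that the resulting algorithm obtains more accurate center values and better clustering results.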
Competitive agglomeration (CA) is a classic clustering algorithm that determines the number of clusters automatically. During its iterations it identifies and discards spurious cluster centers until the remaining number of clusters suits the sample data. This avoids the effect of poorly chosen preset parameters on the clustering result and removes the need to specify an exact number of clusters in advance. During clustering, however, the algorithm does not take into account the known information that is scarce but commonly present in sample data. Such information matters for the clustering result, and making proper use of it helps improve clustering accuracy. Moreover, the algorithm uses the Euclidean distance as its similarity function. Although this distance is cheap to compute and widely used, it is suitable only for spherical clusters and tends to partition data sets into equal parts. Given the diversity of sample data that may need to be clustered, these limitations restrict the algorithm's scope of application. To solve these problems, a semi-supervised term is introduced to enhance the partitioning capability of the membership matrix; its learning ability helps the algorithm make full use of the known information in the sample data. In addition, a distance correction is built from dot-density information. The dot density reflects the importance of a point in the clustering and is used to adjust the Euclidean distance so that the distance does not drive the clustering result toward equal partitioning. The result is a semi-supervised CA algorithm based on dot density. Four images, divided into two groups (artificial and real), are used to examine segmentation, and three other algorithms are used for comparison. The image segmentation results and the performance comparison show that the proposed algorithm obtains more accurate center values and better clustering results.
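The step in which CA discards spurious cluster centers can be illustrated with a minimal sketch. It assumes the commonly described CA rule of dropping any cluster whose fuzzy cardinality (the sum of its memberships) falls below a threshold. The function name `prune_clusters` and the parameter `min_cardinality` are hypothetical, and the sketch covers neither the semi-supervised term nor the dot-density distance correction, whose formulas the abstract does not give.

```python
def prune_clusters(memberships, centers, min_cardinality):
    """Keep only clusters whose fuzzy cardinality reaches min_cardinality.

    memberships[j][i] is the membership of sample i in cluster j.
    centers[j] is the center of cluster j.
    Both names and the threshold are illustrative assumptions.
    """
    kept_memberships, kept_centers = [], []
    for u_j, c_j in zip(memberships, centers):
        cardinality = sum(u_j)  # fuzzy count of samples claimed by cluster j
        if cardinality >= min_cardinality:
            kept_memberships.append(u_j)
            kept_centers.append(c_j)
    # Memberships of surviving clusters are returned unchanged; an iterative
    # algorithm would recompute them in its next update step.
    return kept_memberships, kept_centers


# Three candidate clusters over three samples; the middle one attracts
# almost no membership and is discarded.
memberships = [
    [0.90, 0.80, 0.10],
    [0.05, 0.10, 0.10],
    [0.05, 0.10, 0.80],
]
centers = [(0.0, 0.0), (5.0, 5.0), (9.0, 9.0)]
kept_u, kept_c = prune_clusters(memberships, centers, min_cardinality=0.5)
print(kept_c)
```

Applied once per iteration, a rule of this kind lets the cluster count shrink from a deliberately large initial value instead of being fixed in advance.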
Source
《南京大学学报(自然科学版)》
CAS
CSCD
Peking University Core Journals
2014, Issue 4, pp. 447-456 (10 pages)
Journal of Nanjing University (Natural Science)
Funding
National Natural Science Foundation of China (61170122)
Natural Science Foundation of Jiangsu Province (BK2012552)
Keywords
competitive agglomeration (CA) algorithm
Euclidean distance
semi-supervised
dot density
distance correction factor