Abstract
Clustering of uncertain data is an active research topic in data mining, with wide-ranging practical applications. This paper reviews the uk-means algorithm for clustering uncertain data and its improved variant, ck-means. Because ck-means must compute the distance between every cluster and the centroid of every object, its efficiency remains poor when the sample is large. The paper proposes kd-means, an improvement built on a kd-tree index that computes only a subset of these distances and thereby greatly improves on the performance of ck-means. Extensive experiments demonstrate the effectiveness of the improved algorithm.
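The abstract does not say how kd-means uses the kd-tree, so the following is only a minimal sketch of one plausible reading, not the paper's method. It assumes squared Euclidean distance, for which the expected distance from an uncertain object to a cluster representative equals the squared distance from the object's centroid plus a per-object constant, so the assignment step reduces to a nearest-neighbour query on centroids. It also assumes that the kd-tree indexes the cluster representatives. All function names (`centroid`, `build_kdtree`, `nearest`, `assign`) and the sample data are hypothetical.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def centroid(samples):
    """Mean of the sample points that represent one uncertain object."""
    n = len(samples)
    return tuple(sum(col) / n for col in zip(*samples))

def build_kdtree(points, depth=0):
    """Build a kd-tree from a list of (index, coords) pairs."""
    if not points:
        return None
    axis = depth % len(points[0][1])
    points = sorted(points, key=lambda p: p[1][axis])
    mid = len(points) // 2
    return {
        "point": points[mid],
        "axis": axis,
        "left": build_kdtree(points[:mid], depth + 1),
        "right": build_kdtree(points[mid + 1:], depth + 1),
    }

def nearest(node, query, best=None, counter=None):
    """Return (squared distance, index) of the stored point nearest to query.

    counter, if given, is a one-element list that accumulates the number
    of distance evaluations actually performed.
    """
    if node is None:
        return best
    idx, coords = node["point"]
    d = sq_dist(query, coords)
    if counter is not None:
        counter[0] += 1
    if best is None or d < best[0]:
        best = (d, idx)
    axis = node["axis"]
    diff = query[axis] - coords[axis]
    near, far = (node["left"], node["right"]) if diff < 0 else (node["right"], node["left"])
    best = nearest(near, query, best, counter)
    # Visit the far side only if the splitting plane is closer than the best match.
    if diff * diff < best[0]:
        best = nearest(far, query, best, counter)
    return best

def assign(objects, centers):
    """Assign each uncertain object to the cluster nearest to its centroid."""
    tree = build_kdtree(list(enumerate(centers)))
    counter = [0]
    labels = [nearest(tree, centroid(o), counter=counter)[1] for o in objects]
    return labels, counter[0]

# Hypothetical data: five cluster representatives, three uncertain objects.
centers = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0), (10.0, 10.0), (5.0, 5.0)]
objects = [
    [(0.5, 0.2), (1.5, -0.4), (1.0, 0.8)],
    [(9.0, 9.5), (11.0, 10.5)],
    [(4.0, 6.0), (6.0, 4.0), (5.0, 5.5)],
]
labels, n_evals = assign(objects, centers)
print(labels, n_evals)
```

A brute-force assignment would evaluate `len(objects) * len(centers)` distances. The tree search skips any subtree whose splitting plane is already farther away than the best match found so far, which is the sense in which only part of the distances need to be computed.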
Source
Journal of Guangxi Normal University: Natural Science Edition (《广西师范大学学报(自然科学版)》)
CAS
Peking University Core Journal (北大核心)
2011, No. 2, pp. 161-166 (6 pages)
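Funding
National Natural Science Foundation of China (61063008)
Yunnan Provincial Department of Education Research Fund (09Y0048)
Yunnan University Scientific Research Fund (2009F29Q)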