Abstract
This paper presents an improved K-means clustering algorithm for data sets in which some values are missing. The algorithm modifies two steps of standard K-means, namely the metric used to compute the distance from each data point to each cluster and the method used to generate new cluster centers, so that empty (missing) values are masked out of both computations. In experiments on the Iris data set from the UCI repository, with some values removed at random, the algorithm clustered the incomplete data set directly and effectively suppressed the influence of the missing values on the clustering result.
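The abstract does not give the modified distance metric or the new center-update formula, so the following is only a minimal Python sketch of one common way to realize the idea it describes, not the paper's exact method. It assumes (1) a partial Euclidean distance computed over the attributes a row actually has, rescaled by total attributes / observed attributes, and (2) centers updated as per-attribute means over the cluster members that observe that attribute. Missing values are represented as `None`; the names `partial_distance`, `update_centers`, and `kmeans_missing` are hypothetical.

```python
import math
import random
from typing import List, Optional, Sequence, Tuple

# A data row; None marks a missing (empty) attribute value.
Row = Sequence[Optional[float]]


def partial_distance(x: Row, center: Sequence[float]) -> float:
    """Euclidean distance computed only over the attributes present in x.

    ASSUMPTION: the sum of squares is rescaled by (total / observed attributes)
    so rows with many gaps are not artificially close to every center.
    """
    sq = [(xi - ci) ** 2 for xi, ci in zip(x, center) if xi is not None]
    if not sq:
        return math.inf  # no observed attributes: equally far from every center
    return math.sqrt(sum(sq) * len(center) / len(sq))


def update_centers(data, labels, centers):
    """New center = per-attribute mean over members that observe the attribute."""
    new_centers = []
    for j, old in enumerate(centers):
        new = []
        for d in range(len(old)):
            vals = [row[d] for row, lab in zip(data, labels)
                    if lab == j and row[d] is not None]
            # Keep the previous coordinate if no member observes this attribute.
            new.append(sum(vals) / len(vals) if vals else old[d])
        new_centers.append(new)
    return new_centers


def kmeans_missing(data: List[Row], k: int, max_iter: int = 100,
                   seed: int = 0) -> Tuple[List[int], List[List[float]]]:
    """K-means that ignores missing values in both assignment and update steps."""
    # ASSUMPTION: seed the centers from complete rows so they contain no gaps.
    complete = [list(r) for r in data if all(v is not None for v in r)]
    if len(complete) < k:
        raise ValueError("need at least k complete rows to seed the centers")
    centers = random.Random(seed).sample(complete, k)
    labels: List[int] = []
    for _ in range(max_iter):
        # Assign each row to the nearest center under the partial distance.
        new_labels = [min(range(k), key=lambda j: partial_distance(row, centers[j]))
                      for row in data]
        if new_labels == labels:
            break  # assignments unchanged: converged
        labels = new_labels
        centers = update_centers(data, labels, centers)
    return labels, centers


if __name__ == "__main__":
    data = [[1.0, 1.0], [1.2, None], [None, 0.9], [0.8, 1.1],
            [9.0, 9.0], [9.2, None], [None, 8.8], [8.9, 9.1]]
    labels, centers = kmeans_missing(data, k=2)
    print(labels, centers)
```

The rescaling in `partial_distance` keeps distances from rows with few observed attributes on the same scale as distances from complete rows; without it, a row missing half its values would tend to look close to every center. A row with no observed values at all is assigned to the first cluster and contributes nothing to any center.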
Source
Journal of Hefei University of Technology: Natural Science (《合肥工业大学学报(自然科学版)》)
Indexed in: CAS, CSCD, Peking University Core Journals (北大核心)
2008, No. 9, pp. 1455-1457 (3 pages)