摘要
K-Means算法是一种经典的聚类算法,有很多优点,也存在许多不足。比如初始聚类数K要事先指定,初始聚类中心选择存在随机性,算法容易生成局部最优解,受孤立点的影响很大等。文中主要针对K-Means算法初始聚类中心的选择以及孤立点问题加以改进,首先计算所有数据对象之间的距离,根据距离和的思想排除孤立点的影响,然后提出了一种新的初始聚类中心选择方法,并通过实验比较了改进算法与原算法的优劣。实验表明,改进算法受孤立点的影响明显降低,而且聚类结果更接近实际数据分布。
The algorithm of K-means is one kind of classical clustering algorithm,including both many points and also shortages.For example must choose the initial clustering number.The choose of initial clustering centre has randomness.The algorithm receives locally optimal solution easily,the effect of isolated point is serious.Mainly improved the choice of initial clustering centre and the problem of isolated point.First of all,the algorithm calculated distance between all data and eliminated the effect of isolated point.Then proposed one new method for choosing the initial clustering centre and compared the algorithm having improved and the original algorithm using the experiment.The experiments indicate that the effect of isolated point for algorithm having improved reduces obviously,the results of clustering approach the actual distribution of the data.
出处
《计算机技术与发展》
2011年第2期62-65,共4页
Computer Technology and Development
基金
安徽省教育科研重点项目(KJ2009A57)