Abstract
The K-means algorithm is one of the most widely used clustering algorithms. It has many strengths but also some shortcomings: it is sensitive to the order in which samples are presented, it may converge to a local optimum, and it is strongly affected by outliers. To address these shortcomings, this paper proposes an improved K-means algorithm whose main changes lie in data preprocessing and in the selection of the initial cluster centers, and it reports comparative experiments on the algorithm before and after the improvement. The results show that the improved algorithm is more stable and more accurate, and that the influence of outliers is greatly reduced.
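The abstract names two places where the algorithm is changed (data preprocessing and initial center selection) but does not state the rules used. The Python sketch below is therefore only an illustration of how such a pipeline could look, not the paper's method. The neighbor-distance outlier filter, the farthest-first seeding, and all names and parameters (`remove_outliers`, `farthest_first_centers`, `n_neighbors`, `factor`) are assumptions made for this example.

```python
import math
from statistics import median


def remove_outliers(points, n_neighbors=2, factor=3.0):
    """Assumed preprocessing step: drop points whose mean distance to
    their nearest neighbors is far above the median of that score."""
    scores = []
    for i, p in enumerate(points):
        d = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        scores.append(sum(d[:n_neighbors]) / n_neighbors)
    cutoff = factor * median(scores)
    return [p for p, s in zip(points, scores) if s <= cutoff]


def farthest_first_centers(points, k):
    """Assumed seeding step: start from the point nearest the data mean,
    then repeatedly add the point farthest from the centers chosen so far.
    No random numbers are used, so repeated runs give the same seeds."""
    dim = len(points[0])
    mean = tuple(sum(p[d] for p in points) / len(points) for d in range(dim))
    centers = [min(points, key=lambda p: math.dist(p, mean))]
    while len(centers) < k:
        centers.append(
            max(points, key=lambda p: min(math.dist(p, c) for c in centers))
        )
    return centers


def kmeans(points, centers, max_iter=100):
    """Standard Lloyd iterations starting from the given centers."""
    clusters = []
    for _ in range(max_iter):
        clusters = [[] for _ in centers]
        for p in points:
            idx = min(range(len(centers)), key=lambda i: math.dist(p, centers[i]))
            clusters[idx].append(p)
        # An empty cluster keeps its previous center.
        new_centers = [
            tuple(sum(x) / len(c) for x in zip(*c)) if c else centers[i]
            for i, c in enumerate(clusters)
        ]
        if new_centers == centers:
            break
        centers = new_centers
    return centers, clusters


# Toy data: two tight groups plus one isolated point.
data = [(0, 0), (0, 1), (1, 0), (1, 1),
        (10, 10), (10, 11), (11, 10), (11, 11),
        (50, 50)]
cleaned = remove_outliers(data)
seeds = farthest_first_centers(cleaned, 2)
final_centers, groups = kmeans(cleaned, seeds)
print(final_centers)  # [(0.5, 0.5), (10.5, 10.5)]
```

On this toy data the isolated point (50, 50) is filtered out before clustering, so it cannot pull a center away from either group. Because the seeding is deterministic, repeated runs on the same input return the same centers.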
Source
Computer and Information Technology (《电脑与信息技术》)
2008, No. 1, pp. 38-40 (3 pages)