Abstract
Traditional K-Means places high demands on its users: the K value must be specified in advance, and the positions of the initial center points must be chosen. This paper improves the traditional K-Means algorithm in three ways. First, it defines, detects, and deletes outliers. Second, it uses the Canopy algorithm to help determine the range of K values and rough center points. Third, it uses the Silhouette evaluation index to select the optimal K value and the corresponding clustering result. The improved algorithm does not require the K value or the initial center points to be entered manually. Verification results show that the improved K-Means algorithm produces stable and accurate clustering results, and that when the number of data points is large it slightly outperforms the traditional algorithm in the number of iterations.
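The abstract does not spell out the outlier definition, the Canopy thresholds, or how the K range is derived, so the Python sketch below fills those gaps with assumptions rather than reproducing the author's method. It treats a point as an outlier when its mean distance to its 3 nearest neighbours exceeds the mean score plus two standard deviations. Each hypothetical `(t1, t2)` threshold pair drives one Canopy pass, which proposes a candidate K and its rough centers. K-Means is seeded with those centers, and the candidate with the highest mean silhouette coefficient is kept. The helper names `remove_outliers`, `canopy`, `kmeans`, `silhouette`, and `auto_kmeans` are illustrative only.

```python
import math
import statistics


def remove_outliers(points, k=3, z=2.0):
    """Hypothetical outlier rule: score each point by its mean distance to its
    k nearest neighbours and drop points scoring above mean + z * stdev."""
    scores = []
    for i, p in enumerate(points):
        d = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        scores.append(sum(d[:k]) / k)
    limit = statistics.mean(scores) + z * statistics.pstdev(scores)
    return [p for p, s in zip(points, scores) if s <= limit]


def canopy(points, t1, t2):
    """Canopy pass with t1 > t2: the first remaining point becomes a center,
    points within t1 join its canopy, points within t2 stop being candidates."""
    assert t1 > t2
    remaining = list(points)
    canopies = []
    while remaining:
        center = remaining.pop(0)  # classic Canopy picks at random; first point keeps this deterministic
        members = [center] + [p for p in remaining if math.dist(p, center) < t1]
        remaining = [p for p in remaining if math.dist(p, center) >= t2]
        canopies.append((center, members))
    return canopies


def kmeans(points, centers, max_iter=100):
    """Lloyd's K-Means seeded with the given centers; returns (labels, centers, iterations)."""
    centers = [tuple(c) for c in centers]
    labels = []
    for it in range(1, max_iter + 1):
        # Assign every point to its nearest current center.
        labels = [min(range(len(centers)), key=lambda c: math.dist(p, centers[c]))
                  for p in points]
        # Move each center to the mean of its members (keep it if the cluster is empty).
        new_centers = []
        for c in range(len(centers)):
            members = [p for p, lab in zip(points, labels) if lab == c]
            new_centers.append(tuple(map(statistics.fmean, zip(*members))) if members else centers[c])
        if new_centers == centers:  # centers unchanged, so assignments are stable
            return labels, centers, it
        centers = new_centers
    return labels, centers, max_iter


def silhouette(points, labels):
    """Mean silhouette coefficient s(i) = (b - a) / max(a, b); singletons count as 0."""
    clusters = set(labels)
    if len(clusters) < 2:
        return -1.0  # undefined for one cluster; treat as the worst possible score
    total = 0.0
    for i, p in enumerate(points):
        same = [math.dist(p, q) for j, q in enumerate(points) if j != i and labels[j] == labels[i]]
        if not same:
            continue
        a = statistics.fmean(same)  # mean distance within own cluster
        b = min(statistics.fmean(math.dist(p, q) for q, lab in zip(points, labels) if lab == c)
                for c in clusters if c != labels[i])  # mean distance to the nearest other cluster
        if max(a, b) > 0:
            total += (b - a) / max(a, b)
    return total / len(points)


def auto_kmeans(points, thresholds):
    """Hypothetical driver: remove outliers, let each (t1, t2) Canopy pass propose
    K and rough centers, run K-Means from them, keep the best silhouette."""
    data = remove_outliers(points)
    best = None  # (silhouette, K, labels, centers)
    for t1, t2 in thresholds:
        seeds = [center for center, _ in canopy(data, t1, t2)]
        if len(seeds) < 2:
            continue
        labels, centers, _ = kmeans(data, seeds)
        score = silhouette(data, labels)
        if best is None or score > best[0]:
            best = (score, len(seeds), labels, centers)
    return data, best


if __name__ == "__main__":
    blobs = [(0, 0), (0, 1), (1, 0), (1, 1),
             (10, 10), (10, 11), (11, 10), (11, 11),
             (0, 10), (0, 11), (1, 10), (1, 11)]
    data, (score, k, labels, centers) = auto_kmeans(blobs + [(30, -20)], [(20, 12), (8, 4)])
    print(len(data), k, round(score, 2), centers)
```

On the toy data in the `__main__` block, the point (30, -20) should be dropped as an outlier. The pair (20, 12) proposes K = 2 and the pair (8, 4) proposes K = 3, and the K = 3 clustering should win on silhouette. Seeding K-Means from Canopy centers instead of random points is what removes the need to enter initial centers by hand in this sketch.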
Author
XU Li
(School of Software, Shangqiu Polytechnic, Shangqiu, Henan 476100, China)
Source
Journal of Hebei Software Institute
2018, No. 2, pp. 18-20 (3 pages)
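Funding
Research project of the Henan Provincial Federation of Social Sciences and the Henan Provincial Federation of Economic Organizations (SKL-2016-2062)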