Abstract
K-means is an important clustering algorithm; however, its clustering quality depends heavily on the number of clusters and on the positions of the initial centers [1]. This paper proposes tNN-MEANS, an algorithm based on optimizing the initial center set and moving the cluster centers. The algorithm effectively solves three problems: 1) accurately determining the number of clusters in large-scale data sets; 2) precisely locating the globally high-density core regions; 3) handling clusters that contain multiple high-density regions. tNN-MEANS is compared experimentally with the X-means and DBSCAN algorithms on UCI data sets. The results show that tNN-MEANS outperforms both in clustering accuracy, in determining the number of clusters, and in the correctness of cluster partitioning.
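The sensitivity of K-means to its initial centers can be seen on a toy example. The sketch below is plain Lloyd-style K-means, not the tNN-MEANS algorithm (which the abstract does not specify); the function names `sq_dist` and `kmeans` and the four-point data set are illustrative assumptions.

```python
# Illustrative only: standard K-means (Lloyd's iteration), NOT tNN-MEANS.
# All names and the toy data below are assumptions made for this sketch.

def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def kmeans(points, centers, max_iter=100):
    """Run K-means from the given initial centers; return (centers, SSE)."""
    centers = [tuple(c) for c in centers]
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest center.
        clusters = [[] for _ in centers]
        for p in points:
            j = min(range(len(centers)), key=lambda i: sq_dist(p, centers[i]))
            clusters[j].append(p)
        # Update step: each center moves to the mean of its members.
        # An empty cluster keeps its previous center.
        new_centers = [
            tuple(sum(coord) / len(members) for coord in zip(*members))
            if members else centers[i]
            for i, members in enumerate(clusters)
        ]
        if new_centers == centers:  # converged
            break
        centers = new_centers
    sse = sum(min(sq_dist(p, c) for c in centers) for p in points)
    return centers, sse

# Four corners of a 4 x 1 rectangle: the natural split is left vs. right.
points = [(0, 0), (0, 1), (4, 0), (4, 1)]

_, good_sse = kmeans(points, [(0, 0.5), (4, 0.5)])  # centers on left/right
_, bad_sse = kmeans(points, [(2, 0), (2, 1)])       # centers on bottom/top

print(good_sse, bad_sse)  # prints: 1.0 16.0
```

Both runs converge immediately, but the second start is stuck in a local optimum that splits the data into bottom and top rows, with a sum of squared errors sixteen times larger. This is the kind of dependence on initial centers that motivates optimizing the initial center set.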
Source
《数学的实践与认识》
CSCD
Peking University Core Journals (北大核心)
2013, No. 13, pp. 174-181 (8 pages)
Mathematics in Practice and Theory