Abstract
To address a number of problems with the original K-means algorithm, an improved semi-supervised K-means clustering algorithm is proposed. It clusters automatically, finds the optimal value of K, and detects as many outliers as possible. First, based on the characteristics of the sample set itself, candidate clusters are formed step by step according to the principle that members of the same class should be as similar as possible. These clusters are then "denoised" and similar clusters are merged. Finally, a small amount of label information is used to guide and correct the clustering result. Tests on several UCI datasets show that the improved algorithm achieves considerably higher accuracy and better stability than the original algorithm.
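The abstract describes the pipeline only in outline, so the Python sketch below is one possible reading of it, not the authors' method. Every concrete rule in it is a hypothetical choice made for illustration: the leader-style cluster growth with a `radius` threshold, the `min_size` cutoff that turns small clusters into outliers, the `merge_dist` rule for merging similar clusters, a standard Lloyd K-means pass seeded with the surviving centroids, and a majority-vote correction that uses the few labelled points.

```python
import math
from collections import Counter


def mean(points):
    """Component-wise mean of a non-empty list of equal-length tuples."""
    n = len(points)
    return tuple(sum(c) / n for c in zip(*points))


def grow_clusters(data, radius):
    """Step 1 (assumed leader-style rule): add each point to the nearest
    cluster whose centroid lies within `radius`, else open a new cluster."""
    clusters, centroids = [], []  # clusters hold point indices
    for i, x in enumerate(data):
        best = min(range(len(centroids)),
                   key=lambda j: math.dist(x, centroids[j]), default=None)
        if best is not None and math.dist(x, centroids[best]) <= radius:
            clusters[best].append(i)
            centroids[best] = mean([data[k] for k in clusters[best]])
        else:
            clusters.append([i])
            centroids.append(x)
    return clusters


def denoise_and_merge(data, clusters, min_size, merge_dist):
    """Step 2 (assumed rules): clusters smaller than `min_size` become
    outliers; kept clusters with centroids closer than `merge_dist` merge."""
    outliers = sorted(i for c in clusters if len(c) < min_size for i in c)
    kept = [list(c) for c in clusters if len(c) >= min_size]
    merged = True
    while merged:
        merged = False
        for a in range(len(kept)):
            for b in range(a + 1, len(kept)):
                ca = mean([data[i] for i in kept[a]])
                cb = mean([data[i] for i in kept[b]])
                if math.dist(ca, cb) < merge_dist:
                    kept[a].extend(kept.pop(b))  # b > a, so index a stays valid
                    merged = True
                    break
            if merged:
                break
    return kept, outliers


def kmeans(data, idx, centroids, max_iter=100):
    """Step 3 (assumed): Lloyd iterations over non-outlier indices `idx`,
    seeded with the centroids left after step 2, so K = len(centroids)."""
    assign = {}
    for _ in range(max_iter):
        new = {i: min(range(len(centroids)),
                      key=lambda j: math.dist(data[i], centroids[j]))
               for i in idx}
        if new == assign:  # assignments unchanged: converged
            break
        assign = new
        for j in range(len(centroids)):
            members = [data[i] for i in idx if assign[i] == j]
            if members:
                centroids[j] = mean(members)
    return assign


def correct_with_labels(assign, labels):
    """Step 4 (assumed rule): each cluster takes the majority label of its
    labelled points, so clusters voting for the same label are merged."""
    votes = {}
    for i, y in labels.items():
        if i in assign:
            votes.setdefault(assign[i], Counter())[y] += 1
    names = {c: v.most_common(1)[0][0] for c, v in votes.items()}
    return {i: names.get(c, f"unlabelled-{c}") for i, c in assign.items()}


def semi_supervised_kmeans(data, labels, radius, min_size, merge_dist):
    """Run steps 1-4; return (K, outlier indices, label per non-outlier)."""
    kept, outliers = denoise_and_merge(
        data, grow_clusters(data, radius), min_size, merge_dist)
    centroids = [mean([data[i] for i in c]) for c in kept]
    idx = [i for c in kept for i in c]
    assign = kmeans(data, idx, centroids)
    return len(kept), outliers, correct_with_labels(assign, labels)


if __name__ == "__main__":
    pts = [(0, 0), (0, 1), (1, 0), (1, 1),
           (10, 10), (10, 11), (11, 10), (11, 11), (50, 50)]
    k, out, lab = semi_supervised_kmeans(pts, {0: "A", 4: "B"},
                                         radius=3, min_size=2, merge_dist=5)
    print(k, out, lab[2], lab[6])  # prints: 2 [8] A B
```

In the toy run, K is not given in advance: it falls out of the growth and merge thresholds (here K = 2), the isolated point (50, 50) is reported as an outlier, and one labelled point per group is enough to name every cluster. The price of this reading is that `radius`, `min_size` and `merge_dist` replace K as tuning parameters.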
Source
Journal of Dongguan University of Technology
2011, No. 1, pp. 29-32 (4 pages)