Abstract
Cluster analysis is an important data mining method, and the K-means algorithm is the most commonly used partition-based clustering method. This paper proposes an improved K-means algorithm based on the discretization (dispersion) of the initial mean points. When selecting the initial mean points, the improved algorithm spreads them out as widely as possible, which resolves several problems caused by the random selection of initial mean points in the traditional algorithm. In addition, to obtain higher-quality clustering results, two preprocessing steps are carried out: detecting outliers in the data set and automatically determining the optimal value of the parameter k. Experiments show that the improved algorithm clearly outperforms the traditional algorithm.
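The abstract does not state the exact rule used to disperse the initial mean points. As an illustration only, the sketch below substitutes a standard technique with the same goal, farthest-first (maximin) traversal: after one seed point, each new initial centroid is the point farthest from all centroids chosen so far, and ordinary K-means (Lloyd) iterations follow. The function names `farthest_first_init` and `kmeans` are hypothetical, and the outlier detection and automatic choice of k described above are not shown.

```python
import random
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, ...]


def dist2(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def farthest_first_init(points: List[Point], k: int, seed: int = 0) -> List[Point]:
    """Choose k spread-out initial centroids by farthest-first traversal.

    Assumption: a stand-in for the paper's dispersion rule, not its actual method.
    """
    rng = random.Random(seed)
    centroids = [rng.choice(points)]
    # Squared distance from each point to its nearest chosen centroid.
    nearest = [dist2(p, centroids[0]) for p in points]
    while len(centroids) < k:
        # The next centroid is the point farthest from every centroid so far.
        idx = max(range(len(points)), key=lambda i: nearest[i])
        centroids.append(points[idx])
        nearest = [min(d, dist2(p, points[idx])) for d, p in zip(nearest, points)]
    return centroids


def kmeans(points: List[Point], k: int, max_iter: int = 100,
           seed: int = 0) -> Tuple[List[Point], Optional[List[int]]]:
    """Lloyd's K-means starting from farthest-first initial centroids."""
    centroids = farthest_first_init(points, k, seed)
    labels: Optional[List[int]] = None
    for _ in range(max_iter):
        # Assignment step: attach each point to its closest centroid.
        new_labels = [min(range(k), key=lambda j: dist2(p, centroids[j]))
                      for p in points]
        if new_labels == labels:
            break  # assignments are stable, so centroids are already up to date
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        updated = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                updated.append(tuple(sum(c) / len(members) for c in zip(*members)))
            else:
                updated.append(centroids[j])  # keep an empty cluster's centroid
        centroids = updated
    return centroids, labels


pts = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
cents, labs = kmeans(pts, 2)
print(sorted(cents), labs)
```

Because farthest-first traversal deliberately picks extreme points, an isolated outlier is likely to be chosen as an initial centroid; this is one reason a preprocessing step that removes outliers, as the abstract describes, pairs naturally with a dispersion-based initialization.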
Source
Journal of University of Science and Technology Liaoning (《辽宁科技大学学报》)
CAS
2014, No. 5, pp. 455-459 (5 pages)
Funding
National Natural Science Foundation of China (71472081)