Abstract
To address the difficulty of determining the initial cluster centers and the number of clusters in the traditional K-means algorithm, a cluster-center selection method based on density statistics and the maximum distance product is proposed. The method partitions the sample space into a grid, selects the grid cells that locally contain the most samples, and computes the ε-neighborhood density of the samples inside these locally optimal cells. The two samples that have the highest neighborhood density and lie farthest apart are taken as cluster centers for a first round of clustering. The product of the distances from each sample to all current cluster centers is then computed, the sample with the largest product is taken as the next cluster center, and clustering is repeated in this way. Simulation experiments show that the method has a clear advantage in clustering accuracy.
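The center-selection steps described in the abstract can be illustrated with a minimal sketch in plain Python. It is not the authors' implementation. The function names (`cell_of`, `first_two_centers`, `next_center`) and the parameters `width` (grid cell side) and `eps` (neighborhood radius) are hypothetical. The abstract does not say exactly how "highest density" and "farthest apart" are combined, so the sketch assumes one reading: keep the densest sample of each locally fullest cell, then take the farthest-apart pair among those. The stopping rule and the K-means assignment step between selections are not given in the abstract and are left out.

```python
import math
from collections import Counter
from itertools import product


def cell_of(p, lo, width):
    """Integer grid-cell index of point p (cells are cubes of side `width`)."""
    return tuple(int((x - l) // width) for x, l in zip(p, lo))


def first_two_centers(points, width, eps):
    """Pick two dense, far-apart samples from the locally fullest grid cells."""
    lo = [min(axis) for axis in zip(*points)]
    counts = Counter(cell_of(p, lo, width) for p in points)
    offsets = [o for o in product((-1, 0, 1), repeat=len(lo)) if any(o)]

    def is_peak(cell):
        # A cell is a local maximum if no adjacent cell holds more samples.
        return all(
            counts[cell] >= counts.get(tuple(c + d for c, d in zip(cell, o)), 0)
            for o in offsets
        )

    # Assumption: keep the densest sample of each peak cell as its representative.
    best = {}
    for p in points:
        cell = cell_of(p, lo, width)
        if not is_peak(cell):
            continue
        # eps-neighborhood density: samples within eps of p (p itself included).
        density = sum(math.dist(p, q) <= eps for q in points)
        if cell not in best or density > best[cell][0]:
            best[cell] = (density, p)
    reps = [p for _, p in best.values()]
    if len(reps) < 2:
        raise ValueError("need at least two locally fullest cells")
    # Assumption: among the representatives, take the farthest-apart pair.
    pairs = ((a, b) for i, a in enumerate(reps) for b in reps[i + 1:])
    return max(pairs, key=lambda ab: math.dist(ab[0], ab[1]))


def next_center(points, centers):
    """Sample whose product of distances to all current centers is largest."""
    return max(points, key=lambda p: math.prod(math.dist(p, c) for c in centers))


# Illustrative data only: three small groups of 2-D samples.
samples = [
    (0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1), (0.05, 0.05),
    (5.0, 5.0), (5.1, 5.0), (5.0, 5.1), (5.1, 5.1), (5.05, 5.05),
    (0.0, 5.0), (0.1, 5.0), (0.0, 5.1),
]
c1, c2 = first_two_centers(samples, width=1.0, eps=0.2)
c3 = next_center(samples, [c1, c2])
print(c1, c2, c3)
```

Because an existing center has distance zero to itself, its distance product is zero, so the product criterion naturally steers the next center away from all centers chosen so far.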
Source
《测控技术》
CSCD
Peking University Core Journal (北大核心)
2013, Issue 10, pp. 152-154 (3 pages)
Measurement & Control Technology