Abstract
The K-means algorithm partitions data points by Euclidean distance, which cannot accurately characterize the features of a dataset, and its random selection of initial cluster centers also prevents it from reliably producing good clustering results. To address these problems, this paper proposes a clustering algorithm that fuses data-field-based data potential-energy competition with K-means. The algorithm defines the concept of a data field and uses the local minimum distance to let data points compete for aggregation potential energy. It then uses potential-energy entropy to extract an optimal truncation distance based on the distribution of the dataset, determines the cluster centers from the truncation distance and the slope, and uses these centers to perform K-means clustering. Test results on UCI datasets show that the fused algorithm achieves better clustering results.
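The abstract does not give the paper's formulas, so the Python sketch below fills in each step under stated assumptions. It uses a Gaussian data-field potential φ_i = Σ_j exp(−(‖x_i − x_j‖/σ)²), a form common in data-field work. It selects σ by a grid search that minimizes the potential entropy H = −Σ_i (φ_i/Z) ln(φ_i/Z), where Z = Σ_i φ_i, and treats that σ as the truncation distance. The local minimum distance δ_i is taken as the distance to the nearest point with higher potential. In place of the paper's slope criterion, it applies a hypothetical "steepest drop" rule to γ_i = φ_i·δ_i. All function names (`potentials`, `choose_sigma`, `pick_centers`, and so on) are illustrative and do not come from the paper.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def potentials(data: Sequence[Point], sigma: float) -> List[float]:
    """Assumed Gaussian data-field potential: phi_i = sum_j exp(-(d_ij / sigma)^2)."""
    return [sum(math.exp(-(math.dist(x, y) / sigma) ** 2) for y in data) for x in data]


def potential_entropy(phi: Sequence[float]) -> float:
    """Shannon entropy of the normalized potentials (lower = more uneven field)."""
    z = sum(phi)
    return -sum((p / z) * math.log(p / z) for p in phi)


def choose_sigma(data: Sequence[Point], candidates: Sequence[float]) -> float:
    """Grid search for the sigma minimizing potential entropy (used as truncation distance)."""
    return min(candidates, key=lambda s: potential_entropy(potentials(data, s)))


def local_min_distances(data: Sequence[Point], phi: Sequence[float]) -> List[float]:
    """delta_i: distance to the nearest point ranked higher by potential.

    The top-ranked point instead gets its largest distance to any other point.
    """
    order = sorted(range(len(data)), key=lambda i: -phi[i])
    delta = [0.0] * len(data)
    for rank, i in enumerate(order):
        if rank == 0:
            delta[i] = max((math.dist(data[i], data[j]) for j in order[1:]), default=0.0)
        else:
            delta[i] = min(math.dist(data[i], data[j]) for j in order[:rank])
    return delta


def pick_centers(data: Sequence[Point], phi: Sequence[float],
                 delta: Sequence[float]) -> List[Point]:
    """Hypothetical slope rule: sort gamma = phi * delta, cut at the steepest drop."""
    gamma = [p * d for p, d in zip(phi, delta)]
    order = sorted(range(len(data)), key=lambda i: -gamma[i])
    drops = [gamma[order[r]] - gamma[order[r + 1]] for r in range(len(order) - 1)]
    k = drops.index(max(drops)) + 1 if drops else 1
    return [data[i] for i in order[:k]]


def kmeans(data: Sequence[Point], centers: Sequence[Point], max_iter: int = 100) -> List[int]:
    """Plain Lloyd iterations seeded with the given centers instead of random ones."""
    centers = [tuple(c) for c in centers]
    labels: List[int] = []
    for _ in range(max_iter):
        new_labels = [min(range(len(centers)), key=lambda c: math.dist(x, centers[c]))
                      for x in data]
        if new_labels == labels:
            break
        labels = new_labels
        for c in range(len(centers)):
            members = [x for x, lab in zip(data, labels) if lab == c]
            if members:  # keep the previous center if a cluster becomes empty
                centers[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels


def cluster(data: Sequence[Point], sigma_candidates: Sequence[float]):
    sigma = choose_sigma(data, sigma_candidates)
    phi = potentials(data, sigma)
    delta = local_min_distances(data, phi)
    centers = pick_centers(data, phi, delta)
    return sigma, centers, kmeans(data, centers)


# Usage on a toy dataset with two well-separated groups (not UCI data).
data = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1), (0.05, 0.05),
        (5.0, 5.0), (5.1, 5.0), (5.0, 5.1), (5.1, 5.1), (5.05, 5.05)]
sigma, centers, labels = cluster(data, [0.05, 0.1, 0.2, 0.5, 1.0, 2.0])
print(sigma, centers, labels)
```

In this sketch, the number of clusters and the initial centers come from the potential field instead of random sampling, which is the weakness the abstract attributes to K-means. The assignment step itself still uses Euclidean distance.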
Source
Computer Applications and Software (《计算机应用与软件》)
2017, No. 12, pp. 266-272 (7 pages)
Funding
Natural Science Foundation of Jiangsu Province (BK20140165)
Keywords
Data competition
Data field
Potential entropy
Slope
Complex dataset