Abstract
To improve the accuracy of the k-modes algorithm and to address the selection of initial cluster centers, a k-modes clustering algorithm based on intra-cluster and inter-cluster dissimilarity (IKMCA) is proposed. The dissimilarity coefficient is improved on the basis of intra-cluster and inter-cluster similarity, and a concrete method is given for selecting the initial cluster centers automatically. The proposed dissimilarity coefficient accounts both for the dissimilarity between the feature values themselves and for how well other related features discriminate between them. The proposed initialization method automatically determines the number of clusters and the positions of the initial cluster centers. Experimental results show that IKMCA outperforms the classic k-modes algorithm and its variants in clustering accuracy, purity, and recall.
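For orientation, the classic k-modes baseline that IKMCA is compared against can be sketched roughly as follows. This is not the IKMCA method: it uses plain simple-matching dissimilarity and random (or caller-supplied) initial modes, which are the two pieces the paper sets out to replace. The function names and the `init` parameter are illustrative assumptions, not taken from the paper.

```python
from collections import Counter
import random


def matching_dissimilarity(a, b):
    """Simple matching: number of attributes on which two records differ."""
    return sum(x != y for x, y in zip(a, b))


def column_modes(records):
    """Most frequent value in each attribute column (ties: first seen wins)."""
    return tuple(Counter(col).most_common(1)[0][0] for col in zip(*records))


def k_modes(data, k, init=None, max_iter=100, seed=0):
    """Baseline k-modes on a list of equal-length tuples of categorical values.

    `init` (hypothetical parameter) lets the caller pass k starting modes;
    otherwise k distinct records are drawn at random, which is the step an
    automatic initialization scheme would replace.
    """
    if init is not None:
        modes = list(init)
    else:
        distinct = list(dict.fromkeys(data))  # drop duplicate records
        modes = random.Random(seed).sample(distinct, k)

    labels = None
    for _ in range(max_iter):
        # Assignment step: each record goes to its nearest mode.
        new_labels = [
            min(range(k), key=lambda j: matching_dissimilarity(x, modes[j]))
            for x in data
        ]
        if new_labels == labels:  # no record changed cluster: converged
            break
        labels = new_labels
        # Update step: recompute each mode from per-attribute frequencies.
        for j in range(k):
            members = [x for x, lab in zip(data, labels) if lab == j]
            if members:  # keep the old mode if a cluster is empty
                modes[j] = column_modes(members)
    return labels, modes
```

In this baseline the caller must fix `k` in advance and the result depends on the starting modes; those are the limitations the abstract says IKMCA targets.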
Authors
贾子琪
宋玲
JIA Zi-qi; SONG Ling (School of Computer and Software, Nanyang Institute of Technology, Nanyang 473004, China; School of Computer, Electronics and Information, Guangxi University, Nanning 530004, China)
Source
《计算机工程与设计》
Peking University Core Journal
2021, No. 9, pp. 2492-2500 (9 pages)
Computer Engineering and Design
Funding
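National Natural Science Foundation of China (61762030)
Guangxi Innovation-Driven Development Major Special Project (桂科AA17204017)
Guangxi Key Research and Development Program (桂科AB19110050, 桂科AB18126094)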
Keywords
k-modes algorithm
intra-cluster and inter-cluster similarity
categorical data
frequency
dissimilarity coefficient