Abstract
The traditional K-modes algorithm uses simple matching to compute the distance between different values of the same attribute, and it gives all attributes equal weight when computing the distance between samples. Building on this, the paper takes into account the order relation among attribute values in ordinal (ordered) categorical data, the similarity between different attribute values in nominal (unordered) categorical data, and the relationships among attributes, and proposes an improved clustering algorithm that is better suited to mixed categorical data. The algorithm applies different distance measures to nominal and ordinal categorical data and assigns attribute weights using average entropy. Experimental results show that the improved algorithm achieves better clustering performance than both the K-modes algorithm and an improved K-modes algorithm on artificial and real data sets.
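The abstract contrasts the simple-matching distance of K-modes with a weighted distance that treats ordinal and nominal attributes differently. The Python sketch below illustrates that contrast under stated assumptions; it is not the authors' method, because the abstract does not give their formulas. The function names (`simple_matching`, `entropy_weights`, `mixed_distance`) are hypothetical, as are the choices made here: a rank gap scaled to [0, 1] for ordinal attributes, plain 0/1 matching for nominal attributes (the paper instead uses a similarity between attribute values), and a normalized-entropy weighting in which higher-entropy attributes receive larger weights, used only as a stand-in for the paper's average-entropy weights.

```python
import math
from collections import Counter
from typing import List, Optional, Sequence

Record = Sequence[str]


def simple_matching(x: Record, y: Record) -> int:
    """K-modes dissimilarity: number of attributes whose values differ."""
    return sum(1 for a, b in zip(x, y) if a != b)


def entropy_weights(data: List[Record]) -> List[float]:
    """HYPOTHETICAL weighting: normalized Shannon entropy per attribute,
    rescaled so the weights sum to 1 (not the paper's average-entropy formula)."""
    n_rows, n_attrs = len(data), len(data[0])
    raw = []
    for j in range(n_attrs):
        counts = Counter(row[j] for row in data)
        h = -sum((c / n_rows) * math.log(c / n_rows) for c in counts.values())
        k = len(counts)
        # Divide by log(k) so attributes with many categories are not favored.
        raw.append(h / math.log(k) if k > 1 else 0.0)
    total = sum(raw)
    if total == 0.0:
        # Every attribute is constant: fall back to equal weights.
        return [1.0 / n_attrs] * n_attrs
    return [r / total for r in raw]


def mixed_distance(
    x: Record,
    y: Record,
    ordinal_levels: List[Optional[List[str]]],
    weights: Sequence[float],
) -> float:
    """Weighted distance; ordinal_levels[j] lists levels in order, or None if nominal."""
    d = 0.0
    for j, (a, b) in enumerate(zip(x, y)):
        levels = ordinal_levels[j]
        if levels is None:
            # Nominal attribute: 0/1 matching (placeholder for a value-similarity measure).
            dj = 0.0 if a == b else 1.0
        else:
            # Ordinal attribute: rank gap scaled to [0, 1].
            span = max(len(levels) - 1, 1)
            dj = abs(levels.index(a) - levels.index(b)) / span
        d += weights[j] * dj
    return d


if __name__ == "__main__":
    data = [("red", "low"), ("red", "high"), ("blue", "medium"), ("blue", "low")]
    ordinal = [None, ["low", "medium", "high"]]  # attribute 0 nominal, attribute 1 ordinal

    print(simple_matching(("red", "low"), ("red", "high")))  # 1
    print(simple_matching(("red", "low"), ("red", "medium")))  # 1
    print(mixed_distance(("red", "low"), ("red", "high"), ordinal, [0.5, 0.5]))  # 0.5
    print(mixed_distance(("red", "low"), ("red", "medium"), ordinal, [0.5, 0.5]))  # 0.25
    print(entropy_weights(data))
```

Under simple matching, "low" is equally far from "medium" and from "high". The ordinal term makes "low" closer to "medium", which is the kind of order information the abstract says K-modes ignores.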
Authors
LIN Qiang (林强); TANG Jiashan (唐加山)
College of Science, Nanjing University of Posts and Telecommunications, Nanjing 210023, China
Source
Computer Engineering and Applications (《计算机工程与应用》), 2019, No. 1, pp. 168-173 (6 pages)
Indexed in: CSCD; Peking University Core Journals list (北大核心)