Abstract
The k-modes method does not account for how attribute values are distributed in the attribute space, which makes its dissimilarity measure between categorical variables inaccurate. To address this problem, a k-modes clustering algorithm based on attribute value weights is proposed. The algorithm redefines the dissimilarity measure using the differences between attribute values and the weights of the attribute values. It also gives an iterative formula for updating cluster centers, based on attribute value frequencies and attribute value weights, which effectively reflects both the distribution of attribute values in the attribute space and the differences in importance between attributes. Experiments on UCI data sets verify the effectiveness of the algorithm.
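For orientation, the two steps the abstract refers to (dissimilarity measurement and cluster center update) can be sketched for the standard k-modes baseline. This record does not give the paper's weighted formulas, so the sketch below is not the authors' method. The `weights` argument is a hypothetical hook that only shows where attribute value weights could enter, and the averaging rule inside it is an assumption made for illustration.

```python
from collections import Counter


def matching_dissimilarity(x, y, weights=None):
    """Baseline k-modes dissimilarity: count the attributes where x and y differ.

    `weights` is a hypothetical hook, NOT the paper's formula: a list of dicts,
    one per attribute, mapping an attribute value to a weight. A mismatch on
    attribute j then costs the mean of the two values' weights instead of 1.
    Values missing from the dict default to a weight of 1.0.
    """
    total = 0.0
    for j, (a, b) in enumerate(zip(x, y)):
        if a == b:
            continue  # matching values contribute nothing
        if weights is None:
            total += 1.0  # simple matching: every mismatch costs 1
        else:
            total += (weights[j].get(a, 1.0) + weights[j].get(b, 1.0)) / 2.0
    return total


def update_mode(cluster):
    """Baseline center update: the most frequent value of each attribute."""
    return tuple(Counter(column).most_common(1)[0][0] for column in zip(*cluster))
```

With `weights=None` the first function reduces to simple matching, which treats every mismatch as equally costly regardless of how the values are distributed; that is the limitation the abstract describes.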
Authors
HAO Rongli (郝荣丽)
HU Lihua (胡立华)
School of Computer Science and Technology, Taiyuan University of Science and Technology, Taiyuan 030024
Source
Computer & Digital Engineering (《计算机与数字工程》)
2023, No. 5, pp. 1001-1004, 1119 (5 pages)
Keywords
clustering analysis
k-modes
attribute value weight
attribute value frequency
dissimilarity measure