A K-modes Clustering Algorithm Based on Attribute Value Weight (cited by: 1)
Abstract: The k-modes method ignores how each attribute value is distributed in the attribute space, which makes its measure of the difference between categorical variables inaccurate. To address this problem, a k-modes clustering algorithm based on attribute value weights is proposed. The algorithm redefines the dissimilarity measure using both the differences between attribute values and the weights of those values. It also gives an iterative formula for updating cluster centers, based on attribute value frequencies and attribute value weights, which reflects the distribution of attribute values in the attribute space and the differences in importance among attributes. Experiments on UCI data sets verify the effectiveness of the algorithm.
Authors: HAO Rongli; HU Lihua (School of Computer Science and Technology, Taiyuan University of Science and Technology, Taiyuan 030024)
Source: Computer & Digital Engineering (《计算机与数字工程》), 2023, No. 5, pp. 1001-1004, 1119 (5 pages)
Keywords: clustering analysis; k-modes; attribute value weight; attribute value frequency; dissimilarity measure
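
This record gives no equations, so the sketch below is not the paper's formula. It only illustrates where a per-value weight could enter the two steps the abstract names: the dissimilarity measure and the cluster-center update. The weight (one minus the value's relative frequency), the averaged mismatch cost, and all function names are assumptions made for illustration; the center update is the ordinary per-attribute mode from standard k-modes.

```python
from collections import Counter


def value_weights(data):
    """Hypothetical weight per attribute value: 1 - relative frequency (rarer = heavier)."""
    n = len(data)
    m = len(data[0])
    return [
        {v: 1.0 - c / n for v, c in Counter(row[j] for row in data).items()}
        for j in range(m)
    ]


def weighted_dissimilarity(x, center, weights):
    """Simple matching distance in which each mismatch costs the mean weight
    of the two differing values (an illustrative choice, not the paper's)."""
    total = 0.0
    for j, (a, b) in enumerate(zip(x, center)):
        if a != b:
            total += (weights[j].get(a, 1.0) + weights[j].get(b, 1.0)) / 2.0
    return total


def update_center(members):
    """Standard k-modes update: the most frequent value of each attribute."""
    m = len(members[0])
    return tuple(
        Counter(row[j] for row in members).most_common(1)[0][0] for j in range(m)
    )


def k_modes(data, initial_centers, max_iter=20):
    """Alternate assignment and center update until the centers stop changing."""
    weights = value_weights(data)
    centers = [tuple(c) for c in initial_centers]
    labels = []
    for _ in range(max_iter):
        labels = [
            min(
                range(len(centers)),
                key=lambda k: weighted_dissimilarity(x, centers[k], weights),
            )
            for x in data
        ]
        new_centers = []
        for k, old in enumerate(centers):
            members = [x for x, lab in zip(data, labels) if lab == k]
            # Keep the old center if a cluster becomes empty.
            new_centers.append(update_center(members) if members else old)
        if new_centers == centers:
            break
        centers = new_centers
    return labels, centers


# Toy categorical data (made up for this sketch, not from the UCI experiments).
data = [
    ("red", "small", "round"),
    ("red", "small", "square"),
    ("red", "large", "round"),
    ("blue", "large", "square"),
    ("blue", "large", "round"),
    ("blue", "small", "square"),
]
labels, centers = k_modes(data, [data[0], data[3]])
print(labels, centers)
```

Passing the initial centers explicitly keeps the run deterministic; a practical implementation would choose them by random sampling or a density-based heuristic.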
