
Clustering Algorithm for Mixed Categorical Data (一种适用于混合型分类数据的聚类算法)

Cited by: 5
Abstract: The traditional K-modes algorithm uses simple attribute matching to compute the distance between different values of the same attribute, and it gives every attribute the same weight when computing the distance between samples. Building on this, the paper proposes an improved clustering algorithm that is better suited to mixed categorical data. It takes into account the order relation among attribute values in ordinal categorical data, the similarity between different attribute values in nominal (unordered) categorical data, and the relationships among attributes. The algorithm applies different distance measures to nominal and ordinal categorical data, and it assigns attribute weights using average entropy. Experimental results show that the improved algorithm clusters better than the K-modes algorithm and its improved variants on both artificial and real data sets.
Authors: LIN Qiang (林强), TANG Jiashan (唐加山), College of Science, Nanjing University of Posts and Telecommunications, Nanjing 210023, China
Source: Computer Engineering and Applications (《计算机工程与应用》), CSCD, Peking University Core Journal, 2019, No. 1, pp. 168-173 (6 pages)
Keywords: clustering algorithm; mixed categorical data; distance metric; K-modes algorithm
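The abstract contrasts the simple-matching distance of K-modes with a per-type, entropy-weighted distance but does not give the formulas. The sketch below is only an illustration of that contrast under stated assumptions: `simple_matching` is the standard K-modes dissimilarity, while `ordinal_distance` (a normalized rank gap), `entropy_weights` (Shannon entropy normalized to sum to 1), and `weighted_distance` are hypothetical stand-ins, not the measures defined in the paper. In particular, nominal attributes here still use plain 0/1 matching, whereas the paper also models similarity between nominal values.

```python
import math
from collections import Counter


def simple_matching(x, y):
    """Standard K-modes dissimilarity: number of attributes whose values differ."""
    return sum(a != b for a, b in zip(x, y))


def ordinal_distance(a, b, levels):
    """Hypothetical ordinal distance: rank gap scaled to [0, 1].

    `levels` lists the attribute's values from lowest to highest.
    """
    if len(levels) < 2:
        return 0.0
    return abs(levels.index(a) - levels.index(b)) / (len(levels) - 1)


def entropy(column):
    """Shannon entropy (natural log) of the values in one attribute column."""
    n = len(column)
    return -sum((c / n) * math.log(c / n) for c in Counter(column).values())


def entropy_weights(rows):
    """Hypothetical weighting: each attribute's entropy divided by the total.

    Falls back to equal weights when every attribute is constant.
    """
    columns = list(zip(*rows))
    entropies = [entropy(col) for col in columns]
    total = sum(entropies)
    if total == 0:
        return [1.0 / len(columns)] * len(columns)
    return [e / total for e in entropies]


def weighted_distance(x, y, weights, ordinal_levels):
    """Weighted sum of per-attribute distances.

    `ordinal_levels` maps an attribute index to its ordered level list;
    attributes not in the mapping are treated as nominal (0/1 matching).
    """
    total = 0.0
    for j, (a, b) in enumerate(zip(x, y)):
        if j in ordinal_levels:
            d = ordinal_distance(a, b, ordinal_levels[j])
        else:
            d = 0.0 if a == b else 1.0
        total += weights[j] * d
    return total
```

With levels `["low", "medium", "high"]`, simple matching treats "low" vs "medium" and "low" vs "high" as equally different, while the rank-based distance makes the second pair twice as far apart as the first. That is the kind of order information the abstract says plain K-modes discards.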