
Improved k-prototypes clustering algorithm based on average difference degree (cited by: 4)
Abstract: The k-prototypes clustering algorithm selects its initial cluster centers at random, which makes the clustering results unstable, and most existing clustering algorithms for mixed-attribute data give low clustering quality. To address both problems, an improved k-prototypes clustering algorithm based on average difference degree is proposed. The initial cluster centers are selected using the average difference degree, which removes the randomness of initial center selection. In addition, information entropy is used to determine the attribute weights of the numerical data, the dissimilarity measure for categorical attributes is improved, and a dissimilarity measure for mixed-attribute data is given. The results show that the improved algorithm achieves relatively high accuracy and can effectively handle mixed-attribute data.
Authors: SHI Hong-yan (石鸿雁), XU Ming-ming (徐明明), School of Science, Shenyang University of Technology, Shenyang 110870, China
Source: Journal of Shenyang University of Technology (《沈阳工业大学学报》), indexed in EI, CAS, and the Peking University Core Journals list (北大核心), 2019, No. 5, pp. 555-559 (5 pages)
Funding: National Natural Science Foundation of China (61074005)
Keywords: k-prototypes algorithm; clustering; initial clustering center; mixed attribute data; average difference degree; information entropy; attribute weight; metric formula
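This record does not reproduce the paper's formulas, so the following is only a rough sketch of two ingredients the abstract names: entropy-based weights for numeric attributes and a mixed-attribute dissimilarity. It assumes the common entropy weight method (min-max normalisation, then one minus normalised Shannon entropy) and the classic k-prototypes form of squared numeric distance plus `gamma` times the count of categorical mismatches. The function names, the `gamma` parameter, and the way the weights enter the distance are all illustrative assumptions, not the authors' definitions.

```python
import math


def entropy_weights(rows):
    """Entropy weight method for numeric columns (an assumed, common variant).

    rows: list of at least two equal-length lists of numbers.
    Returns one weight per column; the weights sum to 1.
    """
    n, m = len(rows), len(rows[0])
    diversities = []
    for j in range(m):
        col = [r[j] for r in rows]
        lo, hi = min(col), max(col)
        # Min-max normalise so every column lies in [0, 1].
        norm = [(v - lo) / (hi - lo) if hi > lo else 0.0 for v in col]
        total = sum(norm)
        if total == 0:
            # A constant column carries no information: zero diversity.
            diversities.append(0.0)
            continue
        shares = [v / total for v in norm]
        # Normalised Shannon entropy; zero shares contribute nothing.
        entropy = -sum(p * math.log(p) for p in shares if p > 0) / math.log(n)
        diversities.append(1.0 - entropy)
    s = sum(diversities)
    # Fall back to equal weights if no column shows any diversity.
    return [d / s for d in diversities] if s > 0 else [1.0 / m] * m


def mixed_dissimilarity(x_num, x_cat, c_num, c_cat, weights, gamma=1.0):
    """Weighted squared numeric distance plus gamma times categorical mismatches.

    This is the classic k-prototypes form with assumed per-attribute weights,
    not the improved measure proposed in the paper.
    """
    numeric = sum(w * (a - b) ** 2 for w, a, b in zip(weights, x_num, c_num))
    categorical = sum(1 for a, b in zip(x_cat, c_cat) if a != b)
    return numeric + gamma * categorical
```

For example, `entropy_weights([[0, 5], [1, 5], [2, 5]])` gives the constant second column zero weight, and the resulting weights can be passed straight to `mixed_dissimilarity` when comparing an object with a cluster prototype.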