
An Overlapping Subspace K-Means Clustering Algorithm (times cited: 5)
Abstract: Most existing clustering algorithms for high-dimensional sparse data do not account for overlapping clusters or outliers, which leads to unsatisfactory clustering results. To address this, this paper proposes an overlapping subspace K-Means clustering algorithm. It introduces a strategy for computing cluster subspaces: the attribute subspace of each cluster is updated dynamically during clustering, and a constraint function guides the clustering process so that clusters may overlap while the number of outliers is controlled. On this basis, an objective function is defined that modifies the traditional K-Means algorithm. An entropy weight constraint is used to compute the weight of each dimension in each cluster, and these weights indicate the relative importance of the dimensions within each cluster. Parameters are also added to control the degree of overlap and the number of outliers. Experimental results on synthetic and real data sets show that the proposed algorithm outperforms EWKM, NEO-K-Means, OKM, and other subspace clustering algorithms on the NMI and F1 metrics, producing better clustering results.
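The abstract describes the method only in words and does not give its objective function or update rules. As a rough illustration of the ideas it names, the following sketch combines two published building blocks from its baselines. The first is EWKM-style entropy weighting, where the weight of dimension l in cluster j is w_jl = exp(-D_jl/γ) / Σ_t exp(-D_jt/γ) and D_jl is the within-cluster dispersion of that dimension. The second is a NEO-K-Means-style greedy assignment, where α sets how many extra memberships create overlap and β sets how many points may be left unassigned as outliers. This is a minimal sketch under those assumptions, not the authors' algorithm. The function names, the parameters `alpha`, `beta`, and `gamma`, the random initialization, and the stopping rule are all hypothetical choices. It uses only the Python standard library.

```python
import math
import random
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def weighted_dist(x: Sequence[float], c: Sequence[float], w: Sequence[float]) -> float:
    """Squared Euclidean distance using one cluster's per-dimension weights."""
    return sum(wl * (xl - cl) ** 2 for xl, cl, wl in zip(x, c, w))


def assign(X: Matrix, C: Matrix, W: Matrix, alpha: float, beta: float) -> List[List[int]]:
    """Hypothetical NEO-K-Means-style greedy assignment; returns member indices per cluster.

    1. The n - floor(beta*n) points closest to their nearest center are assigned there;
       the remaining floor(beta*n) points stay unassigned (outlier candidates).
    2. floor((alpha+beta)*n) further (point, cluster) pairs with the smallest
       weighted distances are added, which is what lets clusters overlap.
    """
    n, k = len(X), len(C)
    D = [[weighted_dist(X[i], C[j], W[j]) for j in range(k)] for i in range(n)]
    nearest = [min(range(k), key=lambda j: D[i][j]) for i in range(n)]
    order = sorted(range(n), key=lambda i: D[i][nearest[i]])
    chosen = {(i, nearest[i]) for i in order[: n - int(beta * n)]}
    rest = sorted((D[i][j], i, j) for i in range(n) for j in range(k) if (i, j) not in chosen)
    for _, i, j in rest[: int((alpha + beta) * n)]:
        chosen.add((i, j))
    members: List[List[int]] = [[] for _ in range(k)]
    for i, j in sorted(chosen):
        members[j].append(i)
    return members


def update(X: Matrix, members: List[List[int]], C: Matrix, gamma: float) -> Tuple[Matrix, Matrix]:
    """Recompute centers, then EWKM-style entropy weights w_jl ~ exp(-D_jl / gamma)."""
    d = len(X[0])
    new_C: Matrix = []
    new_W: Matrix = []
    for j, idx in enumerate(members):
        # Mean of the members; an empty cluster keeps its previous center.
        c = [sum(X[i][l] for i in idx) / len(idx) for l in range(d)] if idx else list(C[j])
        # Within-cluster dispersion D_jl of each dimension l.
        disp = [sum((X[i][l] - c[l]) ** 2 for i in idx) for l in range(d)]
        # Softmax of -disp/gamma; subtracting the minimum keeps one term at exp(0) = 1,
        # so the normalizing sum can never underflow to zero.
        m = min(disp)
        e = [math.exp(-(v - m) / gamma) for v in disp]
        s = sum(e)
        new_C.append(c)
        new_W.append([v / s for v in e])
    return new_C, new_W


def overlapping_ewkm(X: Matrix, k: int, alpha: float = 0.1, beta: float = 0.05,
                     gamma: float = 1.0, iters: int = 50, seed: int = 0):
    """Alternate updates and assignment; the returned members are always assign(X, C, W)."""
    rng = random.Random(seed)
    d = len(X[0])
    C = [list(X[i]) for i in rng.sample(range(len(X)), k)]  # random data points as centers
    W = [[1.0 / d] * d for _ in range(k)]  # start from uniform dimension weights
    members = assign(X, C, W, alpha, beta)
    for _ in range(iters):
        C, W = update(X, members, C, gamma)
        new_members = assign(X, C, W, alpha, beta)
        if new_members == members:
            break
        members = new_members
    covered = {i for idx in members for i in idx}
    outliers = [i for i in range(len(X)) if i not in covered]
    return members, C, W, outliers
```

With α = β = 0 the assignment reduces to plain nearest-center assignment, so the loop behaves like EWKM with hard, non-overlapping clusters. Raising α adds shared memberships, and raising β allows up to floor(βn) points that lie far from every center to remain unassigned.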
Authors: LIU Yuhang; MA Huifang; LIU Haijiao; YU Li (College of Computer Science and Engineering, Northwest Normal University, Lanzhou 730070, China; Guangxi Key Laboratory of Trusted Software, Guilin University of Electronic Technology, Guilin, Guangxi 541004, China)
Source: Computer Engineering (《计算机工程》), 2020, No. 8, pp. 58-63, 71 (7 pages). Indexed in CAS, CSCD, and the Peking University Core Journals list.
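Funding: National Natural Science Foundation of China (61762078, 61363058); Research Project of the Guangxi Key Laboratory of Trusted Software (kx202003); Open Fund of the Guangxi Key Laboratory of Multi-source Information Mining and Security (MIMS18-08); Major Project of the 2019 Young Teachers' Research Capability Enhancement Program of Northwest Normal University (NWNU-LKQN2019-2).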
Keywords: objective function; subspace clustering; outlier; entropy weight constraint; K-Means clustering algorithm