
Three-way clustering based on the density peak algorithm
Abstract: Traditional hard clustering requires clear boundaries between clusters. In practice, however, the available information is often insufficient, and forcing an object into a particular cluster carries a high decision risk. To address this problem, three-way clustering represents each cluster by a core region, a boundary region and a trivial region, and places uncertain objects in the boundary region so that the decision on them is deferred, which reduces the decision risk. The traditional density peak algorithm, in turn, requires parameters to be selected manually and uses a flawed sample assignment strategy. Based on these observations, a three-way clustering method based on the density peak algorithm is proposed. First, the natural nearest neighbor algorithm is introduced to obtain the number of neighbors of each point adaptively, and this number is used to define the local density of each sample. Then the core region and the boundary region are obtained from three-way thresholds. Finally, objects in the boundary region are further assigned by comparing their distances to the cluster centers and their densities. Experiments on the UCI, Synthetic and Shape datasets show that the proposed algorithm effectively increases ACC (clustering accuracy), ARI (adjusted Rand index), AMI (adjusted mutual information) and FMI (Fowlkes-Mallows index) and significantly improves clustering performance.
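The abstract does not spell out how natural neighbors yield a local density, so the following is only a minimal sketch under stated assumptions. It uses the commonly cited natural-neighbor search (in round r every point takes its r-th nearest neighbor, and the search stops once the number of points that nobody has chosen stops changing or reaches zero) and a hypothetical density rho_i = k_i / (sum of distances to the k_i nearest neighbors) with adaptive k_i = max(nb_i, 1). The function names `natural_neighbor_search` and `adaptive_local_density` and the density formula are illustrative, not the authors' definitions.

```python
import numpy as np


def natural_neighbor_search(X):
    """Sketch of natural-neighbor search. Returns (lam, nb, D, order):
    lam is the number of rounds (natural eigenvalue), nb[i] counts how many
    points chose i as a neighbor, D is the distance matrix and order[i]
    lists the other points by increasing distance from i."""
    n = len(X)
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    D_no_self = D.copy()
    np.fill_diagonal(D_no_self, np.inf)  # push each point to the end of its own list
    order = np.argsort(D_no_self, axis=1)[:, : n - 1]  # drop the self column
    nb = np.zeros(n, dtype=int)
    prev_cnt, r = -1, 0
    while r < n - 1:
        np.add.at(nb, order[:, r], 1)  # every point takes its (r+1)-th neighbor
        r += 1
        cnt = int(np.sum(nb == 0))  # points nobody has chosen yet
        if cnt == 0 or cnt == prev_cnt:
            break
        prev_cnt = cnt
    return r, nb, D, order


def adaptive_local_density(nb, D, order, eps=1e-12):
    """Hypothetical density: k_i / (sum of distances to the k_i nearest
    neighbors), where k_i = max(nb_i, 1) adapts to each point."""
    n = len(nb)
    k = np.maximum(nb, 1)
    sorted_d = np.take_along_axis(D, order, axis=1)  # distances, nearest first
    csum = np.cumsum(sorted_d, axis=1)
    return k / (csum[np.arange(n), k - 1] + eps)


X = np.array([[0, 0], [0, 1], [1, 0], [1, 1], [10, 10]], dtype=float)
lam, nb, D, order = natural_neighbor_search(X)
rho = adaptive_local_density(nb, D, order)
print("lambda =", lam, "nb =", nb, "rho =", rho.round(3))
```

The outlying point at (10, 10) is never chosen as a neighbor, so its nb stays 0 and it receives the lowest density; unlike the classic algorithm, this sketch needs no hand-picked cutoff distance.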
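For reference, the center-selection step of the traditional density peak algorithm (Rodriguez and Laio, 2014) can be sketched as follows: delta_i is the distance from point i to the nearest point of higher density (the largest distance in the data for the densest point), and points with a large gamma_i = rho_i * delta_i are taken as cluster centers. The densities below are made up, and `n_clusters` is an assumed input of this sketch; according to the abstract, the proposed method defines rho from natural neighbors instead.

```python
import numpy as np


def dpc_centers(rho, D, n_clusters):
    """Decision-graph step of density peak clustering.
    Ties for the maximum density are not handled specially in this sketch."""
    n = len(rho)
    delta = np.empty(n)
    for i in range(n):
        higher = np.flatnonzero(rho > rho[i])  # points denser than i
        # distance to the nearest denser point, or the largest distance for the peak
        delta[i] = D[i, higher].min() if higher.size else D[i].max()
    gamma = rho * delta
    centers = np.argsort(-gamma)[:n_clusters]  # largest gamma first
    return centers, delta, gamma


X = np.array([[0, 0], [0, 1], [1, 0], [5, 5], [5, 6], [6, 5]], dtype=float)
D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
rho = np.array([3.0, 2.0, 2.0, 4.0, 1.0, 1.0])  # made-up densities for illustration
centers, delta, gamma = dpc_centers(rho, D, n_clusters=2)
print(sorted(centers.tolist()))  # [0, 3]
```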
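The three-way step described in the abstract can be illustrated with a small sketch. The thresholds `alpha` and `beta`, the inverse-distance membership `u`, and the rule that resolves boundary objects by the ratio of center density to distance are all assumptions made here, not the paper's definitions; they only show how core, boundary and trivial regions and a deferred decision fit together.

```python
import numpy as np


def three_way_assign(X, rho, centers, alpha=0.7, beta=0.2, eps=1e-12):
    """Hypothetical three-way assignment around given centers.
    Core(j): u >= alpha; Boundary(j): beta <= u < alpha; Trivial(j): u < beta."""
    if not 0.0 < beta < alpha or alpha <= 0.5:
        raise ValueError("need 0 < beta < alpha and alpha > 0.5")
    # distances from every point to every center, shape (n, k)
    d = np.linalg.norm(X[:, None, :] - X[centers][None, :, :], axis=-1) + eps
    inv = 1.0 / d
    u = inv / inv.sum(axis=1, keepdims=True)  # assumed membership, rows sum to 1
    k = len(centers)
    # alpha > 0.5 guarantees each point lies in at most one core region
    core = [np.flatnonzero(u[:, j] >= alpha) for j in range(k)]
    boundary = [np.flatnonzero((u[:, j] >= beta) & (u[:, j] < alpha)) for j in range(k)]
    labels = np.full(len(X), -1)
    for j, idx in enumerate(core):
        labels[idx] = j
    # deferred decision: among clusters whose boundary holds the point,
    # pick the one with the largest center-density / distance ratio
    score = rho[centers][None, :] / d
    masked = np.where(u >= beta, score, -np.inf)
    no_candidate = ~np.isfinite(masked).any(axis=1)
    masked[no_candidate] = score[no_candidate]  # fallback: consider all clusters
    undecided = np.flatnonzero(labels == -1)
    labels[undecided] = masked[undecided].argmax(axis=1)
    return labels, core, boundary


X = np.array([[0, 0], [0, 1], [1, 0], [5, 5], [5, 6], [6, 5], [2.5, 2.5]])
rho = np.array([3.0, 2.0, 2.0, 5.0, 2.0, 2.0, 0.5])  # made-up densities
labels, core, boundary = three_way_assign(X, rho, centers=np.array([0, 3]))
print(labels)  # [0 0 0 1 1 1 1]
```

The point at (2.5, 2.5) is equidistant from both centers, so it lands in the boundary region of both clusters instead of being forced into one; the deferred decision then sends it to the cluster whose center is denser.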
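ACC, ARI, AMI and FMI are standard external clustering indices. As an illustration of two of them, the sketch below computes ACC and ARI from the contingency table of true and predicted labels using only NumPy and the standard library. It matches clusters to classes by brute force, which is practical only for a handful of clusters (a Hungarian-algorithm solver such as `scipy.optimize.linear_sum_assignment` is the usual choice), and it is not the evaluation code used in the paper.

```python
from itertools import permutations
from math import comb

import numpy as np


def contingency(y_true, y_pred):
    """C[i, j] = number of samples in class i and predicted cluster j."""
    _, t = np.unique(np.asarray(y_true), return_inverse=True)
    _, p = np.unique(np.asarray(y_pred), return_inverse=True)
    t, p = t.ravel(), p.ravel()
    C = np.zeros((t.max() + 1, p.max() + 1), dtype=int)
    np.add.at(C, (t, p), 1)
    return C


def clustering_acc(y_true, y_pred):
    """Fraction of samples correct under the best one-to-one cluster-to-class mapping."""
    C = contingency(y_true, y_pred)
    m = max(C.shape)
    P = np.zeros((m, m), dtype=int)  # pad to square; padded rows/columns are empty
    P[: C.shape[0], : C.shape[1]] = C
    best = max(sum(P[i, perm[i]] for i in range(m)) for perm in permutations(range(m)))
    return best / C.sum()


def _pairs(counts):
    """Number of unordered pairs, summed over all entries."""
    return sum(comb(int(c), 2) for c in np.ravel(counts))


def adjusted_rand_index(y_true, y_pred):
    C = contingency(y_true, y_pred)
    index = _pairs(C)
    a, b = _pairs(C.sum(axis=1)), _pairs(C.sum(axis=0))
    expected = a * b / comb(int(C.sum()), 2)
    max_index = (a + b) / 2
    if max_index == expected:  # degenerate case, e.g. one cluster in both labelings
        return 1.0
    return (index - expected) / (max_index - expected)


y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 0, 1, 1, 1, 1]  # classes 1 and 2 merged into one cluster
print(clustering_acc(y_true, y_pred), adjusted_rand_index(y_true, y_pred))
```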
Authors: JIANG Dongqin (姜冬勤), WANG Pingxin (王平心), YANG Xibei (杨习贝). Affiliations: School of Computer, Jiangsu University of Science and Technology, Zhenjiang 212100, China; School of Science, Jiangsu University of Science and Technology, Zhenjiang 212100, China.
Source: Journal of Jiangsu University of Science and Technology: Natural Science Edition (《江苏科技大学学报(自然科学版)》), 2023, No. 4, pp. 72-79 (8 pages). Indexed in CAS; Peking University Core Journal.
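Funding: National Natural Science Foundation of China (62076111, 61773012); Natural Science Foundation of the Higher Education Institutions of Jiangsu Province (15KJB110004).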
Keywords: three-way clustering; density peak algorithm; natural nearest neighbor