
Density peaks clustering algorithm based on shared near neighbors similarity (cited by: 8)
Abstract Density peaks clustering is an efficient density-based clustering method, but it is sensitive to the global parameter dc and requires manual inspection of the decision graph to select cluster centers. To address these problems, a density peaks clustering algorithm based on shared near neighbors similarity is proposed. First, the algorithm combines Euclidean distance with shared near neighbors similarity to define the local density of a sample, which removes the need to set the parameter dc of the original density peaks clustering algorithm. Second, the selection of cluster centers is optimized so that centers are chosen adaptively. Finally, each remaining sample is assigned to the cluster of its nearest sample with higher density. Experimental results on UCI datasets and synthetic datasets show that, compared with the original density peaks clustering algorithm, the proposed algorithm improves accuracy, Normalized Mutual Information (NMI) and F-Measure by about 22.3%, 35.7% and 16.6% on average, respectively. The proposed algorithm effectively improves clustering accuracy and the quality of the clustering results.
Authors BAO Shuting (鲍舒婷) 1,2, SUN Liping (孙丽萍) 1,2, ZHENG Xiaoyao (郑孝遥) 1,2, GUO Liangmin (郭良敏) 1,2 (1. School of Computer and Information, Anhui Normal University, Wuhu, Anhui 241002, China; 2. Anhui Provincial Key Laboratory of Network and Information Security (Anhui Normal University), Wuhu, Anhui 241002, China)
Source Journal of Computer Applications (《计算机应用》), CSCD, Peking University Core Journal, 2018, No. 6, pp. 1601-1607 (7 pages)
Funding National Natural Science Foundation of China (61602009, 61772034); Natural Science Foundation of Anhui Province (1608085MF145, 1508085QF133)
Keywords density peaks clustering; k nearest neighbors; shared near neighbors; local density; similarity measure
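
The three steps named in the abstract (a dc-free local density built from Euclidean distance and shared near neighbors, center selection, and assignment to the cluster of the nearest higher-density sample) can be illustrated with a minimal sketch. The abstract does not give the paper's formulas, so everything specific here is an assumption made for illustration: the function name `snn_density_peaks`, the density formula (shared-neighbor count divided by one plus the distance, summed over the k nearest neighbors), and the center rule (the `n_clusters` samples with the largest density times separation distance, a simple stand-in for the paper's adaptive selection).

```python
import math


def snn_density_peaks(points, k, n_clusters):
    """Toy density-peaks clustering with a shared-near-neighbor density."""
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]

    # k nearest neighbors of each sample by Euclidean distance (self excluded).
    knn = []
    for i in range(n):
        order = sorted((j for j in range(n) if j != i), key=lambda j: dist[i][j])
        knn.append(set(order[:k]))

    # ASSUMED density: shared-neighbor count damped by distance, summed over
    # the k nearest neighbors. No cutoff distance dc is needed.
    rho = [
        sum(len(knn[i] & knn[j]) / (1.0 + dist[i][j]) for j in knn[i])
        for i in range(n)
    ]

    # Samples from densest to sparsest; ties are broken by index so that every
    # sample except the first has at least one "higher-density" predecessor.
    by_density = sorted(range(n), key=lambda i: (-rho[i], i))
    rank = {i: r for r, i in enumerate(by_density)}

    # delta: distance to the nearest higher-density sample (its "parent").
    delta = [0.0] * n
    parent = [-1] * n
    for r, i in enumerate(by_density):
        if r == 0:
            delta[i] = max(dist[i])  # densest sample: use its largest distance
            continue
        j = min(by_density[:r], key=lambda j: dist[i][j])
        parent[i] = j
        delta[i] = dist[i][j]

    # ASSUMED center rule: top n_clusters samples by rho * delta. The densest
    # sample always ranks first, so every parent chain ends at a center.
    gamma = [rho[i] * delta[i] for i in range(n)]
    centers = sorted(range(n), key=lambda i: (-gamma[i], rank[i]))[:n_clusters]

    # Assign in density order: each sample inherits its parent's label.
    labels = [-1] * n
    for c, i in enumerate(centers):
        labels[i] = c
    for i in by_density:
        if labels[i] == -1:
            labels[i] = labels[parent[i]]
    return labels


# Two small, well-separated groups of five samples each.
points = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5),
          (10, 10), (10, 11), (11, 10), (11, 11), (10.5, 10.5)]
labels = snn_density_peaks(points, k=3, n_clusters=2)
```

On this toy input the first five samples share one label and the last five share the other. The sketch still takes `k` and the number of clusters as inputs, so it does not reproduce the adaptive center selection that the abstract describes.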
