Shared nearest-neighbor-based clustering internal index for high-dimensional datasets (cited by: 1)
Abstract Traditional clustering internal indexes rely on distance-based similarity measures, so their performance degrades as data dimensionality increases. To address this problem, a clustering internal index based on Shared Nearest-Neighbor (SNN) similarity is proposed. First, SNN similarity and the k-Nearest Neighbor (kNN) method are used to estimate the density of the data points and to construct a density-fused SNN similarity graph. Then, on this graph, a maximum flow algorithm is used to compute intra-cluster similarity and inter-cluster separation, and the two are combined into the clustering internal index. Experiments on artificial and real datasets show that, compared with nine traditional distance-based clustering internal indexes, the proposed index identifies the optimal partition of a dataset and predicts its optimal number of clusters more accurately. The proposed index therefore handles complex cluster structures and high-dimensional data better than the other internal indexes in the comparison.
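The abstract names shared nearest-neighbor similarity and a kNN-based density estimate but gives no formulas, so the following is only a minimal sketch under stated assumptions. It uses the common definition of SNN similarity (the number of k nearest neighbors two points have in common) and, as an assumption not taken from the paper, estimates a point's density as the sum of its SNN similarities to its own k nearest neighbors. The function names `knn_lists`, `snn_similarity`, and `snn_density` are hypothetical.

```python
import math

def knn_lists(points, k):
    """For each point, return the index set of its k nearest neighbors (Euclidean)."""
    neighbors = []
    for i, p in enumerate(points):
        others = (j for j in range(len(points)) if j != i)
        order = sorted(others, key=lambda j: math.dist(p, points[j]))
        neighbors.append(set(order[:k]))
    return neighbors

def snn_similarity(neighbors, i, j):
    """Common SNN definition: how many k nearest neighbors i and j share."""
    return len(neighbors[i] & neighbors[j])

def snn_density(neighbors, i):
    """ASSUMPTION (not from the paper): density of i is the sum of its
    SNN similarities to its own k nearest neighbors."""
    return sum(snn_similarity(neighbors, i, j) for j in neighbors[i])

# Two well-separated groups of four points each.
points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1),
          (5.0, 5.0), (5.1, 5.0), (5.0, 5.1), (5.1, 5.1)]
nbrs = knn_lists(points, k=3)

same_group = snn_similarity(nbrs, 0, 1)   # points in the same group
cross_group = snn_similarity(nbrs, 0, 4)  # points in different groups
density_0 = snn_density(nbrs, 0)
```

Because the similarity depends on neighbor rankings rather than raw distances, points in the same group share neighbors while points in different groups share none, which is the property that motivates SNN-based measures for high-dimensional data. The graph construction and maximum-flow step described in the abstract are not reproduced here.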
Authors ZHANG Longyi; ZHONG Caiming (College of Information Science and Engineering, Ningbo University, Ningbo, Zhejiang 315210, China; College of Science and Technology, Ningbo University, Ningbo, Zhejiang 315210, China)
Source Journal of Computer Applications (《计算机应用》), indexed in CSCD and the Peking University Core Journals list, 2021, Issue S01, pp. 93-100 (8 pages)
Funding General Program of the National Natural Science Foundation of China (61976134).
Keywords clustering internal index; clustering; Shared Nearest-Neighbor similarity (SNN); curse of dimensionality; validity index