Geometric double-entity model for recognizing far-near relations of clusters (cited by: 2)


Abstract: Many practical problems require not only the cluster labels produced by a clustering algorithm but also the ability to recognize far-near relations between clusters. When a high-dimensional data set contains many clusters, clustering visualization methods based on dimension reduction usually make some clusters overlap, interlace, or appear artificially pushed apart, so the far-near relations of some clusters are displayed wrongly or cannot be distinguished. Existing inter-cluster distance methods, in turn, cannot reveal whether two clusters are far apart or close together. This paper proposes the geometric double-entity model (GDEM) to describe the relation between two clusters and designs three measures of their far-near degree: the relative border distance, the absolute border distance, and the region dense degree. GDEM accounts for both the absolute distance between the nearest sample sets of the two clusters and the density of their border regions, so it can accurately reveal inter-cluster relations in a high-dimensional space even under the difficult conditions described above. Experimental results on four real data sets show that GDEM effectively recognizes far-near relations of clusters that conventional clustering visualization methods cannot distinguish.
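The abstract does not give formulas for the border measures. The sketch below is a hypothetical illustration, not the authors' definitions, of the general idea it describes: compare the gap between the nearest sample sets of two clusters with how densely samples are packed in their border regions. The function names `border_indices`, `absolute_border_distance`, `border_spacing`, and `relative_border_distance` are invented for this example, and the use of Euclidean distance, the k samples nearest the other cluster as the "border", and mean nearest-neighbour spacing as a density proxy are all assumptions.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def nearest_dist(p: Point, others: Sequence[Point]) -> float:
    """Euclidean distance from p to its nearest point in `others`."""
    return min(math.dist(p, q) for q in others)


def border_indices(a: Sequence[Point], b: Sequence[Point], k: int) -> List[int]:
    """Hypothetical border set: indices of the k samples of a closest to cluster b."""
    order = sorted(range(len(a)), key=lambda i: nearest_dist(a[i], b))
    return order[:k]


def absolute_border_distance(a, b, k: int) -> float:
    """Hypothetical measure: mean gap between the two clusters' border sets."""
    ba = [a[i] for i in border_indices(a, b, k)]
    bb = [b[j] for j in border_indices(b, a, k)]
    gaps = [nearest_dist(p, bb) for p in ba] + [nearest_dist(q, ba) for q in bb]
    return sum(gaps) / len(gaps)


def border_spacing(a, b, k: int) -> float:
    """Hypothetical density proxy: mean distance from each border sample of a
    to its nearest other sample in a (smaller means a denser border region).
    Assumes a has at least two samples."""
    spacings = []
    for i in border_indices(a, b, k):
        rest = [a[j] for j in range(len(a)) if j != i]
        spacings.append(nearest_dist(a[i], rest))
    return sum(spacings) / len(spacings)


def relative_border_distance(a, b, k: int) -> float:
    """Hypothetical ratio: border gap divided by the average border spacing.
    Values near 1 suggest the clusters are close; large values suggest far apart."""
    spacing = (border_spacing(a, b, k) + border_spacing(b, a, k)) / 2
    return absolute_border_distance(a, b, k) / spacing


if __name__ == "__main__":
    A = [(0, 0), (1, 0), (2, 0), (3, 0)]
    B_near = [(4, 0), (5, 0), (6, 0), (7, 0)]
    B_far = [(13, 0), (14, 0), (15, 0), (16, 0)]
    print(relative_border_distance(A, B_near, 2))  # prints 1.5
    print(relative_border_distance(A, B_far, 2))   # prints 10.5
```

Scaling both clusters by 10 multiplies the absolute border distance by 10 but leaves the relative ratio unchanged, which is why a relative measure can compare cluster pairs with different spreads. Exact duplicate samples in a border region make the spacing zero, so a practical version would need a guard for that case.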

Authors: WANG KaiJun, YAN XuanHui, CHEN LiFei

Affiliation: School of Mathematics and Computer Science, Fujian Normal University, Fuzhou 350108, China

Source: Scientia Sinica Informationis (《中国科学：信息科学》), 2012, no. 1, pp. 99-110 (12 pages); indexed in CSCD

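Funding: Supported by the Class A Project of the Fujian Provincial Department of Education (Grant No. JA09043) and the Special Research Project for Provincial Universities of Fujian Province (Grant No. JK2009006)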

Keywords: geometric double-entity model; far-near relations of clusters; many clusters; high-dimensional data set; partitional clustering algorithms

CLC number: TP311.13 [Automation and Computer Technology: Computer Software and Theory]
