
识别聚类间远近关系的双几何体模型 (cited by: 2)

Geometric double-entity model for recognizing far-near relations of clusters
Abstract: Solving many practical problems requires not only the cluster labels produced by a clustering algorithm but also the ability to recognize far-near relations between clusters. In the difficult case of high-dimensional data with many clusters, clustering visualization methods based on dimension reduction often show clusters as overlapping, interlaced, or artificially pushed apart, so the far-near relations of some clusters are displayed incorrectly or cannot be distinguished. Existing inter-cluster distance methods, in turn, cannot reveal whether two clusters are far apart or close together. This paper proposes the geometric double-entity model (GDEM) to describe the relation between two clusters, and designs measures of the far-near degree between clusters: the relative border distance, the absolute border distance, and the region dense degree. GDEM considers both the absolute distance between the nearest sample sets of two clusters and the dense degree of their border regions, which lets it accurately uncover far-near relations of clusters in a high-dimensional space even under the difficult condition described above. Experimental results on four real data sets show that GDEM can effectively recognize far-near relations of clusters that existing clustering visualization methods cannot distinguish.
Authors: WANG KaiJun (王开军); YAN XuanHui (严宣辉); CHEN LiFei (陈黎飞). School of Mathematics and Computer Science, Fujian Normal University, Fuzhou 350108, China
Source: Scientia Sinica (Informationis) (《中国科学:信息科学》), CSCD, 2012, No. 1, pp. 99-110 (12 pages)
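Funding: Class A Project of the Fujian Provincial Department of Education (Grant No. JA09043); Special Research Project for Fujian Provincial Universities (Grant No. JK2009006)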
Keywords: geometric double-entity model; far-near relations of clusters; many clusters; high-dimensional data; partitional clustering algorithms
