
Clustering Result Evaluating Algorithm in a Gravitational Way (cited 3 times)
Abstract: To analyze the results of clustering algorithms quantitatively, a clustering quality evaluation algorithm based on the concept of gravitation is proposed. The algorithm treats every data point in the data space as a particle with unit mass and evaluates the quality of a clustering result by analyzing the gravitational relations among its data points. A clustering result in which the data points within each cluster attract one another strongly while the noise points experience only weak attraction is regarded as high quality. Conversely, a result in which the gravitation among points within clusters is small while the gravitation acting on noise points is large is regarded as low quality. The validity and efficiency of the algorithm were tested on several different datasets. Experimental results show that the algorithm produces an evaluation value within a very short response time and correctly reflects whether a clustering result is good or poor. The proposed algorithm can therefore guide a clustering method to find the best clustering result automatically, without manual intervention.
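As a rough illustration of the idea in the abstract, the Python sketch below scores a labeled clustering by comparing the attraction among points inside clusters with the attraction felt by noise points. Every specific choice here is an assumption, not the paper's formula: the gravitational constant is taken as 1, the force between two unit-mass points is 1/(d² + ε), noise is marked with the hypothetical label `-1`, and the score is the mean intra-cluster force divided by one plus the mean total force acting on a noise point. The helper `pick_best` (also hypothetical) mimics how such a score could steer the choice among candidate clusterings without manual intervention.

```python
from itertools import combinations

NOISE = -1  # hypothetical label marking noise points (DBSCAN-style convention)


def gravitation(p, q, eps=1e-9):
    """Attraction between two unit-mass particles, assuming G = 1.

    eps softens the force so coincident points do not divide by zero.
    """
    d2 = sum((a - b) ** 2 for a, b in zip(p, q))
    return 1.0 / (d2 + eps)


def gravity_score(points, labels):
    """Hypothetical quality score: higher means better clustering.

    Mean pairwise attraction inside clusters, divided by
    (1 + mean total attraction that clustered points exert on each noise point).
    """
    clusters, noise = {}, []
    for p, lab in zip(points, labels):
        if lab == NOISE:
            noise.append(p)
        else:
            clusters.setdefault(lab, []).append(p)

    # Attraction between every pair of points that share a cluster.
    intra = [gravitation(p, q)
             for members in clusters.values()
             for p, q in combinations(members, 2)]
    intra_mean = sum(intra) / len(intra) if intra else 0.0

    # Total attraction each noise point receives from all clustered points.
    clustered = [p for members in clusters.values() for p in members]
    noise_pull = [sum(gravitation(n, c) for c in clustered) for n in noise]
    noise_mean = sum(noise_pull) / len(noise_pull) if noise_pull else 0.0

    return intra_mean / (1.0 + noise_mean)


def pick_best(points, candidate_labelings):
    """Return the candidate labeling with the highest gravity_score."""
    return max(candidate_labelings, key=lambda labels: gravity_score(points, labels))


# Two tight groups of three points plus one distant outlier.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (50, 50)]
good = [0, 0, 0, 1, 1, 1, NOISE]   # outlier correctly marked as noise
bad = [0, 0, NOISE, 1, 1, 1, 0]    # outlier merged into a cluster, a core point marked noise

print(pick_best(points, [bad, good]) == good)  # prints True
```

In the `bad` labeling, the far outlier drags down the intra-cluster average and the point wrongly marked as noise sits next to its former neighbours, so it feels a large pull; both effects lower the score. This naive version compares all pairs of points, so it costs O(n²) time and does not attempt to reproduce the response times reported in the paper.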
Source: Journal of Northeastern University (Natural Science), 2007, No. 8, pp. 1109-1112 (4 pages). Indexed in EI, CAS and CSCD; Peking University core journal.
Funding: National Natural Science Foundation of China (60273079, 60473074)
Keywords: clustering; clustering result evaluation; gravitation; clustering algorithm; data mining