
Research on High Dimensional Clustering Algorithm Based on Similarity Measure (cited by: 4)
Abstract: Defining similarity for high-dimensional data is difficult. To address this problem, the paper proposes a new clustering algorithm for high-dimensional data, built on a similarity measure function that expresses the similarity between high-dimensional data objects more accurately. The algorithm first uses this function to compute the similarity between every pair of objects, producing a similarity matrix, and then clusters the data bottom-up (agglomeratively) based on that matrix. Experiments show that the algorithm yields higher-quality clustering results, is not affected by outliers, and is insensitive to the input order of the data.
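The two steps in the abstract (pairwise similarity matrix, then bottom-up merging) can be sketched as follows. This record does not give the paper's similarity function or its merge rule, so both are assumptions here: `similarity` is a hypothetical stand-in that averages a per-dimension score 1 / (1 + |x_i - y_i|), and `agglomerate` uses average linkage with an assumed stopping `threshold`. It illustrates the general structure only, not the authors' method.

```python
from itertools import combinations


def similarity(x, y):
    """Hypothetical stand-in measure (NOT the paper's function).

    Each dimension contributes 1 / (1 + |x_i - y_i|), a value in (0, 1],
    and the result is the mean over all dimensions. Because every term is
    bounded, one large difference in a single dimension cannot dominate.
    """
    return sum(1.0 / (1.0 + abs(a - b)) for a, b in zip(x, y)) / len(x)


def similarity_matrix(data):
    """Step 1: compute all pairwise similarities into a symmetric n x n matrix."""
    n = len(data)
    s = [[1.0] * n for _ in range(n)]  # self-similarity is 1.0 for this measure
    for i, j in combinations(range(n), 2):
        s[i][j] = s[j][i] = similarity(data[i], data[j])
    return s


def agglomerate(s, threshold):
    """Step 2: bottom-up (agglomerative) clustering driven only by the matrix.

    Starts from singleton clusters and repeatedly merges the pair with the
    highest average-linkage similarity. Stops when no pair reaches
    `threshold` (an assumed stopping rule) or one cluster remains.
    """
    clusters = [[i] for i in range(len(s))]

    def link(a, b):
        # Average linkage: mean similarity over all cross-cluster pairs.
        return sum(s[i][j] for i in a for j in b) / (len(a) * len(b))

    while len(clusters) > 1:
        best, pair = -1.0, None
        for p, q in combinations(range(len(clusters)), 2):
            v = link(clusters[p], clusters[q])
            if v > best:
                best, pair = v, (p, q)
        if best < threshold:
            break
        p, q = pair  # p < q, so deleting index q leaves index p valid
        clusters[p] = clusters[p] + clusters[q]
        del clusters[q]
    return sorted(sorted(c) for c in clusters)


# Usage: two well-separated groups of 4-dimensional points.
data = [
    [0.0, 0.1, 0.0, 0.2],
    [0.1, 0.0, 0.1, 0.2],
    [5.0, 5.1, 4.9, 5.0],
    [5.1, 5.0, 5.0, 4.9],
]
print(agglomerate(similarity_matrix(data), threshold=0.5))  # [[0, 1], [2, 3]]
```

Separating the matrix computation from the merging loop mirrors the abstract: once the matrix exists, the clustering step never touches the raw vectors, so any other similarity function can be swapped in without changing `agglomerate`.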
Source: 《微计算机信息》 (Control & Automation), 2009, Issue 27, pp. 187-188, 198 (3 pages)
Keywords: high-dimensional data; cluster analysis; similarity measure

References (6)

  • [1] Jiawei Han, Micheline Kamber (authors); Fan Ming, Meng Xiaofeng (translators). Data Mining: Concepts and Techniques [M]. Beijing: China Machine Press, 2007.3.2.
  • [2] He Ling, Wu Lingda, Cai Yichao. Similarity measure of data in high-dimensional space [J]. Mathematics in Practice and Theory, 2006, 36(9): 189-194. (Cited by: 20)
  • [3] Agrawal R, Gehrke J, Gunopulos D, et al. Automatic Subspace Clustering of High Dimensional Data for Data Mining Applications [C]. ACM SIGMOD Conference, 1998.
  • [4] Sudipto Guha, Rajeev Rastogi, Kyuseok Shim. CURE: An Efficient Clustering Algorithm for Large Databases [A]. Proceedings of the ACM SIGMOD International Conference on Management of Data [C]. New York: ACM Press, 1998: 73-84.
  • [5] Galliat, Tobias. Adaptive Multilevel Cluster Analysis by Self-Organizing Box Maps [EB/OL]. 2002. http://www.diss.fu-berlin.de/diss/receive/FUDISS_thesis_000000000679
  • [6] Chen Liangwei. Research on clustering algorithms in data mining [J]. 《微计算机信息》 (Control & Automation), 2006(07X): 209-211. (Cited by: 32)
