
Research on Dissimilarity for Clustering Analysis (聚类分析中的差异性度量方法研究)

Cited by: 4
Abstract: Distance and dissimilarity measures are basic concepts in clustering analysis and lie at the core of many clustering algorithms. In classical clustering analysis, the dissimilarity measure is a simple function of distance. For datasets with mixed attributes, this paper proposes two distance definitions and generalizes the dissimilarity measure into a multivariate function of distance, cluster size, and other factors, so that clustering algorithms originally applicable only to numerical or only to categorical data can be applied to mixed-attribute data. Experimental results show that the new distance definitions and dissimilarity measures can improve clustering quality.
Source: Computer Engineering and Applications (《计算机工程与应用》), CSCD, Peking University Core Journal, 2005, No. 11, pp. 146-149 (4 pages)
Funding: National Natural Science Foundation of China (Grant No. 60273075)
Keywords: distance, dissimilarity, clustering
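
The abstract does not state the paper's two distance definitions or its dissimilarity function, so the following is only a rough Python sketch of the general idea, not the authors' method. It assumes a Gower-style distance (range-normalized absolute difference for numeric attributes, 0/1 mismatch for categorical ones) and a Ward-style weighting to show how cluster sizes can enter a dissimilarity. The names `mixed_distance`, `size_weighted_dissimilarity`, and `numeric_ranges` are hypothetical.

```python
from typing import Dict, Sequence, Union

Value = Union[float, str]


def mixed_distance(x: Sequence[Value], y: Sequence[Value],
                   numeric_ranges: Dict[int, float]) -> float:
    """Gower-style distance for records with mixed attributes (illustrative only).

    numeric_ranges maps the index of each numeric attribute to its value
    range in the dataset; every other index is treated as categorical.
    """
    total = 0.0
    for i, (a, b) in enumerate(zip(x, y)):
        if i in numeric_ranges:
            rng = numeric_ranges[i]
            # Numeric attribute: absolute difference scaled into [0, 1].
            total += abs(a - b) / rng if rng > 0 else 0.0
        else:
            # Categorical attribute: 0 if the values match, 1 otherwise.
            total += 0.0 if a == b else 1.0
    return total / len(x)


def size_weighted_dissimilarity(dist: float, n1: int, n2: int) -> float:
    """Ward-style dissimilarity: a function of distance and both cluster sizes.

    An assumed example of a 'multivariate' dissimilarity, not the paper's.
    """
    return (n1 * n2) / (n1 + n2) * dist ** 2


x = (1.0, "red", 10.0)
y = (3.0, "blue", 10.0)
d = mixed_distance(x, y, numeric_ranges={0: 4.0, 2: 20.0})
print(d, size_weighted_dissimilarity(d, n1=2, n2=2))
```

With a distance of this kind, an algorithm written for purely numeric or purely categorical data can be run on mixed records by swapping in the new distance function, which is the kind of extension the abstract describes.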


