Arbitrary shape clustering for mixed attributes dataset (cited by: 2)
Abstract: Clustering is a very active research area in data mining, and finding clusters of arbitrary shape remains an open problem. This paper introduces an inter-cluster dissimilarity measure that incorporates the frequency of categorical attribute values, together with a definition of the similarity between an object and a cluster. Building on these, it proposes a clustering algorithm that handles clusters of arbitrary shape and can be applied to mixed-attribute datasets. The algorithm is evaluated on synthetic and real-life datasets and compared with related algorithms; the experimental results show that it is feasible and effective.
Source: Computer Engineering and Applications (《计算机工程与应用》), indexed in CSCD and the Peking University Core Journals list, 2010, No. 34, pp. 136-139 (4 pages)
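Funding: National High Technology Research and Development Program of China (863 Program) (No. 2006AA01A120); Natural Science Basic Research Program of the Education Department of Henan Province (No. 2010A520033); Doctoral Research Fund of Zhengzhou University of Light Industry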
Keywords: arbitrary shape clustering; mixed attributes; similarity degree
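
The abstract mentions a similarity between an object and a cluster that uses the frequency of categorical attribute values, but it does not give the formulas. The following is only a minimal sketch of one common way to build such a measure, not the paper's actual definitions. The names `ClusterSummary`, `object_cluster_distance`, and `object_cluster_similarity` are hypothetical, numeric attributes are assumed to be pre-scaled to [0, 1], and the arbitrary-shape step of the algorithm is not covered.

```python
from collections import Counter
from math import sqrt


class ClusterSummary:
    """Hypothetical cluster summary for mixed attributes (an assumption,
    not the paper's data structure): value counts per categorical
    attribute and running sums per numeric attribute."""

    def __init__(self, cat_idx, num_idx):
        self.cat_idx = list(cat_idx)   # positions of categorical attributes
        self.num_idx = list(num_idx)   # positions of numeric attributes
        self.size = 0
        self.freq = {i: Counter() for i in self.cat_idx}
        self.sums = {i: 0.0 for i in self.num_idx}

    def add(self, obj):
        """Absorb one object (a tuple or list) into the summary."""
        self.size += 1
        for i in self.cat_idx:
            self.freq[i][obj[i]] += 1
        for i in self.num_idx:
            self.sums[i] += obj[i]

    def mean(self, i):
        """Mean of numeric attribute i over the cluster."""
        return self.sums[i] / self.size


def object_cluster_distance(obj, cluster):
    """Assumed distance: a categorical attribute contributes
    1 - (share of cluster members with the same value); a numeric
    attribute contributes its absolute gap to the cluster mean.
    The result is the root mean square over all attributes."""
    if cluster.size == 0:
        raise ValueError("cluster is empty")
    total = 0.0
    for i in cluster.cat_idx:
        share = cluster.freq[i][obj[i]] / cluster.size
        total += (1.0 - share) ** 2
    for i in cluster.num_idx:
        total += (obj[i] - cluster.mean(i)) ** 2
    m = len(cluster.cat_idx) + len(cluster.num_idx)
    return sqrt(total / m)


def object_cluster_similarity(obj, cluster):
    """Similarity as 1 - distance; stays in [0, 1] only under the
    stated assumption that numeric attributes are scaled to [0, 1]."""
    return 1.0 - object_cluster_distance(obj, cluster)


if __name__ == "__main__":
    c = ClusterSummary(cat_idx=[0], num_idx=[1])
    for row in [("red", 0.2), ("red", 0.4), ("blue", 0.3)]:
        c.add(row)
    for probe in [("red", 0.3), ("blue", 0.3), ("green", 0.3)]:
        print(probe, round(object_cluster_similarity(probe, c), 4))
```

In this sketch a categorical value that is frequent in the cluster lowers the distance, a rare one raises it, and an unseen one contributes the maximum of 1, which is the kind of frequency information the abstract refers to.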
