
New clustering method of mixed-attribute data (cited by 6)
Abstract: A new Global k-Prototype (GKP) algorithm is proposed for clustering data with mixed numeric and categorical attributes. The algorithm first randomly selects a sufficiently large number of initial prototypes to capture the global distribution of the data set. It then iteratively eliminates redundant prototypes, using an evaluation (elimination criterion) function to decide which prototypes to remove. Systematic experiments on datasets widely used in this area demonstrate the effectiveness and convergence of the algorithm. Compared with other well-known mixed-data clustering algorithms, it achieves significantly higher clustering accuracy, offering a new approach to the mixed-data clustering problem.
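The abstract describes the procedure only at a high level, so the following is a minimal Python sketch of the idea under stated assumptions, not the authors' implementation. It uses Huang's (1997) k-prototypes dissimilarity: squared Euclidean distance on numeric attributes plus a user-chosen weight `gamma` times the number of categorical mismatches. Prototypes are updated with the numeric mean and categorical mode. Because the paper's elimination criterion function is not given in the abstract, the sketch substitutes a hypothetical criterion: at each step, drop the prototype whose removal increases the total clustering cost the least. The function names and the parameters `m` (initial prototype count), `gamma`, and `seed` are illustrative only.

```python
import random
from collections import Counter


def dissimilarity(x_num, x_cat, q_num, q_cat, gamma):
    # Huang (1997): squared Euclidean distance on numeric attributes
    # plus gamma times the number of mismatched categorical attributes.
    num = sum((a - b) ** 2 for a, b in zip(x_num, q_num))
    cat = sum(a != b for a, b in zip(x_cat, q_cat))
    return num + gamma * cat


def assign(data, protos, gamma):
    # Label each (numeric, categorical) object with its nearest prototype.
    return [min(range(len(protos)),
                key=lambda k: dissimilarity(xn, xc, protos[k][0], protos[k][1], gamma))
            for xn, xc in data]


def total_cost(data, protos, labels, gamma):
    return sum(dissimilarity(xn, xc, protos[l][0], protos[l][1], gamma)
               for (xn, xc), l in zip(data, labels))


def update(data, protos, labels):
    # Numeric part -> cluster mean; categorical part -> per-attribute mode.
    # A prototype with no members is kept unchanged.
    new = []
    for k, (qn, qc) in enumerate(protos):
        members = [data[i] for i, l in enumerate(labels) if l == k]
        if not members:
            new.append((qn, qc))
            continue
        mean = [sum(m[0][j] for m in members) / len(members) for j in range(len(qn))]
        mode = [Counter(m[1][j] for m in members).most_common(1)[0][0]
                for j in range(len(qc))]
        new.append((mean, mode))
    return new


def k_prototypes(data, protos, gamma, max_iter=50):
    # Alternate assignment and update until the prototypes stop changing.
    for _ in range(max_iter):
        labels = assign(data, protos, gamma)
        new = update(data, protos, labels)
        if new == protos:
            break
        protos = new
    return protos, assign(data, protos, gamma)


def global_k_prototypes(data, k, m, gamma=1.0, seed=0):
    # Start from m >> k prototypes drawn at random from the data, then
    # remove one prototype at a time until only k remain.
    # ASSUMPTION: remove the prototype whose deletion increases the total
    # cost least (a stand-in for the paper's unspecified criterion function).
    rng = random.Random(seed)
    protos = [(list(xn), list(xc)) for xn, xc in rng.sample(data, m)]
    protos, labels = k_prototypes(data, protos, gamma)
    while len(protos) > k:
        def cost_without(i):
            trial = protos[:i] + protos[i + 1:]
            return total_cost(data, trial, assign(data, trial, gamma), gamma)
        drop = min(range(len(protos)), key=cost_without)
        protos, labels = k_prototypes(data, protos[:drop] + protos[drop + 1:], gamma)
    return protos, labels


# Usage: two well-separated groups, each with one numeric and one categorical attribute.
data = [([1.0], ["a"]), ([1.2], ["a"]), ([0.9], ["a"]),
        ([8.0], ["b"]), ([8.3], ["b"]), ([7.9], ["b"])]
protos, labels = global_k_prototypes(data, k=2, m=4, seed=1)
print(protos, labels)
```

Re-running k-prototypes after every elimination lets the surviving prototypes absorb the objects of the removed one before the next candidate is evaluated. This brute-force evaluation costs one full reassignment per candidate, which is acceptable for a sketch but not for large data sets.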
Source: Journal of Jilin University (Engineering and Technology Edition), 2013, No. 1, pp. 130-134 (5 pages). Indexed in EI, CAS, CSCD; Peking University Core Journal.
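Funding: National Natural Science Foundation of China (61175023, 60973092, 60903097); Key Laboratory of Symbolic Computation and Knowledge Engineering, Ministry of Education; China Scholarship Council (2010617098).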
Keywords: artificial intelligence; data clustering; data mining; k-prototypes algorithm; mixed-attribute data

References (16)

  • [1] Jain A K, Murty M N, Flynn P J. Data clustering: a review[J]. ACM Computing Surveys, 1999, 31(3): 264-323.
  • [2] Xu Sen, Lu Zhimao, Gu Guochang. Ensemble text clustering algorithm combining K-means and non-negative matrix factorization[J]. Journal of Jilin University (Engineering and Technology Edition), 2011, 41(4): 1077-1082.
  • [3] Han J, Kamber M. Data Mining: Concepts and Techniques[M]. San Francisco: Morgan Kaufmann, 2001.
  • [4] MacQueen J. Some methods for classification and analysis of multivariate observations[C]//Proc 5th Berkeley Symp on Mathematical Statistics and Probability, 1967: 281-297.
  • [5] Anderberg M R. Cluster Analysis for Applications[M]. New York: Academic Press, 1973.
  • [6] Hsu C C, Huang Y P. Incremental clustering of mixed data based on distance hierarchy[J]. Expert Systems with Applications, 2008, 35(3): 1177-1185.
  • [7] Huang Z. Clustering large data sets with mixed numeric and categorical values[C]//Proceedings of the First Pacific-Asia Knowledge Discovery and Data Mining Conference. Singapore: World Scientific, 1997: 21-34.
  • [8] Bezdek J C, Keller J, Krishnapuram R. Fuzzy Models and Algorithms for Pattern Recognition and Image Processing[M]. Boston: Kluwer Academic Publishers, 1999.
  • [9] Ahmad A, Dey L. Algorithm for fuzzy clustering of mixed data with numeric and categorical attributes[J]. LNCS, 2005, 3816: 561-572.
  • [10] Chatzis S P. A fuzzy c-means-type algorithm for clustering of data with mixed numeric and categorical attributes employing a probabilistic dissimilarity functional[J]. Expert Systems with Applications, 2011, 38(7): 8684-8689.
