New clustering method of mixed-attribute data (混合属性数据聚类的新方法)

Cited by: 6


Abstract: A new Global k-Prototype (GKP) algorithm is proposed for clustering data with mixed numeric and categorical attributes. The algorithm first randomly selects a sufficiently large number of initial prototypes to cover the global distribution of the data set. It then progressively eliminates redundant prototypes through an iterative optimization process driven by an elimination criterion function. The algorithm was validated for effectiveness and convergence in systematic experiments on datasets widely used in this area. The experimental results and a comparative evaluation show the high performance and consistency of the proposed algorithm. Compared with other well-known mixed-data clustering algorithms, it significantly improves clustering accuracy and offers a new approach to the mixed-data clustering problem.
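The two-stage idea in the abstract (over-seed with many prototypes, then prune) can be illustrated with a minimal sketch. This is not the authors' GKP implementation: the abstract does not state the elimination criterion function, so the sketch assumes a simple stand-in (drop the prototype with the fewest assigned records). The names `mixed_distance`, `assign`, `update`, `global_k_prototypes`, `k_init`, `k_target`, and `gamma` are all hypothetical, and the dissimilarity is the usual k-prototypes form (squared Euclidean distance plus a weighted count of categorical mismatches).

```python
import random
from collections import Counter


def mixed_distance(x, p, gamma):
    """Squared Euclidean distance on the numeric parts plus gamma times the
    number of categorical mismatches (k-prototypes-style dissimilarity)."""
    (x_num, x_cat), (p_num, p_cat) = x, p
    numeric = sum((a - b) ** 2 for a, b in zip(x_num, p_num))
    mismatches = sum(a != b for a, b in zip(x_cat, p_cat))
    return numeric + gamma * mismatches


def assign(data, prototypes, gamma):
    """Label each record with the index of its nearest prototype."""
    labels = []
    for x in data:
        dists = [mixed_distance(x, p, gamma) for p in prototypes]
        labels.append(dists.index(min(dists)))
    return labels


def update(data, labels, prototypes):
    """Recompute prototypes: mean of numeric attributes, mode of categorical ones."""
    new = []
    for j, old in enumerate(prototypes):
        members = [x for x, lab in zip(data, labels) if lab == j]
        if not members:
            new.append(old)  # an empty prototype is kept unchanged
            continue
        nums = tuple(sum(col) / len(col) for col in zip(*(m[0] for m in members)))
        cats = tuple(Counter(col).most_common(1)[0][0]
                     for col in zip(*(m[1] for m in members)))
        new.append((nums, cats))
    return new


def global_k_prototypes(data, k_target, k_init, gamma=1.0, iters=20, seed=0):
    """Start from k_init random prototypes (k_init <= len(data)) and prune
    one prototype at a time until k_target remain."""
    rng = random.Random(seed)
    prototypes = rng.sample(data, k_init)
    while True:
        for _ in range(iters):
            labels = assign(data, prototypes, gamma)
            prototypes = update(data, labels, prototypes)
        labels = assign(data, prototypes, gamma)
        if len(prototypes) <= k_target:
            return prototypes, labels
        # ASSUMPTION: stand-in elimination criterion, not the paper's function.
        sizes = Counter(labels)
        weakest = min(range(len(prototypes)), key=lambda j: sizes.get(j, 0))
        prototypes.pop(weakest)
```

Each record is a pair `(numeric_tuple, categorical_tuple)`, for example `((0.1,), ("a",))`. Pruning by cluster size is only one possible choice; a criterion based on the change in total clustering cost would fit the same loop.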

Authors: Bai Tian (白天), Ji Jinchao (冀进朝), He Jialiang (何加亮), Zhou Chunguang (周春光)

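Affiliations: College of Computer Science and Technology, Jilin University (吉林大学计算机科学与技术学院); Department of Computer Science, State University of New Jersey (新泽西州立大学计算机系)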

Source: Journal of Jilin University: Engineering and Technology Edition (《吉林大学学报（工学版）》), indexed in EI, CAS, CSCD, and the Peking University Core Journals list; 2013, Issue 1, pp. 130-134 (5 pages)

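Funding: National Natural Science Foundation of China (61175023, 60973092, 60903097); project of the Key Laboratory of Symbolic Computation and Knowledge Engineering of the Ministry of Education; China Scholarship Council project (2010617098)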

Keywords: artificial intelligence; clustering; data mining; K-prototypes algorithm; mixed attribute data

Classification number: TP181 [Automation and Computer Technology: Control Theory and Control Engineering]

