A Clustering Ensemble Algorithm for Incomplete Mixed Data (cited by: 20)
Abstract: Cluster ensembles have recently emerged as a powerful cluster analysis technique and have attracted considerable attention from researchers because of their good generalization ability. Most existing work addresses ensemble clustering of complete data sets with numerical attributes. Real-world data sets, however, are usually mixed data described by both numerical and categorical attributes, and they often contain missing values; existing algorithms are not very effective on such incomplete mixed data. To overcome this deficiency, this paper proposes a clustering ensemble algorithm for incomplete mixed data. First, the incomplete mixed data are completed using three different missing-value imputation methods. Second, a set of base clusterings is produced by running the K-Prototypes algorithm multiple times on each of the three completed data sets. Next, a similarity matrix is constructed from all the base clusterings. Finally, the final clustering result is obtained by applying hierarchical clustering to the similarity matrix. The effectiveness of the proposed algorithm is demonstrated empirically on several real UCI data sets using three benchmark evaluation measures. The experimental results show that the proposed algorithm yields higher clustering quality than several traditional clustering algorithms.
Source: Journal of Computer Research and Development (《计算机研究与发展》), indexed in EI, CSCD, and the Peking University Core Journals list; 2016, No. 9, pp. 1979-1989 (11 pages)
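Funding: Key Program of the National Natural Science Foundation of China (61432011); National Natural Science Foundation of China (61573229, 61502289); Shanxi Province Science and Technology Basic Conditions Platform Construction Project (2012091002-0101); Natural Science Foundation of Shanxi Province (201601D202039); Shanxi Province Graduate Education Innovation Project (2016SY002)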
Keywords: clustering ensemble; incomplete data; mixed data; missing value imputation; K-Prototypes clustering algorithm
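
The last two steps described in the abstract (building a similarity matrix from all base clusterings, then applying hierarchical clustering to it) can be illustrated with a minimal sketch. The abstract does not say how the similarity is defined or which linkage is used, so the co-association similarity and average linkage below are assumptions, and the function names `co_association`, `average_linkage`, and `consensus` are hypothetical. The base partitions are taken as given; in the paper they would come from repeated K-Prototypes runs on the three imputed data sets.

```python
from itertools import combinations


def co_association(partitions):
    """Similarity of objects i and j = fraction of base clusterings
    that place them in the same cluster (assumed similarity measure)."""
    n = len(partitions[0])
    m = len(partitions)
    counts = [[0] * n for _ in range(n)]
    for labels in partitions:
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    counts[i][j] += 1
    return [[counts[i][j] / m for j in range(n)] for i in range(n)]


def average_linkage(sim, k):
    """Agglomerative clustering on a similarity matrix: repeatedly merge the
    two clusters with the highest mean pairwise similarity until k remain
    (average linkage is an assumption, not taken from the paper)."""
    clusters = [[i] for i in range(len(sim))]

    def mean_sim(pair):
        a, b = pair
        total = sum(sim[i][j] for i in clusters[a] for j in clusters[b])
        return total / (len(clusters[a]) * len(clusters[b]))

    while len(clusters) > k:
        # combinations() yields a < b, so deleting b never shifts index a.
        a, b = max(combinations(range(len(clusters)), 2), key=mean_sim)
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]

    labels = [0] * len(sim)
    for cluster_id, members in enumerate(clusters):
        for i in members:
            labels[i] = cluster_id
    return labels


def consensus(partitions, k):
    """Combine base clusterings into one final partition with k clusters."""
    return average_linkage(co_association(partitions), k)


# Three hypothetical base clusterings of six objects. The second uses
# permuted label names; the third disagrees about object 2.
base = [
    [0, 0, 0, 1, 1, 1],
    [1, 1, 1, 0, 0, 0],
    [0, 0, 1, 1, 1, 1],
]
final_labels = consensus(base, k=2)
```

Because the similarity depends only on whether two objects share a cluster, base clusterings with different label names (as in the second partition above) can be combined without aligning their labels first.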