Abstract
This paper presents CBEST, a clustering algorithm for data with mixed numeric and categorical features that combines clustering-ensemble and spectral techniques. A clustering ensemble is used to define the similarity between pairs of objects; this similarity measure makes no assumptions about the underlying distributions of the feature values. A spectral clustering algorithm is then applied to the resulting similarity matrix to extract a partition of the data. Extensive experiments on real and artificial data sets demonstrate the effectiveness of CBEST on mixed-data clustering tasks and its robustness to noise. Comparisons with other mixed-data clustering algorithms show that CBEST performs better. CBEST can also incorporate prior knowledge effectively: the weight of each feature in the clustering can be set by adjusting parameters.
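The two stages described in the abstract (an ensemble-derived similarity matrix, then spectral clustering on it) can be sketched as follows. This is a minimal illustration under stated assumptions, not the CBEST implementation: the abstract does not say how the ensemble is built, so the sketch assumes one base partition per feature (categorical features grouped by value, numeric features by equal-width binning). The function names, the `weights` and `n_bins` parameters, and the toy data are all hypothetical. The spectral step is a simple two-way split computed by power iteration, chosen only to keep the example dependency-free.

```python
import math
import random


def base_partition(column, n_bins=2):
    """One base clustering from a single feature (assumed scheme, not from the paper)."""
    if all(isinstance(v, (int, float)) for v in column):
        # Numeric feature: equal-width binning into n_bins groups.
        lo, hi = min(column), max(column)
        width = (hi - lo) / n_bins or 1.0
        return [min(int((v - lo) / width), n_bins - 1) for v in column]
    # Categorical feature: the value itself serves as the cluster label.
    return list(column)


def ensemble_similarity(rows, weights=None, n_bins=2):
    """Weighted fraction of base partitions that place objects i and j together."""
    n_features = len(rows[0])
    weights = weights or [1.0] * n_features  # hypothetical per-feature weights
    partitions = [base_partition([r[f] for r in rows], n_bins)
                  for f in range(n_features)]
    total = sum(weights)
    n = len(rows)
    return [[sum(w for w, p in zip(weights, partitions) if p[i] == p[j]) / total
             for j in range(n)] for i in range(n)]


def spectral_bipartition(sim, n_iter=200, seed=0):
    """Two-way spectral split using the second eigenvector of D^-1/2 S D^-1/2."""
    n = len(sim)
    deg = [sum(row) for row in sim]
    m = [[sim[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)]
         for i in range(n)]
    # The leading eigenvector of m is proportional to sqrt(deg); deflate it.
    norm = math.sqrt(sum(deg))
    top = [math.sqrt(d) / norm for d in deg]
    rng = random.Random(seed)
    v = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    for _ in range(n_iter):
        dot = sum(a * b for a, b in zip(v, top))
        v = [a - dot * b for a, b in zip(v, top)]
        v = [sum(m[i][j] * v[j] for j in range(n)) for i in range(n)]
        length = math.sqrt(sum(a * a for a in v)) or 1.0
        v = [a / length for a in v]
    # The sign of each entry assigns the object to one of two clusters.
    return [0 if a >= 0 else 1 for a in v]


# Toy mixed data: (numeric, categorical, categorical).
rows = [
    (1.0, "red", "x"), (1.1, "red", "x"), (0.9, "red", "y"),
    (5.0, "blue", "y"), (5.2, "blue", "z"), (4.9, "blue", "z"),
]
sim = ensemble_similarity(rows)
labels = spectral_bipartition(sim)
print(labels)
```

Passing a non-uniform `weights` list, for example `ensemble_similarity(rows, weights=[2.0, 1.0, 1.0])`, lets one feature count more toward the similarity, which is one plausible reading of how prior knowledge could enter through feature weights.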
Source
Computer Science (《计算机科学》)
CSCD
PKU Core Journal (北大核心)
2010, No. 11, pp. 234-238, 274 (6 pages in total)
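Funding
National 973 Program (No. 2010CB327900)
National Natural Science Foundation of China (No. 60303007)
Shanghai Science and Technology Development Fund (No. 08511501703)
Open Project of the Shanghai Key Laboratory of Intelligent Information Processing (No. IIPL-09-009)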
Keywords
Clustering ensemble (聚类集成)
Mixed data (混合型数据)
Similarity measure (相似性度量)