
Clustering for high dimensional data based on extended set dissimilarity
Abstract: This paper proposes the extended set dissimilarity, together with related theorems, to measure the overall dissimilarity among multiple sets, and presents CAESD (clustering algorithm based on extended set dissimilarity for categorical attributes), a new algorithm for clustering high dimensional data with categorical attributes. Building on CABOSFV_C clustering, CAESD completes the clustering in two stages, using the extended set dissimilarity and the extended set feature vector. Comparative experiments on UCI data sets show that CAESD achieves higher clustering accuracy than the K-modes algorithm, improved variants of K-modes, and the CABOSFV_C algorithm.
Source: Application Research of Computers (《计算机应用研究》), indexed in CSCD and the Peking University Core Journals list, 2011, No. 9, pp. 3253-3255 (3 pages)
Funding: National Natural Science Foundation of China (70771007); Fundamental Research Funds for the Central Universities (FRF-TP-10-006B)
Keywords: high dimensional data clustering; CABOSFV_C algorithm; extended set dissimilarity; CAESD algorithm
