Clustering for high dimensional data based on extended set dissimilarity

Abstract: This paper proposes extended set dissimilarity, a measure of the overall dissimilarity among multiple sets, together with related theorems, and presents CAESD (clustering algorithm based on extended set dissimilarity for categorical attributes), a new algorithm for clustering high dimensional data with categorical attributes. Building on the CABOSFV_C algorithm, CAESD uses extended set dissimilarity and the extended set feature vector to complete the clustering in two stages. Comparative experiments on UCI data sets against the K-modes algorithm, improved variants of K-modes, and the CABOSFV_C algorithm show that CAESD achieves higher clustering accuracy.
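As a rough illustration of what a dissimilarity defined over a whole set of categorical objects can look like, the Python sketch below scores a set by the fraction of attributes on which its members do not all agree. This is an assumed stand-in chosen for simplicity, not the extended set dissimilarity defined in the paper, whose formula is not given in this record; the function names `set_disagreement` and `merged_disagreement` are hypothetical.

```python
from typing import Hashable, Sequence

Record = Sequence[Hashable]


def set_disagreement(objects: Sequence[Record]) -> float:
    """Fraction of attributes on which the objects do not all share one value.

    Illustrative stand-in only; NOT the paper's extended set dissimilarity.
    """
    if not objects:
        raise ValueError("need at least one object")
    n_attrs = len(objects[0])
    if n_attrs == 0 or any(len(o) != n_attrs for o in objects):
        raise ValueError("objects must share a non-zero number of attributes")
    # zip(*objects) yields one tuple per attribute (one column of the data).
    differing = sum(1 for column in zip(*objects) if len(set(column)) > 1)
    return differing / n_attrs


def merged_disagreement(set_a: Sequence[Record], set_b: Sequence[Record]) -> float:
    """Score the union of two groups, e.g. to decide whether merging them is cheap."""
    return set_disagreement(list(set_a) + list(set_b))
```

For example, `set_disagreement([("a", "x", "p"), ("a", "y", "p")])` counts one differing attribute out of three, and `merged_disagreement` applied to two groups with identical records returns zero. A clustering procedure could compare such a score with a threshold before merging groups, but how CAESD actually does this is described only in the full paper.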

Authors: Wu Sen (武森), Ye Yufei (叶俞飞), Yu Xiaoli (俞晓莉)

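Affiliations: School of Economics and Management, University of Science and Technology Beijing (北京科技大学经济管理学院); Hisense College, Hisense Group (海信集团海信学院)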

Source: Application Research of Computers (《计算机应用研究》), 2011, No. 9, pp. 3253-3255 (3 pages). Indexed in CSCD and the Peking University Core Journals list (北大核心).

Funding: National Natural Science Foundation of China (70771007); Fundamental Research Funds for the Central Universities (FRF-TP-10-006B)

Keywords: high dimensional data clustering; CABOSFV_C algorithm; extended set dissimilarity; CAESD algorithm

Classification: TP311 [Automation and Computer Technology: Computer Software and Theory]
