Clustering algorithm based on set dissimilarity for high-dimensional categorical data


Abstract: A clustering algorithm based on set dissimilarity is proposed. Using the defined set dissimilarity and a reduced representation of each set, the algorithm computes the overall dissimilarity of all objects in a set directly, without calculating the distance between every pair of objects. It compresses high-dimensional categorical data heavily without loss of computational accuracy and obtains the clustering result in a single scan of the data. The time complexity of the algorithm is nearly linear. An example with real data shows that the algorithm is effective.
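The abstract does not give the formula for set dissimilarity or the layout of the reduced set representation, so the following is only a minimal sketch of the general idea under stated assumptions. It assumes a hypothetical summary (`SetSummary`) that stores the set size and the attributes on which every member agrees, and a hypothetical dissimilarity equal to the fraction of attributes on which the members disagree. Neither is taken from the paper; the threshold-based assignment rule in `one_pass_cluster` is likewise an assumption.

```python
from dataclasses import dataclass


@dataclass
class SetSummary:
    """Hypothetical compact summary of a set of categorical objects."""
    size: int      # number of objects in the set
    common: dict   # attribute index -> value shared by every object
    n_attrs: int   # total number of attributes

    @classmethod
    def of(cls, obj):
        # A single object agrees with itself on every attribute.
        return cls(1, dict(enumerate(obj)), len(obj))

    def with_object(self, obj):
        # Summary after adding obj: one pass over the attributes,
        # no comparison against individual members of the set.
        kept = {j: v for j, v in self.common.items() if obj[j] == v}
        return SetSummary(self.size + 1, kept, self.n_attrs)

    def dissimilarity(self):
        # ASSUMED measure (not the paper's definition): fraction of
        # attributes on which the members do not all agree.
        return (self.n_attrs - len(self.common)) / self.n_attrs


def one_pass_cluster(objects, threshold):
    """Assign each object once, in input order (assumed assignment rule)."""
    summaries, labels = [], []
    for obj in objects:
        best_k, best = None, None
        for k, s in enumerate(summaries):
            cand = s.with_object(obj)
            if best is None or cand.dissimilarity() < best.dissimilarity():
                best_k, best = k, cand
        if best is not None and best.dissimilarity() <= threshold:
            summaries[best_k] = best
            labels.append(best_k)
        else:
            # No existing set can absorb the object: start a new set.
            summaries.append(SetSummary.of(obj))
            labels.append(len(summaries) - 1)
    return labels, summaries
```

In this sketch the summary is exact for the assumed measure: the dissimilarity computed from `common` equals the value obtained by inspecting every object in the set, which mirrors the abstract's claim of compression without loss of accuracy. Each object is compared against set summaries only, so the cost per object depends on the number of sets and attributes rather than on the number of objects already seen.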

Authors: Wu Sen (武森), Wei Guiying (魏桂英), Bai Chen (白尘), Zhang Guiqiong (张桂琼)

Affiliation: School of Economics and Management, University of Science and Technology Beijing

Source: Journal of University of Science and Technology Beijing (《北京科技大学学报》), 2010, No. 8, pp. 1085-1089 (5 pages). Indexed in EI, CAS, CSCD, and the Peking University core journal list.

Funding: National Natural Science Foundation of China (No. 70771007)

Keywords: clustering; high-dimensional space; set dissimilarity; data mining

Classification code: TP311 [Automation and Computer Technology: Computer Software and Theory]


References (3)

1. Shan Shimin, Wang Xinyan, Zhang Xianchao. Subspace clustering algorithm for high-dimensional categorical attributes [J]. 小型微型计算机系统, 2009, 30(10): 2016-2021.
2. Sen Wu, Xuedong Gao. CABOSFV algorithm for high dimensional sparse data clustering [J]. Journal of University of Science and Technology Beijing, 2004, 11(3): 283-288.
3. Yang Bo, Liu Dayou, Liu Jiming, Jin Di, Ma Haibin. Complex network clustering methods [J]. 软件学报 (Journal of Software), 2009, 20(1): 54-66.

