Analyzing Popular Clustering Algorithms from Different Viewpoints (published in English) Cited by: 86

Abstract: Clustering is one of the important problems studied in data mining. Cluster analysis partitions a data set into clusters so that data within a cluster are as similar as possible and data in different clusters are as dissimilar as possible. Different clustering methods use different similarity measures and techniques. This paper analyzes popular existing clustering algorithms from three viewpoints: (1) clustering criteria, (2) cluster representation, and (3) algorithm framework. On this basis, it also analyzes several newer algorithms that combine or generalize other methods. Because the analysis is carried out from three viewpoints, it can cover and distinguish most existing clustering algorithms. This work lays the foundation for research on self-tuning clustering methods and clustering benchmarks.
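The goal stated in the abstract (similar data within a cluster, dissimilar data across clusters) can be illustrated with a minimal sketch. It is not taken from the paper: the function name `mean_pairwise_distances`, the toy points, and the choice of Euclidean distance as the dissimilarity measure are all assumptions made for illustration only.

```python
from itertools import combinations
from math import dist  # Euclidean distance, Python 3.8+


def mean_pairwise_distances(points, labels):
    """Return (mean intra-cluster distance, mean inter-cluster distance).

    Hypothetical helper: Euclidean distance stands in for whatever
    similarity measure a particular clustering method actually uses.
    """
    intra, inter = [], []
    for (p, lp), (q, lq) in combinations(zip(points, labels), 2):
        # Pairs sharing a label count as intra-cluster, all others as inter-cluster.
        (intra if lp == lq else inter).append(dist(p, q))

    def mean(xs):
        return sum(xs) / len(xs) if xs else float("nan")

    return mean(intra), mean(inter)


# Assumed toy data: two well-separated groups in the plane.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
          (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
good_labels = [0, 0, 0, 1, 1, 1]  # matches the visible grouping
bad_labels = [0, 1, 0, 1, 0, 1]   # mixes the two groups

intra_good, inter_good = mean_pairwise_distances(points, good_labels)
intra_bad, inter_bad = mean_pairwise_distances(points, bad_labels)
```

Under these assumptions, the partition that follows the visible grouping yields a small mean intra-cluster distance relative to its inter-cluster distance, while the mixed partition does not. This is only one of many possible clustering criteria.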

Authors: Qian Weining (钱卫宁), Zhou Aoying (周傲英)

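Affiliations: Department of Computer Science, Fudan University; Open Laboratory for Intelligent Information Processing, Fudan University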

Source: Journal of Software (《软件学报》), indexed in EI, CSCD, and the Peking University Core Journals list; 2002, No. 8, pp. 1382-1394 (13 pages)

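Funding: National Key Basic Research Development Program of China (973 Program); Doctoral Program Foundation of the Ministry of Education of China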

Keywords: multi-viewpoint analysis; clustering algorithm; data mining; database; data set

Classification code: TP311.13 [Automation and Computer Technology: Computer Software and Theory]
