Improved shared nearest neighbor clustering algorithm (改进的共享最近邻聚类算法)

Cited by: 3

Abstract: Clustering is an unsupervised machine learning method whose task is to discover the natural clusters present in data. The shared nearest neighbor (SNN) clustering algorithm performs well on datasets whose clusters differ in size, shape, and density, but it has the following shortcomings: (1) its time complexity is O(n²), which makes it unsuitable for large datasets; (2) it offers no simple, practical guidance for choosing its parameter thresholds; (3) it can handle only numeric attributes. This paper improves the SNN algorithm so that it can handle datasets with mixed attributes, and gives a simple method for selecting the parameter thresholds. The running time of the improved algorithm is approximately linear in the size of the dataset, which makes it suitable for large, high-dimensional datasets. Experimental results on real and synthetic datasets show that the improved algorithm is effective and feasible.
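To make the O(n²) cost mentioned in the abstract concrete, the following is a minimal sketch of the classical shared nearest neighbor similarity in its usual textbook form. It is not the improved algorithm proposed in the paper, whose details are not given on this page. The function names `knn_lists` and `snn_similarity`, the choice of Euclidean distance, and the sample points are assumptions made for illustration only.

```python
import math

def knn_lists(points, k):
    """Brute-force k-nearest-neighbour sets (assumed Euclidean distance).

    Every point is compared with every other point, which is the
    source of the quadratic cost in the number of points.
    """
    neighbours = []
    for i, p in enumerate(points):
        others = sorted(
            (j for j in range(len(points)) if j != i),
            key=lambda j: math.dist(p, points[j]),
        )
        neighbours.append(set(others[:k]))
    return neighbours

def snn_similarity(neighbours, i, j):
    """Number of shared neighbours of i and j.

    The similarity is zero unless each point appears in the other's
    k-nearest-neighbour set.
    """
    if i not in neighbours[j] or j not in neighbours[i]:
        return 0
    return len(neighbours[i] & neighbours[j])

# Two well-separated groups of four points each (hypothetical data).
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0),
          (10.0, 10.0), (10.0, 11.0), (11.0, 10.0), (11.0, 11.0)]
nn = knn_lists(points, k=3)
same_group = snn_similarity(nn, 0, 1)    # points in the same group
cross_group = snn_similarity(nn, 0, 4)   # points in different groups
```

In this sketch, points within one group share neighbours and get a positive similarity, while points from different groups get zero. The nested comparison in `knn_lists` is the step that a scalable variant would need to avoid.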

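Authors: Li Xia (李霞), Jiang Shengyi (蒋盛益)

Affiliation: Cisco School of Informatics, Guangdong University of Foreign Studies (广东外语外贸大学思科信息学院)

Source: Computer Engineering and Applications (《计算机工程与应用》), CSCD, PKU Core, 2011, No. 8, pp. 138-142 (5 pages)

Funding: National Natural Science Foundation of China (No. 61070061)

Keywords: shared nearest neighbor clustering algorithm; one-pass clustering algorithm; large dataset

Classification: TP393 [Automation and Computer Technology: Computer Application Technology]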
