An Improved K-Means Clustering Algorithm

Abstract: Traditional K-Means places high demands on its user, who must specify the value of K and choose the positions of the initial center points. This paper improves the traditional K-Means algorithm in three ways. First, it defines, detects, and deletes outliers. Second, it uses the Canopy algorithm to help determine the range of K and a set of rough center points. Third, it uses the Silhouette evaluation index to select the optimal K and its corresponding clustering result. With these changes, the improved algorithm no longer requires K or the initial center points to be entered by hand. Validation results show that the improved K-Means algorithm produces stable and accurate clustering results, and that when the number of data points is large it slightly outperforms the traditional algorithm in the number of iterations.
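The abstract names the three stages but not their exact rules. The following is a minimal Python sketch (standard library only) of such a pipeline under stated assumptions, not the author's implementation. Outliers are defined here by a hypothetical rule: a point is dropped if its mean distance to its `k` nearest neighbors exceeds the data-set mean of that score by more than `z` standard deviations. The Canopy step uses only the tight threshold T2 (passed as `t2`). The number of canopies is taken as the upper end of the K range, and the first K canopy centers are used as initial centers; both are illustrative choices. The final K is the one whose clustering has the highest mean Silhouette coefficient. The function names and the parameter values `k=3`, `z=2.0` and `t2=3.0` are hypothetical.

```python
import math

def dist(a, b):
    """Euclidean distance between two points given as tuples."""
    return math.dist(a, b)

def remove_outliers(points, k=3, z=2.0):
    """Hypothetical outlier rule: drop a point whose mean distance to its k nearest
    neighbors exceeds the data-set mean of that score by more than z std devs."""
    scores = []
    for i, p in enumerate(points):
        d = sorted(dist(p, q) for j, q in enumerate(points) if j != i)
        scores.append(sum(d[:k]) / k)
    mu = sum(scores) / len(scores)
    sd = math.sqrt(sum((s - mu) ** 2 for s in scores) / len(scores))
    return [p for p, s in zip(points, scores) if s <= mu + z * sd]

def canopy(points, t2):
    """Canopy pass: each remaining point becomes a center and removes every candidate
    within the tight threshold T2. The loose threshold T1 only controls (overlapping)
    canopy membership, which this sketch does not need, so it is omitted."""
    remaining = list(points)
    centers = []
    while remaining:
        c = remaining.pop(0)
        centers.append(c)
        remaining = [p for p in remaining if dist(p, c) > t2]
    return centers

def kmeans(points, init, max_iter=100):
    """Lloyd's K-Means from the given initial centers; returns labels, centers, iterations."""
    centers = [tuple(c) for c in init]
    for it in range(1, max_iter + 1):
        # Assignment step: index of the nearest center for every point.
        labels = [min(range(len(centers)), key=lambda j: dist(p, centers[j])) for p in points]
        # Update step: mean of each cluster (keep the old center if a cluster empties).
        new = []
        for j, c in enumerate(centers):
            members = [p for p, lab in zip(points, labels) if lab == j]
            new.append(tuple(sum(x) / len(members) for x in zip(*members)) if members else c)
        if new == centers:  # converged: no center moved
            break
        centers = new
    return labels, centers, it

def silhouette(points, labels):
    """Mean Silhouette coefficient s(i) = (b - a) / max(a, b); singletons count as 0."""
    if len(set(labels)) < 2:
        return -1.0
    n, total = len(points), 0.0
    for i in range(n):
        same = [dist(points[i], points[j]) for j in range(n) if j != i and labels[j] == labels[i]]
        if not same:
            continue
        a = sum(same) / len(same)  # mean distance within own cluster
        b = min(  # mean distance to the nearest other cluster
            sum(dist(points[i], points[j]) for j in range(n) if labels[j] == c) / labels.count(c)
            for c in set(labels) if c != labels[i]
        )
        total += 0.0 if max(a, b) == 0 else (b - a) / max(a, b)
    return total / n

def improved_kmeans(points, t2):
    """Outlier removal -> Canopy (K range and rough centers) -> best K by Silhouette."""
    pts = remove_outliers(points)
    centers = canopy(pts, t2)
    best = None
    for k in range(2, min(len(centers), len(pts) - 1) + 1):
        labels, cents, iters = kmeans(pts, centers[:k])
        score = silhouette(pts, labels)
        if best is None or score > best[0]:
            best = (score, k, labels, cents, iters)
    return best  # None if Canopy finds fewer than two centers

data = [(0, 0), (0, 1), (1, 0), (1, 1),
        (10, 10), (10, 11), (11, 10), (11, 11),
        (0, 10), (0, 11), (1, 10), (1, 11),
        (50, 50)]  # last point is a deliberate outlier
best = improved_kmeans(data, t2=3.0)
print("K =", best[1], "silhouette =", round(best[0], 3), "iterations =", best[4])
```

On this toy data set of three well-separated groups, the point (50, 50) is discarded as an outlier, Canopy yields three rough centers, and the Silhouette search over K = 2..3 selects K = 3. The thresholds `t2`, `k` and `z` are data-dependent and would need tuning on real data. The iteration count returned by `kmeans` is the quantity the abstract compares against traditional K-Means.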

Author: XU Li (徐立)

Institution: School of Software, Shangqiu Polytechnic, Shangqiu, Henan 476100, China

Source: Journal of Hebei Software Institute (《河北软件职业技术学院学报》), 2018, No. 2, pp. 18-20 (3 pages)

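Funding: Research project of the Henan Provincial Federation of Social Science Circles and the Henan Provincial Federation of Economic Organizations (SKL-2016-2062)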

Keywords: K-Means clustering algorithm; outlier; simulation experiment; Silhouette index

CLC number: TP312 [Automation and Computer Technology: Computer Software and Theory]

