Improvement and Parallelism of k-Means Clustering Algorithm (cited by: 2)

Abstract: The k-means clustering algorithm is one of the most commonly used algorithms for cluster analysis. The traditional k-means algorithm is, however, inefficient when applied to large data sets, and improving its efficiency remains an open problem. This paper focuses on the efficiency of clustering algorithms. A method for refining the initial cluster centers is designed to reduce the number of iterations the algorithm needs. A parallel k-means algorithm is also studied to overcome the processing limits of a single-processor machine on very large data sets. The analytical results demonstrate that these improvements can greatly enhance the efficiency of the k-means algorithm, allowing large data sets to be clustered more accurately and more quickly. The analysis has theoretical and practical importance for work on the improvement and parallelization of clustering algorithms.
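The abstract does not spell out the paper's refined initialization or its parallel scheme, so the following is only a generic sketch of how a k-means iteration can be split across workers, not the authors' method. It assumes the common data-parallel decomposition: each worker assigns its own chunk of points to the nearest center and returns per-cluster sums and counts, which are then combined to update the centers. The names `sq_dist`, `partial_stats`, `kmeans`, and the `n_workers` parameter are hypothetical, and the initial centers are simply supplied by the caller.

```python
from concurrent.futures import ThreadPoolExecutor


def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def partial_stats(chunk, centers):
    """For one chunk of points, return per-cluster coordinate sums and counts."""
    k, dim = len(centers), len(centers[0])
    sums = [[0.0] * dim for _ in range(k)]
    counts = [0] * k
    for p in chunk:
        # Assign the point to its nearest current center.
        j = min(range(k), key=lambda c: sq_dist(p, centers[c]))
        counts[j] += 1
        for d in range(dim):
            sums[j][d] += p[d]
    return sums, counts


def kmeans(points, centers, n_workers=2, max_iter=100):
    """Lloyd-style k-means whose assignment step is split over n_workers chunks."""
    centers = [tuple(c) for c in centers]
    k, dim = len(centers), len(centers[0])
    # Static partition of the data; on a cluster each chunk would sit on one node.
    chunks = [points[i::n_workers] for i in range(n_workers)]
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        for _ in range(max_iter):
            results = list(pool.map(lambda ch: partial_stats(ch, centers), chunks))
            new_centers = []
            for j in range(k):
                n = sum(counts[j] for _, counts in results)
                if n == 0:
                    # Assumption: an empty cluster keeps its previous center.
                    new_centers.append(centers[j])
                else:
                    new_centers.append(tuple(
                        sum(sums[j][d] for sums, _ in results) / n
                        for d in range(dim)
                    ))
            if new_centers == centers:  # centers stopped moving: converged
                break
            centers = new_centers
    return centers


points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
final_centers = kmeans(points, [(0, 0), (10, 10)], n_workers=2)
```

Threads are used here only to keep the sketch self-contained; in CPython they illustrate the decomposition rather than deliver a real speedup. On a multi-processor system the same combine step would typically be a sum reduction across processes. Fewer iterations, which is what a better choice of initial centers aims for, also means fewer rounds of that combine step.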

Authors: Tian Jinlan (田金兰), Zhu Lin (朱林), Zhang Suqin (张素琴), Liu Lu (刘璐)

Affiliation: Department of Computer Science and Technology

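Source: Tsinghua Science and Technology (indexed in SCIE, EI, CAS), 2005, No. 3, pp. 277-281 (5 pages). Chinese journal title: 清华大学学报（自然科学版）（英文版） [Journal of Tsinghua University (Natural Science Edition), English Edition]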

Funding: Supported by the National Defence Science and Technology Research Foundation of China (No. 99J15.3.2.JW0116)

Keywords: data mining; cluster analysis; k-means algorithm; parallelism

Classification: TP301.6 [Automation and Computer Technology: Computer System Architecture]


