A Parallel Optimized K-means Algorithm Scalable to Data Size

Abstract: The traditional K-means algorithm must load all of the sample data to be clustered into memory during iteration, and its step for updating the class centers is not parallel. To address the small data scale that traditional K-means can handle and its slow class-center updates, this paper proposes an improved K-means algorithm that targets two problems: scaling up the amount of data a single machine can process, and low processor utilization. Experiments show that the method can cluster large-scale data efficiently.
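The abstract does not give the details of the improved algorithm, so the following is only a minimal sketch of one common way to realize the two ideas it names: reading the samples in chunks rather than all at once, and computing the class-center update from per-chunk partial sums in parallel. The names `chunked_kmeans`, `partial_stats`, `nearest`, and `read_chunks`, the thread pool, and the batching scheme are all assumptions made for illustration, not the author's method.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import islice, repeat


def nearest(point, centers):
    """Return the index of the center closest to point (squared Euclidean)."""
    distances = [sum((p - c) ** 2 for p, c in zip(point, center)) for center in centers]
    return distances.index(min(distances))


def partial_stats(chunk, centers):
    """Compute per-cluster coordinate sums and point counts for one chunk."""
    k, dim = len(centers), len(centers[0])
    sums = [[0.0] * dim for _ in range(k)]
    counts = [0] * k
    for point in chunk:
        j = nearest(point, centers)
        counts[j] += 1
        for d in range(dim):
            sums[j][d] += point[d]
    return sums, counts


def chunked_kmeans(read_chunks, centers, iterations=10, workers=4):
    """Hypothetical helper: read_chunks() must return a fresh iterable of chunks per call."""
    centers = [list(c) for c in centers]
    k, dim = len(centers), len(centers[0])
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for _ in range(iterations):
            sums = [[0.0] * dim for _ in range(k)]
            counts = [0] * k
            chunks = iter(read_chunks())
            while True:
                # Hold at most `workers` chunks in memory at any time.
                batch = list(islice(chunks, workers))
                if not batch:
                    break
                # Each worker assigns the points of one chunk and returns partial sums.
                for part_sums, part_counts in pool.map(partial_stats, batch, repeat(centers)):
                    for j in range(k):
                        counts[j] += part_counts[j]
                        for d in range(dim):
                            sums[j][d] += part_sums[j][d]
            # New center = merged sum / merged count; an empty cluster keeps its old center.
            centers = [
                [s / counts[j] for s in sums[j]] if counts[j] else centers[j]
                for j in range(k)
            ]
    return centers


def read_chunks():
    # Stand-in for reading blocks from disk; the data here is made up.
    yield [(0.0, 0.0), (0.0, 2.0)]
    yield [(10.0, 10.0), (10.0, 12.0)]
    yield [(2.0, 0.0), (12.0, 10.0)]


print(chunked_kmeans(read_chunks, [(0.0, 0.0), (10.0, 10.0)], iterations=5, workers=2))
```

Because the per-chunk sums and counts add up to the full-data sums and counts, the merged update gives the same centers as a single pass over all samples, up to floating-point rounding. In CPython a thread pool will not speed up this pure-Python arithmetic; a process pool or a vectorized distance computation would be needed for real gains, and the sketch only shows the structure.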

Author: Li Yaokun (李尧坤)

Affiliation: College of Computer Science, Sichuan University (四川大学计算机学院)

Source: Modern Computer (现代计算机（中旬刊）), 2015, No. 1, pp. 3-5 (3 pages)

Keywords: K-means; large-scale; updating class centers; parallel

Classification number: TP273 [Automation and Computer Technology: Detection Technology and Automation Devices]
