Parallel acceleration strategy of clustering algorithm based on k-d tree partition
Times cited: 3
Abstract: The traditional K-Means algorithm suffers from low accuracy and slow clustering. This work optimizes K-Means at two levels: the algorithm itself and parallel execution on the Flink framework. To keep the algorithm from converging to a local optimum, the k initial centroids are selected by the principle of maximum distance between centroids. To speed up K-Means clustering on large data volumes, a k-d tree is used to partition the data set so that operators can run in parallel, and the number of TaskManagers and CPU cores is configured to accelerate execution of the resulting F-KMeans algorithm. Experimental results show that, compared with K-Means, F-KMeans improves accuracy by about 3.6%, reduces the time spent in the DataSource stage by 45.45%, and reduces the time spent in the remaining stages by about 28.57% on average.
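The abstract names three ideas but gives no implementation details. The following is a minimal single-machine Python sketch of how they could fit together, under our own assumptions; it is not the authors' F-KMeans code and does not use the Flink API. We assume that "maximum distance between centroids" means farthest-first selection starting from the two mutually farthest points, and that the k-d tree partition uses median splits that cycle through the axes. A thread pool stands in for parallel operator subtasks. All function names (`max_distance_init`, `kd_partition`, `partial_sums`, `parallel_kmeans`) are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def dist2(a, b):
    """Squared Euclidean distance between two equal-length point tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def max_distance_init(points, k):
    """Assumed reading of the max-distance rule (hypothetical helper):
    start from the two mutually farthest points, then repeatedly add the
    point whose distance to its nearest chosen centroid is largest."""
    if not 2 <= k <= len(points):
        raise ValueError("need 2 <= k <= number of points")
    # O(n^2) scan for the farthest pair; fine for a sketch, not for big data.
    first, second = max(
        ((p, q) for i, p in enumerate(points) for q in points[i + 1:]),
        key=lambda pair: dist2(pair[0], pair[1]),
    )
    centroids = [first, second]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(dist2(p, c) for c in centroids))
        )
    return centroids


def kd_partition(points, num_parts, depth=0):
    """Split points into up to num_parts partitions with k-d-tree-style
    median splits, cycling through the coordinate axes by depth."""
    if not points:
        return []
    if num_parts <= 1 or len(points) == 1:
        return [points]
    axis = depth % len(points[0])
    ordered = sorted(points, key=lambda p: p[axis])
    left_parts = num_parts // 2
    # Cut in proportion to how many partitions each side will receive.
    cut = len(ordered) * left_parts // num_parts
    return (kd_partition(ordered[:cut], left_parts, depth + 1)
            + kd_partition(ordered[cut:], num_parts - left_parts, depth + 1))


def partial_sums(partition, centroids):
    """Map step for one partition: assign each point to its nearest centroid
    and return per-cluster coordinate sums and point counts."""
    k, dim = len(centroids), len(centroids[0])
    sums = [[0.0] * dim for _ in range(k)]
    counts = [0] * k
    for p in partition:
        j = min(range(k), key=lambda c: dist2(p, centroids[c]))
        counts[j] += 1
        for axis, x in enumerate(p):
            sums[j][axis] += x
    return sums, counts


def parallel_kmeans(points, k, parallelism=4, max_iter=100, tol=1e-12):
    """K-Means whose assignment step runs once per k-d partition in a thread
    pool (stand-in for parallel subtasks), followed by a merge (reduce)."""
    centroids = max_distance_init(points, k)
    parts = kd_partition(points, parallelism)
    dim = len(points[0])
    with ThreadPoolExecutor(max_workers=parallelism) as pool:
        for _ in range(max_iter):
            results = pool.map(partial_sums, parts, [centroids] * len(parts))
            # Reduce: merge partial sums and counts from all partitions.
            total_sums = [[0.0] * dim for _ in range(k)]
            total_counts = [0] * k
            for sums, counts in results:
                for j in range(k):
                    total_counts[j] += counts[j]
                    for axis in range(dim):
                        total_sums[j][axis] += sums[j][axis]
            # Keep the previous centroid if a cluster ends up empty.
            new_centroids = [
                tuple(s / total_counts[j] for s in total_sums[j])
                if total_counts[j] else centroids[j]
                for j in range(k)
            ]
            shift = max(dist2(a, b) for a, b in zip(centroids, new_centroids))
            centroids = new_centroids
            if shift <= tol:
                break
    return centroids


if __name__ == "__main__":
    data = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (5.0, 5.0), (5.1, 4.9), (4.9, 5.2)]
    print(sorted(parallel_kmeans(data, k=2, parallelism=2)))
```

Median splits keep the partitions roughly equal in size, so each parallel task gets a similar share of the assignment work. In CPython, threads only mirror this partition-then-merge structure and give little real speedup for a CPU-bound loop. In the setting the abstract describes, Flink's TaskManagers and CPU cores would supply the actual parallelism.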
Authors: WANG Li-juan, QIAN Yu-rong, HOU Hai-yao, ZHANG Han, ZHAO Jing-xia, ZHAO Yi (Software College, Xinjiang University, Urumqi 830008, China; College of Information Science and Engineering, Xinjiang University, Urumqi 830046, China)
Source: Computer Engineering and Design (《计算机工程与设计》, indexed in the Peking University Core Journals list), 2019, No. 12, pp. 3437-3442 (6 pages)
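Funding: National Natural Science Foundation of China (61562086, 61462079); Xinjiang "Ten Thousand Talents Program" Reserve Fund (wr2015bj01); Xinjiang Uygur Autonomous Region Higher Education Scientific Research Fund (XJEDU2017002)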
Keywords: data partitioning; acceleration strategy; performance optimization; parallelization; stream computing
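Related literature
References: 5
Secondary references: 31
Co-citing literature: 55
Co-cited literature: 30
Citing literature: 3
Secondary citing literature: 7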
