
Research on Kmeans Algorithm Optimization Based on OpenCL (cited by: 4)
Abstract: Kmeans is a typical clustering algorithm in unsupervised machine learning and an important method for partitioning and grouping a given dataset. It is widely used in image processing, data mining, and biology. As data sizes in real applications keep growing, the performance demands on Kmeans grow as well. Taking full account of the architectural differences between hardware platforms, this paper systematically studies the key techniques for implementing and optimizing Kmeans on GPU and APU platforms with OpenCL: efficient on-chip global synchronization, redundant computation to reduce the number of global synchronizations, thread task remapping, and local memory reuse. With these techniques, Kmeans achieves high performance and performance portability across hardware platforms, and the paper summarizes optimization methods suited to iterative algorithms in general. Experimental results show that, with data transfer time included, the optimized algorithm achieves speedups of 136.975 to 170.333 times over the CPU version on an AMD HD7970 GPU and 22.2365 to 24.3865 times over the CPU version on an AMD A10-5800K APU, which confirms both the effectiveness of the optimization methods and their portability across platforms.
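The abstract names the optimizations but not the kernel structure they act on. The following is a minimal sequential Python sketch, not the authors' OpenCL code, of one common way to map a Kmeans iteration onto a GPU (an assumption here): an assignment step that is independent per point (one work-item per point), per-work-group partial sums for each cluster (standing in for accumulation in OpenCL local memory), and a final reduction of those partials, which on a real device requires a global synchronization or a second kernel launch. The iteration repeats until no point changes cluster, so every global synchronization is paid once per iteration. All function names and the `group_size` parameter are hypothetical.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def assign(points: Sequence[Point], centroids: Sequence[Point]) -> List[int]:
    """Assignment step: label each point with its nearest centroid
    (squared Euclidean distance). Each point is independent, which is
    what would map to one work-item per point on a GPU."""
    labels = []
    for p in points:
        best, best_d = 0, float("inf")
        for idx, c in enumerate(centroids):
            d = sum((pi - ci) ** 2 for pi, ci in zip(p, c))
            if d < best_d:  # strict '<': ties go to the lower centroid index
                best, best_d = idx, d
        labels.append(best)
    return labels


def partial_sums(points, labels, k, group_size):
    """Update phase 1, simulating work-groups: each block of `group_size`
    points accumulates per-cluster coordinate sums and counts in its own
    buffer (the role OpenCL __local memory would play)."""
    dim = len(points[0])
    partials = []
    for start in range(0, len(points), group_size):
        sums = [[0.0] * dim for _ in range(k)]
        counts = [0] * k
        block = zip(points[start:start + group_size], labels[start:start + group_size])
        for p, lab in block:
            counts[lab] += 1
            for j in range(dim):
                sums[lab][j] += p[j]
        partials.append((sums, counts))
    return partials


def reduce_partials(partials, old_centroids) -> List[Point]:
    """Update phase 2: combine all per-group partials into new centroids.
    On a GPU this can only start after every work-group finished phase 1,
    i.e. after a global synchronization."""
    k, dim = len(old_centroids), len(old_centroids[0])
    total = [[0.0] * dim for _ in range(k)]
    counts = [0] * k
    for sums, cnts in partials:
        for c in range(k):
            counts[c] += cnts[c]
            for j in range(dim):
                total[c][j] += sums[c][j]
    new_centroids = []
    for c in range(k):
        if counts[c] == 0:
            # Empty cluster: keep its previous centroid unchanged.
            new_centroids.append(tuple(old_centroids[c]))
        else:
            new_centroids.append(tuple(t / counts[c] for t in total[c]))
    return new_centroids


def kmeans(points, init_centroids, max_iter=100, group_size=4):
    """Lloyd-style Kmeans: alternate assignment and two-phase update
    until the labels stop changing or max_iter is reached."""
    centroids = [tuple(c) for c in init_centroids]
    labels = None
    for _ in range(max_iter):
        new_labels = assign(points, centroids)
        if new_labels == labels:
            break  # converged: no point changed cluster
        labels = new_labels
        partials = partial_sums(points, labels, len(centroids), group_size)
        centroids = reduce_partials(partials, centroids)
    return centroids, labels
```

For example, `kmeans(points, [(0.0, 0.0), (10.0, 10.0)], group_size=3)` returns the final centroids and one label per point. Changing `group_size` only changes how the sums are split across the simulated work-groups, so apart from floating-point rounding the centroids come out the same; the trade-off on a real device is between the amount of local-memory work per group and the number of partials the global reduction must combine.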
Source: Journal of Frontiers of Computer Science and Technology (《计算机科学与探索》, indexed in CSCD), 2014, No. 10, pp. 1162-1176 (15 pages)
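Funding: National Natural Science Foundation of China (Nos. 61133005, 61272136); National High Technology Research and Development Program of China (863 Program) (No. 2012AA010903); ISCAS-AMD Joint Fusion Software Center project.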
Keywords: parallel computing; iterative algorithm; cross-platform; OpenCL; Kmeans

