Abstract
This paper introduces Parray, a programming interface designed around the architectural characteristics of GPU-accelerated heterogeneous clusters. Parray uses array types to separate the physical storage layout of multi-dimensional data from its logical structure, and uses a unified thread-array type to express the creation of the various processes and threads and the transfer of control flow among them. A matrix multiplication example demonstrates the programming style: the program evolves from a single-CPU-thread code into a multi-CPU-thread code and then into a GPU code, and each step requires only changes to the array types and minor modifications to a few lines of code. A high-performance GPU matrix multiplication (GEMM) implemented in Parray is also presented; measured on a single node of the Tianhe-1A system, its performance (in Gflops) is comparable to that of CUBLAS 4.0. Because the code operates directly on the logical structure of the data, the same GEMM code works on arrays with different physical storage layouts.
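The separation of logical structure from physical layout described in the abstract can be illustrated with a minimal sketch in plain C. This is not Parray syntax and not the authors' implementation: the `layout2d` descriptor and the helpers `at`, `row_major`, `col_major`, `gemm`, and `gemm_demo` are hypothetical names introduced only for illustration. The sketch assumes a layout can be described by one stride per logical dimension, and it uses a naive triple loop rather than an optimized GPU kernel.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical descriptor (not Parray syntax): a logical rows x cols matrix
   plus the strides that map a logical index (i, j) to a storage offset. */
typedef struct {
    size_t rows, cols;
    size_t row_stride, col_stride;
} layout2d;

/* Offset of logical element (i, j) in physical storage. */
static size_t at(const layout2d *t, size_t i, size_t j) {
    return i * t->row_stride + j * t->col_stride;
}

/* Row-major storage: consecutive elements of a row are adjacent. */
static layout2d row_major(size_t rows, size_t cols) {
    layout2d t = { rows, cols, cols, 1 };
    return t;
}

/* Column-major storage: consecutive elements of a column are adjacent. */
static layout2d col_major(size_t rows, size_t cols) {
    layout2d t = { rows, cols, 1, rows };
    return t;
}

/* Naive C = A * B written only against logical indices, so the same
   function works for any combination of physical layouts. */
static void gemm(const layout2d *ta, const double *a,
                 const layout2d *tb, const double *b,
                 const layout2d *tc, double *c) {
    assert(ta->cols == tb->rows);
    assert(tc->rows == ta->rows && tc->cols == tb->cols);
    for (size_t i = 0; i < ta->rows; i++) {
        for (size_t j = 0; j < tb->cols; j++) {
            double sum = 0.0;
            for (size_t k = 0; k < ta->cols; k++) {
                sum += a[at(ta, i, k)] * b[at(tb, k, j)];
            }
            c[at(tc, i, j)] = sum;
        }
    }
}

/* Multiplies A = [[1,2],[3,4]] (stored row-major) by B = [[5,6],[7,8]]
   (stored column-major) and writes the row-major product into c. */
static void gemm_demo(double c[4]) {
    const double a[4] = { 1.0, 2.0, 3.0, 4.0 };  /* rows: (1,2), (3,4) */
    const double b[4] = { 5.0, 7.0, 6.0, 8.0 };  /* columns: (5,7), (6,8) */
    layout2d ta = row_major(2, 2);
    layout2d tb = col_major(2, 2);
    layout2d tc = row_major(2, 2);
    gemm(&ta, a, &tb, b, &tc, c);
}
```

Calling `gemm_demo` fills `c` with 19, 22, 43, 50, the row-major product of the two matrices, even though the operands are stored differently. Changing only the layout descriptors, and not the loop body, is the kind of type-level change the abstract attributes to Parray's array types.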
Source
《计算机学报》
EI
CSCD
Peking University Core Journals
2014, No. 12, pp. 2564-2573 (10 pages)
Chinese Journal of Computers
Funding
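National High Technology Research and Development Program of China (863 Program) (2012AA010902, 2012AA010903)
National Natural Science Foundation of China (61240045, 61170053, 61432018, 61379048)
Postdoctoral Science Foundation (2013M540821)
Key Science and Technology Research Project of the Education Department of Henan Province (13A520065)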
Keywords
GPU-accelerated cluster
programming method
matrix multiplication
programming interface
performance optimization