Matrix Multiplication Implementation Based on Array Types of Parray (cited by: 1)

Abstract: This paper introduces Parray, a programming interface designed for the architecture of heterogeneous (GPU-accelerated) clusters. Parray uses array types to separate the physical storage of data from its logical structure, and it uses a unified thread-array type to express the creation of the various processes and threads and the transfer of control flow among them. A matrix multiplication example demonstrates the characteristics of Parray programming: the program evolves from a single-CPU-thread program into a multi-CPU-thread program and then into a GPU-thread program, and each step requires only changes to the array types and minor modifications to the code. A high-performance GPU matrix multiplication (GEMM) implemented in Parray is also presented; tested on a single node of the Tianhe-1A system, its performance is comparable to that of CUBLAS 4.0. Because the code operates directly on the logical structure of the data, the same GEMM code works on arrays with different physical storage layouts.
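The separation of logical structure from physical layout described in the abstract can be illustrated with a minimal sketch. This is not Parray code and does not reproduce its syntax or API: `ArrayType`, `row_major`, `col_major`, `pack`, `unpack`, `gemm`, and `multiply` are hypothetical names introduced only for illustration, and the sketch covers only the layout-independence idea, not thread arrays or GPU execution.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class ArrayType:
    """Hypothetical descriptor: a logical shape plus the strides that place it in memory."""
    rows: int
    cols: int
    row_stride: int
    col_stride: int

    def offset(self, i: int, j: int) -> int:
        # Map the logical index (i, j) to a position in flat physical storage.
        return i * self.row_stride + j * self.col_stride


def row_major(rows: int, cols: int) -> ArrayType:
    # Consecutive elements of a row are adjacent in memory.
    return ArrayType(rows, cols, cols, 1)


def col_major(rows: int, cols: int) -> ArrayType:
    # Consecutive elements of a column are adjacent in memory.
    return ArrayType(rows, cols, 1, rows)


def pack(values: List[list], t: ArrayType) -> list:
    """Store a nested-list matrix into flat storage following layout t."""
    buf = [0] * (t.rows * t.cols)
    for i in range(t.rows):
        for j in range(t.cols):
            buf[t.offset(i, j)] = values[i][j]
    return buf


def unpack(buf: list, t: ArrayType) -> List[list]:
    """Read flat storage back into a nested list using layout t."""
    return [[buf[t.offset(i, j)] for j in range(t.cols)] for i in range(t.rows)]


def gemm(a, ta: ArrayType, b, tb: ArrayType, c, tc: ArrayType) -> None:
    """C += A * B, written only against logical indices (i, j, k)."""
    assert ta.cols == tb.rows and tc.rows == ta.rows and tc.cols == tb.cols
    for i in range(tc.rows):
        for j in range(tc.cols):
            acc = c[tc.offset(i, j)]
            for k in range(ta.cols):
                acc += a[ta.offset(i, k)] * b[tb.offset(k, j)]
            c[tc.offset(i, j)] = acc


Layout = Callable[[int, int], ArrayType]


def multiply(a_vals, b_vals, make_a: Layout, make_b: Layout, make_c: Layout):
    """Multiply two nested-list matrices, choosing a physical layout for each operand."""
    m, k, n = len(a_vals), len(b_vals), len(b_vals[0])
    ta, tb, tc = make_a(m, k), make_b(k, n), make_c(m, n)
    a, b = pack(a_vals, ta), pack(b_vals, tb)
    c = [0] * (m * n)
    gemm(a, ta, b, tb, c, tc)
    return unpack(c, tc)
```

Calling `multiply(a, b, row_major, row_major, row_major)` and `multiply(a, b, col_major, row_major, col_major)` runs the identical `gemm` loop; only the descriptors change, which is the property the abstract attributes to array types.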

Authors: Cui Xiang, Li Xiaowen, Chen Yifeng

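Affiliations: Key Laboratory of High Confidence Software Technologies (Ministry of Education), School of Electronics Engineering and Computer Science, Peking University; School of Computer and Information Engineering, Henan University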

Source: Chinese Journal of Computers (《计算机学报》), indexed in EI, CSCD, and the Peking University Core Journals list; 2014, No. 12, pp. 2564-2573 (10 pages)

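Funding: National High Technology Research and Development Program of China (863 Program) (2012AA010902, 2012AA010903); National Natural Science Foundation of China (61240045, 61170053, 61432018, 61379048); China Postdoctoral Science Foundation (2013M540821); Key Science and Technology Research Project of the Education Department of Henan Province (13A520065)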

Keywords: GPU-accelerated cluster; programming method; matrix multiplication; programming interface; performance optimization

Classification: TP312 [Automation and Computer Technology: Computer Software and Theory]

Citation network

References (8)

[1] Cui Xiang, Chen Yifeng, Mei Hong. Improving performance of matrix multiplication and FFT on GPU//Proceedings of the 15th International Conference on Parallel and Distributed Systems. Shenzhen, China, 2009: 42-48.
[2] Cui Xiang, Chen Yifeng, Zhang Changyou, Mei Hong. Auto-tuning dense matrix multiplication for GPGPU with cache//Proceedings of the 16th International Conference on Parallel and Distributed Systems. Shanghai, China, 2010: 237-242.
[3] Chen Yifeng, Cui Xiang, Mei Hong. Large-scale FFT on GPU clusters//Proceedings of the 24th International Conference on Supercomputing. Tsukuba, Ibaraki, Japan, 2010: 315-324.
[4] Chen Yifeng, Cui Xiang, Mei Hong. A preliminary study of software issues for manycore-accelerated workstation clusters//National Annual Conference on High Performance Computing. Changsha, China, 2009: 45-50 (in Chinese).
[5] Chen Yifeng, Cui Xiang, Mei Hong. PARRAY: A unified programming tool for GPU clusters//National Annual Conference on High Performance Computing. Beijing, China, 2010: 45-50 (in Chinese).
[6] Chen Yifeng, Cui Xiang, Mei Hong. PARRAY: A unifying array representation for heterogeneous parallelism//Proceedings of the 17th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming. New Orleans, USA, 2012: 171-180.
[7] Nvidia. CUDA Compute Unified Device Architecture Programming Guide. New Orleans, USA: NVIDIA Corp, 2007.
[8] Volkov V, Demmel J W. Benchmarking GPUs to tune dense linear algebra//Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis. Salt Lake City, USA, 2008: 1-11.

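Co-cited literature (7)

[1] Zhang Mengyuan. Parallel implementation of matrix multiplication based on CUDA [J]. Information & Communications, 2012, 25(2): 20-21. Cited by: 2
[2] Liu Peihua, Lu Huaxiang, Gong Guoliang, Liu Wenpeng. Design of an FPGA-based fully pipelined double-precision floating-point matrix multiplier [J]. CAAI Transactions on Intelligent Systems, 2012, 7(4): 302-306. Cited by: 8
[3] Wang Yunlong, Wu Ying. GPU-based implementation of the correlative interferometer algorithm [J]. Journal of Information Engineering University, 2015, 16(1): 41-45. Cited by: 4
[4] Zhou Leitao, Tao Yaodong, Liu Sheng, Li Suo. Research on FPGA-based systolic multiplication techniques [J]. Computer Engineering & Science, 2015, 37(9): 1632-1636. Cited by: 6
[5] Zhu Min, Tang Bo, Zhao Juan, Zou Dan, Li Jincai. Distributed heterogeneous parallel optimization of Boolean matrix multiplication [J]. Computer Engineering & Science, 2017, 39(4): 634-640. Cited by: 1
[6] Long Zhuoqun, Wang Xiaoyu, Wang Changming. Parallel computation of large matrix multiplication on Epiphany-OpenCL based on DCT predictive coding [J]. Automation & Instrumentation, 2017, 32(7): 16-21. Cited by: 3
[7] Liu Peng, Wang Xuekui, Huang Yihua, Meng Lei, Ding Enjie. Research on parallelization of the extreme learning machine algorithm based on Spark [J]. Computer Science, 2017, 44(12): 33-37. Cited by: 6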

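Citing literature (1)

[1] Xiao Han, Xiao Shiyang, Li Cailin, Zhou Qinglei. OpenCL-based parallel matrix multiplication algorithm on heterogeneous platforms [J]. Journal of Southwest University (Natural Science Edition), 2020, 42(11): 147-153. Cited by: 3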

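Second-level citing literature (3)

[1] Huang Jingpin, Bai Rui, Xu Yun, Zhao Gengwei. Direct product decomposition and best approximation of quaternion matrices [J]. Journal of Southwest China Normal University (Natural Science Edition), 2022, 47(2): 1-6. Cited by: 1
[2] Sun Xiangjie, Zhu Liang, Yu Tonghuan. Research on an OpenCL-based quick-view method for SAR images [J]. Electronics Quality, 2023(3): 24-30.
[3] Wang Wenshan, Zhang Weizhong, Li Qiang. Parallel optimization of erosion and dilation algorithms based on OpenCL [J]. Journal of Qingdao University (Engineering & Technology Edition), 2023, 38(4): 22-26.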

