Abstract
The work addresses the need for a simulator of general-purpose computing on embedded GPUs. It compares the computing cores and memory structure of GPGPU-sim (general purpose graphics processing unit simulator) with those of the Mali GPU. From this analysis, the authors first establish the simulation flow and structure of a Mali GPU simulator for OpenCL. They then design methods to obtain GPU microarchitecture parameters such as the number of compute units, the number of registers, and the minimum parallel granularity. With these parameters, GPGPU-sim is modified and configured to build a simulator for the specific GPU architecture. The simulator's accuracy is tested with OpenCL programs such as matrix multiplication and image processing. The evaluation criterion is the difference between each program's execution cycle count on the simulator and on the real hardware platform. Experimental results show that the cycle-count difference between the two platforms is no more than 30% for 70% of the unoptimized OpenCL programs in the test set and for 90% of the optimized OpenCL programs. This shows that the constructed GPU simulator meets the requirements of simulating and evaluating the performance of OpenCL programs on the embedded GPU.
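The accuracy criterion above can be made concrete with a small calculation. The following is a minimal Python sketch that assumes the cycle difference is normalized by the hardware cycle count, because the abstract does not state the normalization. The function names `cycle_error` and `fraction_within` and the demo cycle counts are hypothetical. They are for illustration only and are not results from the paper.

```python
from typing import Dict, Tuple


def cycle_error(sim_cycles: int, hw_cycles: int) -> float:
    """Relative cycle difference |sim - hw| / hw.

    Assumption: the difference is normalized by the hardware cycle count.
    """
    if hw_cycles <= 0:
        raise ValueError("hw_cycles must be positive")
    return abs(sim_cycles - hw_cycles) / hw_cycles


def fraction_within(results: Dict[str, Tuple[int, int]], threshold: float = 0.30) -> float:
    """Share of programs whose relative cycle difference is at most `threshold`.

    `results` maps a program name to (simulator cycles, hardware cycles).
    """
    if not results:
        raise ValueError("results must not be empty")
    hits = sum(1 for sim, hw in results.values() if cycle_error(sim, hw) <= threshold)
    return hits / len(results)


if __name__ == "__main__":
    # Hypothetical cycle counts for illustration only; not data from the paper.
    demo = {
        "matmul_naive": (1_200_000, 1_000_000),  # 20% difference
        "sobel_naive": (1_500_000, 1_000_000),   # 50% difference
        "matmul_tiled": (900_000, 1_000_000),    # 10% difference
    }
    for name, (sim, hw) in demo.items():
        print(f"{name}: {cycle_error(sim, hw):.0%}")
    print(f"within 30%: {fraction_within(demo):.0%}")
```

With the made-up demo data, two of the three programs fall within the 30% threshold, so the script reports 67%. In the paper's terms, this is the share that reached 70% for the unoptimized programs and 90% for the optimized ones. Normalizing by the hardware count treats the real GPU as the reference. A symmetric normalization, such as dividing by the mean of the two counts, would be an equally plausible alternative.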
Source
Journal of Xi'an Jiaotong University, 2015, No. 2, pp. 20-24 and 68 (6 pages)
Indexed in: EI, CAS, CSCD, Peking University Core Journals
Funding
National High Technology Research and Development Program of China (2012AA010904)
National Natural Science Foundation of China (61375023)