An OpenCL Compiler for the Homegrown Heterogeneous Many-Core Processor on the Sunway TaihuLight Supercomputer
Times cited: 6
Abstract: In recent years, with the tremendous development of integrated circuit technology, it has become possible to integrate multiple processor cores on a single chip to accomplish larger and more complex computational tasks, and processor architecture has evolved from single-core to multi-core and many-core. However, simply adding more cores of the same type eventually becomes a bottleneck for further performance gains. To further enhance computing power, architectures have been trending toward heterogeneous systems, which offer greater computing power and a better performance-to-power ratio. It is now an industry consensus that the programming model is one of the bottlenecks restricting the development of heterogeneous systems. The Sunway TaihuLight supercomputer is the world's first system with a peak performance greater than 100 PFlops. It is equipped with the homegrown heterogeneous many-core SW26010 processor, which integrates both management processing elements and computing processing elements on one chip. With 260 processing elements, a single SW26010 provides a peak performance of over 3 TFlops. Meanwhile, large-scale scientific and engineering computations, such as earth, ocean, and atmospheric system modeling and other critical applications, face major performance challenges. Fully utilizing the computing power of this homegrown heterogeneous platform to achieve high performance for critical applications therefore has important academic and practical value. To reduce programming difficulty while improving software portability, we design and implement an OpenCL compiler for the SW26010 processor. Based on the OpenCL programming framework and the microarchitecture of the homegrown many-core processor, the compiler maps the OpenCL platform, memory, and execution models onto the SW26010 many-core processor and implements thread coarsening, data layout, and vectorization optimizations for it. The compiler is a source-to-source system built on Clang: it translates OpenCL programs into the processor's native low-level programming language and then invokes the native compiler to generate high-performance binaries. Preliminary experiments demonstrate the correctness and effectiveness of the compiler. For small and medium input sizes, typical OpenCL applications compiled by our system perform significantly better than on the Intel Xeon Phi and comparably to the NVIDIA GPU; for large input sizes, their performance is slightly lower than on the NVIDIA GPU because of the limited capacity of the on-chip local store (SPM).
Authors: WU Ming-Chuan, HUANG Lei, LIU Ying, HE Xian-Bo, FENG Xiao-Bing
Affiliations: State Key Laboratory of Computer Architecture, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190; University of Chinese Academy of Sciences, Beijing 100049; Computer School, China West Normal University, Nanchong, Sichuan 637009
Source: Chinese Journal of Computers (《计算机学报》), 2018, No. 10, pp. 2236-2250 (15 pages). Indexed in EI, CSCD, and the Peking University Core Journals list.
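Funding: National Key Research and Development Program of China, High-Performance Computing Project (2016YFB0200800); Key Program of the National Natural Science Foundation of China (61432018); Innovative Research Group Project (61521092); Nanchong Science and Technology Support Project (15A0068); China West Normal University Cultivation Project (13C002); China West Normal University Talent Program (17YC149).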
Keywords: OpenCL; heterogeneous system; homegrown many-core processor; compilation system