Programming and Developing Environment for FPGA Graph Processing: Survey and Exploration (Cited by: 2)
Abstract Graph processing accelerators based on the reconfigurable field programmable gate array (FPGA) architecture offer advantages in both performance and energy efficiency, and can meet the performance demands of graph processing, which is characterized by high complexity, large data scale, and varied basic operations. However, designing efficient low-level hardware code for FPGAs requires a long design cycle, while existing general-purpose programming and development environments meet functional requirements but leave a large performance gap. The programming wall is therefore one of the major obstacles to application development and accelerator performance. A well-designed programming and development environment is the most important step toward further improving the performance of graph processing accelerators and shortening their development cycle. Such an environment needs to provide convenient application programming interfaces, scalable programming models, efficient high-level synthesis tools, and domain-specific languages that can integrate software and hardware features and generate high-performance hardware code. This article makes a systematic exploration of the programming and development environment for FPGA graph processing, mainly introducing and analyzing programming models, high-level synthesis, programming languages, and application development. It also summarizes and analyzes the development of related technologies in China and abroad, and discusses open issues and challenges in this area.
Authors Guo Jinyang (郭进阳); Shao Chuanming (邵传明); Wang Jing (王靖); Li Chao (李超); Zhu Haojin (朱浩瑾); Guo Minyi (过敏意) (School of Electronic Information and Electrical Engineering, Shanghai Jiao Tong University, Shanghai 200240)
Source Journal of Computer Research and Development (《计算机研究与发展》), indexed in EI, CSCD, and the Peking University Core Journals list; 2020, No. 6, pp. 1164-1178 (15 pages)
Funding National Key Research and Development Program of China (2018YFB1003500).
Keywords field programmable gate array (FPGA); graph processing; hardware accelerator; programming and development environment; programming model; high-level synthesis; domain-specific language; application programming interface (API)