A Survey on Graph Processing Accelerators
Abstract: In the big data era, graphs are used in many fields to represent data with complex relationships, and graph processing applications are widely deployed to mine the latent value in graph data. The irregular execution behavior peculiar to graph processing applications gives rise to four challenges: irregular workloads, intensive read-modify-write updates, irregular memory accesses, and irregular communication. Existing general-purpose architectures cannot handle these challenges effectively. To overcome them, a large number of hardware accelerator designs for graph processing have been proposed. They customize the computation pipeline, the memory access subsystem, the storage subsystem, and the communication subsystem for graph processing applications. Thanks to these hardware customizations, graph processing accelerators achieve significant improvements in both performance and energy efficiency over state-of-the-art software frameworks running on conventional general-purpose processor architectures. To give researchers an in-depth understanding of graph processing accelerators, this paper first classifies and summarizes the customized designs of existing work from top to bottom according to the pyramid organization of a computer system, and uses several complete architecture examples to analyze how optimization techniques applied at different levels relate to one another. It then discusses accelerator design for an emerging graph processing application, graph neural networks (GNNs), through concrete GNN accelerator case studies. Finally, it summarizes the frontier research directions in this field and looks ahead to future trends in graph processing accelerator design.
Authors: Yan Mingyu, Li Han, Deng Lei, Hu Xing, Ye Xiaochun, Zhang Zhimin, Fan Dongrui, Xie Yuan (State Key Laboratory of Computer Architecture (Institute of Computing Technology, Chinese Academy of Sciences), Beijing 100190; University of Chinese Academy of Sciences, Beijing 100049; University of California at Santa Barbara, Santa Barbara, California, USA 93106)
Source: Journal of Computer Research and Development (《计算机研究与发展》; indexed in EI, CSCD, and the Peking University Core Journals list), 2021, No. 4, pp. 862-887 (26 pages)
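Funding: National Key Research and Development Program of China (2018YFB1003501); National Natural Science Foundation of China (61732018, 61872335, 61802367, 61672499); Strategic Priority Research Program of the Chinese Academy of Sciences, Category C (XDC05000000); Open Fund of the State Key Laboratory of Mathematical Engineering and Advanced Computing (2019A07).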
Keywords: graph processing; graph neural network; accelerator; irregular memory access; data locality; dynamic data access scheduling; workload balance
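The abstract attributes the difficulty of graph processing on general-purpose architectures to irregular execution behavior. The following minimal Python sketch is not taken from the paper: PageRank, the CSR layout, and the function names `build_csr` and `pagerank_push_step` are illustrative assumptions. It shows where three of the listed challenges appear in a single push-style PageRank iteration: per-vertex work that scales with out-degree (irregular workload), reads indexed by neighbor IDs (irregular memory access), and accumulation into a data-dependent slot (read-modify-write update).

```python
from typing import List, Tuple


def build_csr(num_vertices: int,
              edges: List[Tuple[int, int]]) -> Tuple[List[int], List[int]]:
    """Build a CSR out-edge list: col_idx[row_ptr[v]:row_ptr[v+1]] are v's targets."""
    row_ptr = [0] * (num_vertices + 1)
    for src, _ in edges:
        row_ptr[src + 1] += 1                  # count out-edges per source vertex
    for v in range(num_vertices):
        row_ptr[v + 1] += row_ptr[v]           # prefix sum turns counts into offsets
    col_idx = [0] * len(edges)
    cursor = row_ptr[:-1]                      # slicing copies the start offsets
    for src, dst in edges:
        col_idx[cursor[src]] = dst
        cursor[src] += 1
    return row_ptr, col_idx


def pagerank_push_step(row_ptr: List[int], col_idx: List[int],
                       rank: List[float], damping: float = 0.85) -> List[float]:
    """One push-style PageRank iteration (this sketch ignores dangling vertices)."""
    n = len(rank)
    next_rank = [(1.0 - damping) / n] * n      # teleport term for every vertex
    for src in range(n):
        begin, end = row_ptr[src], row_ptr[src + 1]
        degree = end - begin                   # work per vertex equals its out-degree,
        if degree == 0:                        # so skewed degrees cause load imbalance
            continue
        contrib = damping * rank[src] / degree
        for e in range(begin, end):
            dst = col_idx[e]                   # data-dependent index: the next access
                                               # lands wherever dst points in memory
            next_rank[dst] += contrib          # read-modify-write on next_rank[dst];
                                               # parallel pushers would need atomics
    return next_rank


# Usage: a star-shaped graph where hub vertex 0 has far more out-edges than the rest.
edges = [(0, 1), (0, 2), (0, 3), (0, 4), (1, 0), (2, 0), (3, 0), (4, 0)]
row_ptr, col_idx = build_csr(5, edges)
degrees = [row_ptr[v + 1] - row_ptr[v] for v in range(5)]  # [4, 1, 1, 1, 1]
rank = pagerank_push_step(row_ptr, col_idx, [0.2] * 5)
print(degrees, [round(r, 4) for r in rank])
```

In the usage example, the hub's out-degree of 4 against 1 for every other vertex gives a small-scale picture of the workload skew, and all four leaves push into the same slot `next_rank[0]`, which is the update contention that the abstract's read-modify-write challenge refers to. Irregular communication only arises once the graph is partitioned across multiple processing units or nodes, so this single-process sketch does not model it.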