期刊文献+

面向高通量计算机的图算法优化技术 被引量:9

Efficient Optimization of Graph Computing on High-Throughput Computer
下载PDF
导出
摘要 随着互联网技术的蓬勃发展,图数据的规模呈爆炸式增长.如何高效地处理大规模图数据逐渐成为工业界和学术界关注的焦点.宽度优先搜索算法是解决图遍历问题的经典算法,也是Graph500基准的核心测试程序之一.高通量计算机采用ARM架构的众核体系结构,具有高并发、强实时、低功耗等适于大数据计算的特点.在单节点上,BFS算法的优化已取得一系列进展,首先对现有的优化技术进行系统的介绍,并在此基础上提出2种面向高通量计算机的优化手段,通过减少冗余访存和提高缓存局部性,有效提高了算法的访存效率.通过这些优化手段,在高通量计算机上对BFS算法的性能进行了系统的评估.对于顶点规模为230的Kronecker图(顶点数为230,边数为234),优化后的BFS算法在高通量计算机上的平均性能为24.26 GTEPS.与两路x86架构服务器相比,单节点具有1.18倍的性能优势.在性能功耗比方面,高通量计算机的结果为181.04 MTEPS W.在2019年6月份的Green Graph500面向大数据集的排行榜上取得第2名的成绩.综上,高通量计算机的高并发和低功耗等特点非常适合处理大规模图计算等数据密集型应用. With the rapid development of computing technology,the scale of graph increases explosively and large-scale graph computing has been the focus in recent years.Breadth first search(BFS)is a classic algorithm to solve graph traverse problem.It is the main kernel of Graph500 benchmark that evaluates the performance of supercomputers and servers in terms of data-intensive applications.High-throughput computer(HTC)adopts ARM-based many-core architecture,which has the characteristics of high concurrency,strong real-time,low-power consumption.The optimization of BFS algorithm has made a series of progress on single-node systems.In this paper,we first introduce parallel BFS algorithm and existing optimizations.Then we propose two optimization techniques for HTC to improve the efficiency of data access and data locality.We systematically evaluate the performance of BFS algorithm on HTC.For the Kronecker graph with 2 scale=230 whose vertices are 230 and edges are 234,the average performance on HTC is 24.26 GTEPS and 1.18 times faster than the two-way x86 server.In terms of energy efficiency,the result on HTC is 181.04 MTEPS W and rank 2nd place on the June 2019 Green Graph500 big data list.To our best knowledge,this is the first work that evaluates BFS performance on HTC platform.HTC is suitable for data intensive applications such as large-scale graph computing.
作者 张承龙 曹华伟 王国波 郝沁汾 张洋 叶笑春 范东睿 Zhang Chenglong;Cao Huawei;Wang Guobo;Hao Qinfen;Zhang Yang;Ye Xiaochun;Fan Dongrui(State Key Laboratory of Computer Architecture(Institute of Computing Technology,Chinese Academy of Sciences),Beijing 100190;School of Computer and Control Engineering,University of Chinese Academy of Sciences,Beijing 100049)
出处 《计算机研究与发展》 EI CSCD 北大核心 2020年第6期1152-1163,共12页 Journal of Computer Research and Development
基金 国家重点研发计划项目(2018YFB1003501) 国家自然科学基金项目(11904370,61732018,61672499) 计算机体系结构国家重点实验室创新项目(CARCH4509)。
关键词 宽度优先搜索 高通量 Graph500 图算法 超算 breadth first search(BFS) high throughput Graph500 graph algorithm super computing
  • 相关文献

参考文献2

二级参考文献23

  • 1Graph500. Graph500 supercomputing sites [EB/OL]. [2013-11-10], http://www, graph500, org.
  • 2Beamer S, Asanovic K, Patterson D. Searching for a parent instead of fighting over children: A fast breadth-first search implementation for graph500, UCB/EECS-2011-117 [R]. Berkeley: University of California at Berkeley, 2011.
  • 3Beamer S, Asanovic K, Patterson D, Direction optimizing breadth-first search [C] //Proc of the 2012 Int Conf for High Performance Computing, Networking, Storage and Analysis. Amsterdam, Nethertands: IOSPress, 2012:137-148.
  • 4Beamer S, Buluc A, Asanovie K, et al. Distributed memory breadth-flrst search revisited: Enabling hottoraup search [EB/OL]. [2013-11-10]. http://www, eecs. berkeley, edu/ Pubs/ TechRpts/2013/EECS-2013-2. pdf.
  • 5Cong Guojing, Almasi G, Saraswat V. Fast PGAS implementation of distributed graph algorithms [C] //Proc of the 2010 ACM/IEEE Int Conf for High Performance Computing, Networking, Storage and Analysis. Los Alamitos, CA: IEEE Computer Society, 2010: 1-11.
  • 6Buluc A, Madduri K. Parallel breadth first search on distributed memory systems [C] //Proc of the 2011 Int Conf for High Performance Computing, Networking, Storage and Analysis. New York: ACM, 2011.
  • 7Agarwal V, Petrini F, Pasetto D, et al. Scalable graph exploration on multicore processors [C] //Proc of the 2010 ACM/IEEE Int Conf for High Performance Computing, Networking, Storage and Analysis. Los Alamitos, CA: IEEE Computer Society, 2010:1-11.
  • 8Leiserson C, Sehardl T. A work-efficient parallel breadth first search algorithm ( or how to cope with the nondeterminism of reducers) [C] //Proc of the 22nd Annual ACM Symp on Parallelism in Algorithms and Architectures. New York: ACM, 2010:303-314.
  • 9Xia Yinglong, Prasanna V. Topologically adaptive parallel hreadth-first search on multicore processors [C] //Proc of the 21st Int Conf on Parallel and Distributed Computing and Systems. Calgary, AB, Canada: ACTA, 2009.
  • 10Harish P, Narayanan P. Accelerating large graph algorithms on the GPU using CUDA [G] //LNCS 4873, Proc of the 14th Int Conf for High Performance Computing. Berlin: Springer, 2007:197-208.

共引文献9

同被引文献47

引证文献9

二级引证文献5

相关作者

内容加载中请稍等...

相关机构

内容加载中请稍等...

相关主题

内容加载中请稍等...

浏览历史

内容加载中请稍等...
;
使用帮助 返回顶部