High efficient training method of MiniGo on large-scale heterogeneous computing platform
Abstract An efficient multi-level parallel training method is proposed for training MiniGo agents on large-scale heterogeneous computing platforms. It combines task-level parallelism across nodes, heterogeneous parallelism between the central processing unit and the digital signal processor (CPU-DSP), and parallelism within each DSP core. Input/output is deployed efficiently so that network communication is no longer a bottleneck. A heterogeneous memory management scheme designed for the CPU-DSP shared-memory architecture reduces data movement between the heterogeneous devices. Shared-memory programming is optimized, and the DSP is used to accelerate compute-intensive convolution operators. Results show that, compared with a 16-core CPU, operator acceleration on a single DSP core reaches a maximum speedup of 16.44. When the number of computing nodes is scaled from 1067 to 4139, the time needed to reach the given termination condition falls from 43.02 h to 16.05 h, a scaling efficiency of 69.1%. The evaluation shows that the method enables efficient parallel training of MiniGo on large-scale heterogeneous computing platforms.
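The abstract does not say how the 69.1% figure is derived. As a rough check, the sketch below assumes the usual strong-scaling definition (measured speedup divided by the factor by which the node count grew) and applies it to the node counts and training times quoted in the abstract. The function name `scaling_efficiency` and its parameters are illustrative and are not taken from the paper.

```python
def scaling_efficiency(nodes_base, hours_base, nodes_scaled, hours_scaled):
    """Strong-scaling efficiency: achieved speedup over ideal (linear) speedup.

    Assumption: the standard definition is used; the abstract does not
    state the formula explicitly.
    """
    speedup = hours_base / hours_scaled      # how much faster the larger run finished
    node_ratio = nodes_scaled / nodes_base   # how many times more nodes were used
    return speedup / node_ratio


if __name__ == "__main__":
    # Figures quoted in the abstract: 1067 -> 4139 nodes, 43.02 h -> 16.05 h.
    eff = scaling_efficiency(1067, 43.02, 4139, 16.05)
    print(f"speedup: {43.02 / 16.05:.2f}x, efficiency: {eff:.1%}")
    # prints "speedup: 2.68x, efficiency: 69.1%"
```

Under this assumed definition, a 3.88-fold increase in nodes yields a 2.68-fold reduction in training time, which agrees with the reported 69.1%.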
Authors LI Rongchun (李荣春); HE Zhouyu (贺周雨); QIAO Peng (乔鹏); JIANG Jingfei (姜晶菲); DOU Yong (窦勇); LI Dongsheng (李东升) (National Key Laboratory of Parallel and Distributed Computing, National University of Defense Technology, Changsha 410073, China)
Source Journal of National University of Defense Technology (《国防科技大学学报》), 2024, No. 5, pp. 209-218 (10 pages). Indexed in EI, CAS, CSCD, and the Peking University Core Journals list.
Funding National Natural Science Foundation of China (61902415).
Keywords MiniGo; large-scale heterogeneous computing platform; digital signal processor (DSP)