Abstract
To address the low efficiency of relation network inference on data-center graphics processing unit (GPU) platforms, this paper proposes a relation network optimization method based on software-hardware co-acceleration. The method processes relation network inference through heterogeneous collaboration between a support-set feature pool extracted on the GPU and inference performed on a field programmable gate array (FPGA), achieving high computational efficiency while keeping inference accuracy consistent with the GPU platform. High-level synthesis (HLS) is used to optimize the floating-point convolutional neural network computation, improving the energy efficiency of relation network processing. A heterogeneous multi-core organization with multiple computing units is used to meet FPGA timing closure while increasing on-chip throughput. A relation network inference unit is implemented on an FPGA platform: the accelerator built for the Omniglot dataset consumes 15.867 W with a speedup of 1.4-17.2 over the GPU, and the accelerator built for the miniImageNet dataset consumes 12.359 W with a speedup of 1.5-3.4 over the GPU. Compared with similar FPGA accelerators for floating-point convolutional neural networks, the proposed method achieves the best computational efficiency. The experimental results show that the method effectively exploits software-hardware collaborative computing and FPGA reconfigurable computing, reduces the coupling of software-hardware co-development, and improves the computational efficiency of relation network inference while maintaining its accuracy.
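As a rough illustration of the HLS-optimized floating-point convolution mentioned in the abstract, the sketch below shows one possible compute-unit kernel in HLS-style C++. The kernel size, channel count, tile size, and pragma choices are illustrative assumptions and are not taken from the paper; the sketch only conveys the general technique of pipelining and unrolling a single-precision convolution loop nest for FPGA synthesis.

```cpp
// Hypothetical sketch of a single floating-point convolution compute unit,
// written in HLS-style C++ (the #pragma HLS directives are ignored by ordinary
// C++ compilers). All sizes below are assumptions for illustration only.

constexpr int K    = 3;   // convolution kernel size (assumed)
constexpr int IC   = 64;  // input channels (assumed)
constexpr int TILE = 16;  // output tile handled by one compute unit (assumed)

// One compute unit: produces a TILE x TILE output tile from a padded input tile.
// Arithmetic stays in single-precision float, so results match a GPU baseline.
void conv_unit(const float in[IC][TILE + K - 1][TILE + K - 1],
               const float w[IC][K][K],
               float out[TILE][TILE]) {
#pragma HLS ARRAY_PARTITION variable=w dim=2 complete
#pragma HLS ARRAY_PARTITION variable=w dim=3 complete
    for (int oy = 0; oy < TILE; ++oy) {
        for (int ox = 0; ox < TILE; ++ox) {
            float acc = 0.0f;
            // Pipeline the channel loop; the 3x3 kernel loops are fully
            // unrolled, so each pipeline iteration issues K*K
            // multiply-accumulate operations.
            for (int c = 0; c < IC; ++c) {
#pragma HLS PIPELINE
                for (int ky = 0; ky < K; ++ky) {
#pragma HLS UNROLL
                    for (int kx = 0; kx < K; ++kx) {
#pragma HLS UNROLL
                        acc += in[c][oy + ky][ox + kx] * w[c][ky][kx];
                    }
                }
            }
            out[oy][ox] = acc;
        }
    }
}
```

Several independent instances of such a unit (or differently sized variants) could then be driven in parallel over different tiles or layers, which is one way to realize the multi-compute-unit, heterogeneous multi-core organization the abstract describes.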
Authors
ZHANG Zhichao, WANG Jian, ZHANG Longbing, XIAO Junhua (State Key Laboratory of Computer Architecture, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190; Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190; University of Chinese Academy of Sciences, Beijing 100049; The 15th Research Institute of China Electronics Technology Group Corporation, Beijing 100083)
Source
Chinese High Technology Letters (《高技术通讯》), CAS
2022, No. 4, pp. 327-336 (10 pages)
Funding
National Natural Science Foundation of China (61432016)
National Key Research and Development Program of China (2018YFC0832306, 2018YFC0831203, 2018YFC0831206).
Keywords
relation network
software and hardware co-acceleration
convolutional neural network
heterogeneous multi-core