CNN Accelerator Using Proposed Diagonal Cyclic Array for Minimizing Memory Accesses

Abstract: This paper presents the architecture of a Convolutional Neural Network (CNN) accelerator based on a new processing element (PE) array called a diagonal cyclic array (DCA). As demonstrated, the DCA significantly reduces repeated memory accesses for the feature data and weight parameters of CNN models, which maximizes the data reuse rate and improves computation speed. Furthermore, an integrated computation architecture has been implemented for the activation function and max-pooling that follow the convolution calculation, reducing hardware resources. To evaluate the effectiveness of the proposed architecture, a CNN accelerator has been implemented for the 9-layer You Only Look Once version 2 (YOLOv2)-Tiny. This work also presents a methodology for optimizing the local buffer size with little sacrifice in inference speed. We implemented the proposed CNN accelerator on a Xilinx Zynq UltraScale+ ZCU102 Field Programmable Gate Array (FPGA) using ISE Design Suite. The FPGA implementation uses 34,336 Look-Up Tables (LUTs), 576 Digital Signal Processing (DSP) blocks, and only 58 KB of on-chip memory. Using quantized 16-bit and 8-bit full-integer data manipulation, it achieves a mean Average Precision at an intersection-over-union threshold of 0.5 (mAP@0.5) of 57.92% and 56.42%, respectively, with a loss of only 0.68% for the 8-bit version, and a computation time of 137.9 ms and 69 ms per input image, respectively, at a clock speed of 200 MHz. These speeds are expected to increase fivefold at a clock speed of 1 GHz if the design is implemented as a silicon System on Chip (SoC) in a sub-micron process.
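The abstract says the activation function and max-pooling are handled by one integrated unit after convolution. The sketch below illustrates one property that makes such fusion cheap; it is our own illustration, not the paper's circuit. It assumes YOLOv2-Tiny's leaky ReLU (negative slope 0.1) and 2x2, stride-2 max-pooling, and it uses plain Python floats instead of the accelerator's 16-bit or 8-bit integers. The helper names (`leaky_relu`, `maxpool2x2`, `activate_then_pool`, `pool_then_activate`) are hypothetical. Because leaky ReLU is monotonically non-decreasing, applying it after pooling gives the same result as applying it to every convolution output first, with a quarter of the activation operations.

```python
"""Sketch: fusing leaky ReLU with 2x2/stride-2 max-pooling.

Assumption (not taken from the paper): the activation is leaky ReLU with
slope 0.1. Since it is monotonically non-decreasing,
f(max(a, b, c, d)) == max(f(a), f(b), f(c), f(d)),
so pooling first and activating once per window is equivalent.
"""

from typing import List

FeatureMap = List[List[float]]

ALPHA = 0.1  # assumed leaky ReLU negative slope


def leaky_relu(x: float, alpha: float = ALPHA) -> float:
    """Leaky ReLU on a single value."""
    return x if x > 0 else alpha * x


def maxpool2x2(fmap: FeatureMap) -> FeatureMap:
    """2x2 max-pooling with stride 2; a trailing odd row/column is dropped."""
    return [
        [max(fmap[r][c], fmap[r][c + 1], fmap[r + 1][c], fmap[r + 1][c + 1])
         for c in range(0, len(fmap[0]) - 1, 2)]
        for r in range(0, len(fmap) - 1, 2)
    ]


def activate_then_pool(fmap: FeatureMap) -> FeatureMap:
    """Reference order: one activation per convolution output."""
    return maxpool2x2([[leaky_relu(v) for v in row] for row in fmap])


def pool_then_activate(fmap: FeatureMap) -> FeatureMap:
    """Fused order: one activation per pooled output (4x fewer)."""
    return [[leaky_relu(v) for v in row] for row in maxpool2x2(fmap)]


if __name__ == "__main__":
    conv_out = [
        [-4.0, 2.0, -1.0, -3.0],
        [1.5, -0.5, -2.0, -6.0],
        [-8.0, -7.0, 3.0, 0.0],
        [-9.0, -2.5, 4.0, -1.0],
    ]
    print(pool_then_activate(conv_out))  # [[2.0, -0.1], [-0.25, 4.0]]
    print(pool_then_activate(conv_out) == activate_then_pool(conv_out))  # True
```

For this 4x4 map, `activate_then_pool` evaluates 16 activations while `pool_then_activate` evaluates 4. The same argument holds for any non-decreasing activation, but not for non-monotonic ones.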

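The last sentence of the abstract scales the 200 MHz FPGA results to a 1 GHz SoC. The arithmetic is sketched below under a simplifying assumption that is ours, not the paper's: per-image latency is purely cycle-bound and therefore proportional to 1/f_clk. In practice, off-chip memory bandwidth could limit the gain. The constant and function names are hypothetical.

```python
"""Back-of-the-envelope projection of the reported FPGA latencies.

Assumption: latency is purely cycle-bound, so it scales with 1 / f_clk.
"""

FPGA_CLOCK_HZ = 200e6  # clock used for the reported FPGA results
SOC_CLOCK_HZ = 1e9     # clock assumed for a silicon SoC

# Reported per-image computation times on the FPGA (ms).
reported_latency_ms = {"16-bit": 137.9, "8-bit": 69.0}


def project(latency_ms: float, f_old: float, f_new: float) -> float:
    """Scale a cycle-bound latency from clock f_old to clock f_new."""
    return latency_ms * f_old / f_new


if __name__ == "__main__":
    speedup = SOC_CLOCK_HZ / FPGA_CLOCK_HZ
    print(f"speedup: {speedup:.0f}x")
    for name, ms in reported_latency_ms.items():
        new_ms = project(ms, FPGA_CLOCK_HZ, SOC_CLOCK_HZ)
        print(f"{name}: {ms} ms ({1000 / ms:.1f} fps) -> "
              f"{new_ms:.2f} ms ({1000 / new_ms:.1f} fps)")
```

Under that assumption, the clock ratio 1 GHz / 200 MHz gives a fivefold speedup: about 27.6 ms (roughly 36 fps) for the 16-bit version and 13.8 ms (roughly 72 fps) for the 8-bit version, compared with about 7.3 fps and 14.5 fps on the FPGA.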
Authors: Hyun-Wook Son, Ali A. Al-Hamid, Yong-Seok Na, Dong-Yeong Lee, Hyung-Won Kim

Affiliations: Department of Electronics; Department of Electrical Engineering

Source: Computers, Materials & Continua (SCIE, EI), 2023, Issue 8, pp. 1665-1687 (23 pages)

Funding: This work was supported by the National Research Foundation of Korea (NRF) grant funded by the Korea government (MSIT) (No. 2022R1A5A8026986); by the Institute of Information & communications Technology Planning & Evaluation (IITP) grant funded by the Korean government (MSIT) (No. 2020-0-01304, Development of Self-learnable Mobile Recursive Neural Network Processor Technology); by the MSIT (Ministry of Science and ICT), Korea, under the Grand Information Technology Research Center support program (IITP-2023-2020-0-01462) supervised by the IITP (Institute for Information & communications Technology Planning & Evaluation); and by the National Research Foundation of Korea (NRF) grant funded by the Korea government (MSIT) (No. 2021R1F1A1061314).

Keywords: CNN accelerator; systolic array; memory optimization; YOLOv2-tiny; mAP@0.5

CLC classification: TP18 [Automation and Computer Technology / Control Theory and Control Engineering]
