Training deep neural networks(DNNs)requires a significant amount of time and resources to obtain acceptable results,which severely limits its deployment in resource-limited platforms.This paper proposes DarkFPGA,a nov...Training deep neural networks(DNNs)requires a significant amount of time and resources to obtain acceptable results,which severely limits its deployment in resource-limited platforms.This paper proposes DarkFPGA,a novel customizable framework to efficiently accelerate the entire DNN training on a single FPGA platform.First,we explore batch-level parallelism to enable efficient FPGA-based DNN training.Second,we devise a novel hardware architecture optimised by a batch-oriented data pattern and tiling techniques to effectively exploit parallelism.Moreover,an analytical model is developed to determine the optimal design parameters for the DarkFPGA accelerator with respect to a specific network specification and FPGA resource constraints.Our results show that the accelerator is able to perform about 10 times faster than CPU training and about a third of the energy consumption than GPU training using 8-bit integers for training VGG-like networks on the CIFAR dataset for the Maxeler MAX5 platform.展开更多
文摘Training deep neural networks(DNNs)requires a significant amount of time and resources to obtain acceptable results,which severely limits its deployment in resource-limited platforms.This paper proposes DarkFPGA,a novel customizable framework to efficiently accelerate the entire DNN training on a single FPGA platform.First,we explore batch-level parallelism to enable efficient FPGA-based DNN training.Second,we devise a novel hardware architecture optimised by a batch-oriented data pattern and tiling techniques to effectively exploit parallelism.Moreover,an analytical model is developed to determine the optimal design parameters for the DarkFPGA accelerator with respect to a specific network specification and FPGA resource constraints.Our results show that the accelerator is able to perform about 10 times faster than CPU training and about a third of the energy consumption than GPU training using 8-bit integers for training VGG-like networks on the CIFAR dataset for the Maxeler MAX5 platform.