Abstract
CUDA is a widely used GPGPU (General-Purpose Computing on GPU) model, and the BP algorithm is one of the most widely used neural network models. This paper proposes a method of parallelizing the BP algorithm with CUDA. When a BP neural network is trained with this method, the training data are transferred to the GPU before training begins; the computation of the inputs, outputs, and errors of the hidden and output layers, as well as the updating of the weights and biases, is then performed entirely on the GPU. Applied to training on handwritten digit images, the method achieves a speed-up of 6.12 to 8.17 over training on a quad-core CPU. When the models trained on the CPU and on the GPU are used to recognize the same test-set images, the recognition rate of the GPU-trained model is 0.05% to 0.22% higher than that of the CPU-trained model.
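The per-neuron computations the abstract describes (layer forward pass, output-layer error, weight/bias update) can be sketched in plain Python. This is an illustrative sketch under standard BP assumptions with a sigmoid activation, not the paper's CUDA implementation; on the GPU, each neuron's (or each weight's) independent computation below is what would map to one CUDA thread.

```python
import math

def sigmoid(x):
    """Standard logistic activation."""
    return 1.0 / (1.0 + math.exp(-x))

def layer_forward(inputs, weights, biases):
    # One output per neuron: o_j = sigmoid(sum_i w_ji * x_i + b_j).
    # Each neuron j is independent, hence parallelizable across GPU threads.
    return [sigmoid(sum(w * x for w, x in zip(row, inputs)) + b)
            for row, b in zip(weights, biases)]

def output_errors(outputs, targets):
    # Sigmoid output units: delta_j = o_j * (1 - o_j) * (t_j - o_j).
    return [o * (1.0 - o) * (t - o) for o, t in zip(outputs, targets)]

def update_weights(weights, biases, deltas, inputs, lr=0.5):
    # w_ji += lr * delta_j * x_i ; b_j += lr * delta_j.
    # Every weight update is independent of the others, so these loops
    # are also a natural fit for one-thread-per-weight parallelism.
    for j, d in enumerate(deltas):
        for i, x in enumerate(inputs):
            weights[j][i] += lr * d * x
        biases[j] += lr * d
```

The key observation motivating the GPU mapping is that none of these inner computations depends on another neuron's result within the same step, so each step is embarrassingly parallel; only the step boundaries (layer by layer) require synchronization.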
Source
《计算机工程与应用》
CSCD
2013, No. 23, pp. 31-34, 51 (5 pages)
Computer Engineering and Applications
Funding
Supported by the Open Project of the State Key Laboratory of Computer Architecture (No. CARCH201105)
Keywords
Back-Propagation (BP) algorithm
parallelization
Compute Unified Device Architecture (CUDA)
handwritten digit training