Convolution-Based Detection Models Acceleration Based on GPU

Cited by: 4
Abstract: In recent years, convolution-based detection models (CDM), such as deformable part-based models (DPM) and convolutional neural networks (CNN), have achieved tremendous success in computer vision. These models support large-scale machine learning training and achieve high robustness and recognition performance. However, the large computational cost of the convolution operations in training and evaluation restricts their use in many practical scenarios. This paper accelerates convolution-based detection models at both the algorithm and the hardware level, using mathematical theory and parallelization techniques. At the algorithm level, computational complexity is reduced by converting convolution in the spatial domain into point-wise multiplication in the frequency domain. At the hardware level, graphics processing unit (GPU) parallelization further reduces computation time. Experiments on the public PASCAL VOC dataset show that, compared with a multi-core CPU, the proposed algorithm speeds up the convolution process by 2.13 to 4.31 times on a single commodity GPU.
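The algorithm-level idea in the abstract, replacing spatial-domain convolution with point-wise multiplication in the frequency domain, can be illustrated with a minimal sketch. It is not the authors' implementation: it runs on the CPU in pure Python, uses a textbook radix-2 FFT, and every function name (`fft`, `fft2`, `conv2d_direct`, `conv2d_fft`) is hypothetical. It only checks that both routes give the same full linear convolution when the inputs are zero-padded to a common power-of-two size.

```python
import cmath


def fft(x, inverse=False):
    """Recursive radix-2 Cooley-Tukey FFT (unnormalized); len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    even = fft(x[0::2], inverse)
    odd = fft(x[1::2], inverse)
    sign = 1 if inverse else -1
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def fft2(a, inverse=False):
    """2D FFT: transform every row, then every column."""
    rows = [fft(r, inverse) for r in a]
    cols = [fft(list(c), inverse) for c in zip(*rows)]
    return [list(r) for r in zip(*cols)]


def next_pow2(n):
    p = 1
    while p < n:
        p *= 2
    return p


def conv2d_direct(img, ker):
    """Full linear 2D convolution computed directly in the spatial domain."""
    H, W, h, w = len(img), len(img[0]), len(ker), len(ker[0])
    out = [[0.0] * (W + w - 1) for _ in range(H + h - 1)]
    for i in range(H):
        for j in range(W):
            for u in range(h):
                for v in range(w):
                    out[i + u][j + v] += img[i][j] * ker[u][v]
    return out


def conv2d_fft(img, ker):
    """Same result via the convolution theorem: FFT, point-wise product, inverse FFT."""
    oh, ow = len(img) + len(ker) - 1, len(img[0]) + len(ker[0]) - 1
    P, Q = next_pow2(oh), next_pow2(ow)

    def pad(a):
        # Zero-pad to P x Q so circular convolution equals linear convolution.
        return [[complex(a[i][j]) if i < len(a) and j < len(a[0]) else 0j
                 for j in range(Q)] for i in range(P)]

    Fi, Fk = fft2(pad(img)), fft2(pad(ker))
    prod = [[Fi[i][j] * Fk[i][j] for j in range(Q)] for i in range(P)]
    res = fft2(prod, inverse=True)
    scale = P * Q  # normalization of the unnormalized inverse transform
    return [[res[i][j].real / scale for j in range(ow)] for i in range(oh)]


if __name__ == "__main__":
    img = [[1.0, 2.0, 3.0, 4.0], [5.0, 6.0, 7.0, 8.0], [9.0, 10.0, 11.0, 12.0]]
    ker = [[1.0, 0.0], [0.0, -1.0]]
    direct, via_fft = conv2d_direct(img, ker), conv2d_fft(img, ker)
    diff = max(abs(a - b) for ra, rb in zip(direct, via_fft) for a, b in zip(ra, rb))
    print("max abs difference:", diff)
```

The saving comes from reuse: an image's transform can be computed once and multiplied against many filter transforms, and the per-frequency products are independent, which is what makes the step well suited to GPU parallelization.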
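Source: Computer Applications and Software (《计算机应用与软件》), CSCD, 2016, No. 5, pp. 226-230 (5 pages)
Funding: National Natural Science Foundation of China (61175009); Shanghai Industry-University-Research Cooperation Project (沪CXY-2013-82)
Keywords: convolution-based detection model; computer vision; GPU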