
OpenMDSP: Extending OpenMP to Program Multi-Core DSPs (cited by: 1)

Abstract: Multi-core digital signal processors (DSPs) are widely used in wireless telecommunication, core network transcoding, industrial control, and audio/video processing technologies, among others. In comparison with general-purpose multi-processors, multi-core DSPs normally have a more complex memory hierarchy, such as on-chip core-local memory and non-cache-coherent shared memory. As a result, efficient multi-core DSP applications are very difficult to write. The current approach used to program multi-core DSPs is based on proprietary vendor software development kits (SDKs), which only provide low-level, non-portable primitives. While it is acceptable to write coarse-grained task-level parallel code with these SDKs, writing fine-grained data-parallel code with them is tedious and error-prone. We believe that it is desirable to have a high-level and portable parallel programming model for multi-core DSPs. In this paper, we propose OpenMDSP, an extension of OpenMP designed for multi-core DSPs. The goal of OpenMDSP is to fill the gap between the OpenMP memory model and the memory hierarchy of multi-core DSPs. We propose three classes of directives in OpenMDSP: 1) data placement directives that allow programmers to control the placement of global variables conveniently, 2) distributed array directives that divide a whole array into sections and promote the sections into core-local memory to improve performance, and 3) stream access directives that promote big arrays into core-local memory section by section during parallel loop processing while hiding the latency of data movement by using the direct memory access (DMA) of a DSP. We implement the compiler and runtime system for OpenMDSP on the Freescale MSC8156. The benchmarking results show that seven of nine benchmarks achieve a speedup of more than a factor of 5 when using six threads.
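The abstract does not give the OpenMDSP directive syntax, so the following is only a rough sketch of the idea behind stream access in plain C with a standard OpenMP loop directive. It processes a large array section by section through a small buffer that stands in for core-local memory. The names `SECTION_LEN`, `fetch_section`, and `scale_streamed` are hypothetical, and `memcpy` is an assumed stand-in for a DMA transfer. The copies here are synchronous, so the sketch shows the sectioning only and not the latency hiding that the paper attributes to DMA.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical size (in elements) of one section held in core-local memory. */
#define SECTION_LEN 256

/* Assumed stand-in for a DMA transfer from shared memory to a local buffer.
   A real DSP runtime would issue an asynchronous copy instead. */
static void fetch_section(float *local, const float *shared, size_t n) {
    memcpy(local, shared, n * sizeof *local);
}

/* Multiply every element of data by factor, one section at a time. */
void scale_streamed(float *data, size_t n, float factor) {
    assert(data != NULL || n == 0);
    size_t num_sections = (n + SECTION_LEN - 1) / SECTION_LEN;

    /* Each iteration owns one whole section, so threads never share elements.
       Without OpenMP support the pragma is ignored and the loop runs serially. */
    #pragma omp parallel for
    for (long s = 0; s < (long)num_sections; ++s) {
        float local[SECTION_LEN];               /* models core-local memory */
        size_t begin = (size_t)s * SECTION_LEN;
        size_t len = (n - begin < SECTION_LEN) ? n - begin : SECTION_LEN;

        fetch_section(local, data + begin, len); /* bring the section in */
        for (size_t i = 0; i < len; ++i) {
            local[i] *= factor;                  /* compute on the local copy */
        }
        memcpy(data + begin, local, len * sizeof *local); /* write it back */
    }
}
```

Calling `scale_streamed(a, 1000, 2.0f)` on an array `a` of 1000 floats doubles every element, with the last section covering only the remaining 232 elements. Overlapping the fetch of the next section with computation on the current one would need two local buffers and an asynchronous copy, which this sketch leaves out.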
Source: Journal of Computer Science & Technology (SCIE, EI, CSCD), 2014, Issue 2, pp. 316-331 (16 pages). 计算机科学技术学报(英文版)
Funding: Supported by the National High Technology Research and Development 863 Program of China under Grant No. 2012AA010901 and the National Natural Science Foundation of China under Grant No. 61103021.
Keywords: OpenMP, multi-core digital signal processor, data parallelism, Long Term Evolution


