OpenMDSP: Extending OpenMP to Program Multi-Core DSPs (cited by: 1)

Abstract: Multi-core digital signal processors (DSPs) are widely used in wireless telecommunication, core network transcoding, industrial control, and audio/video processing, among other fields. Compared with general-purpose multi-processors, multi-core DSPs normally have a more complex memory hierarchy, including on-chip core-local memory and non-cache-coherent shared memory. As a result, efficient multi-core DSP applications are very difficult to write. The current approach to programming multi-core DSPs relies on proprietary vendor software development kits (SDKs), which provide only low-level, non-portable primitives. While these SDKs are acceptable for writing coarse-grained task-level parallel code, writing fine-grained data-parallel code with them is tedious and error-prone. We believe that a high-level, portable parallel programming model for multi-core DSPs is desirable. In this paper, we propose OpenMDSP, an extension of OpenMP designed for multi-core DSPs. The goal of OpenMDSP is to fill the gap between the OpenMP memory model and the memory hierarchy of multi-core DSPs. We propose three classes of directives in OpenMDSP: 1) data placement directives, which allow programmers to control the placement of global variables conveniently; 2) distributed array directives, which divide a whole array into sections and promote the sections into core-local memory to improve performance; and 3) stream access directives, which promote big arrays into core-local memory section by section during parallel loop processing while hiding the latency of data movement by means of the direct memory access (DMA) of a DSP. We implement the compiler and runtime system for OpenMDSP on the Freescale MSC8156. The benchmarking results show that seven of nine benchmarks achieve a speedup of more than a factor of 5 when using six threads.
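The stream access idea in the abstract (moving a big array through core-local memory section by section while DMA hides the transfer latency) is commonly realized with double buffering. The C sketch below illustrates that general pattern only; it is not OpenMDSP directive syntax, which this record does not show. The names `fetch_tile`, `stream_sum_squares`, and `TILE` are hypothetical, and `memcpy` stands in for an asynchronous DMA transfer, so no real overlap happens here.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define TILE 4  /* hypothetical section size; a real value depends on local memory */

/* Stand-in for starting a DMA transfer into core-local memory (assumption).
   It is a plain memcpy here, so it completes immediately. */
static void fetch_tile(int *local, const int *shared, size_t count) {
    memcpy(local, shared, count * sizeof *local);
}

/* Sums the squares of n elements of a large "shared" array,
   staging one section at a time through two small local buffers. */
long stream_sum_squares(const int *shared, size_t n) {
    int buf[2][TILE];   /* two buffers playing the role of core-local memory */
    long total = 0;
    size_t cur = 0;     /* index of the buffer currently being computed on */

    assert(shared != NULL || n == 0);
    if (n == 0)
        return 0;

    /* Prefetch the first section before the loop starts. */
    fetch_tile(buf[cur], shared, n < TILE ? n : TILE);

    for (size_t base = 0; base < n; base += TILE) {
        size_t count = (n - base < TILE) ? n - base : TILE;
        size_t next = base + TILE;

        /* Request the next section before computing on the current one;
           with a real asynchronous DMA the transfer and compute would overlap. */
        if (next < n) {
            size_t next_count = (n - next < TILE) ? n - next : TILE;
            fetch_tile(buf[1 - cur], shared + next, next_count);
        }

        for (size_t i = 0; i < count; i++)
            total += (long)buf[cur][i] * buf[cur][i];

        cur = 1 - cur;  /* swap the roles of the two buffers */
    }
    return total;
}
```

Calling `stream_sum_squares` on an array holding 1 through 10 walks three sections (4, 4, and 2 elements). On real hardware the buffer swap would also need a wait for DMA completion before the compute step, which this sketch omits because `memcpy` is synchronous.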

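Authors: Jiangzhou He (何江舟), Wenguang Chen (陈文光), Guangri Chen (陈光日), Weimin Zheng (郑纬民), Zhizhong Tang (汤志忠), Handong Ye (叶寒栋)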

Affiliations: Department of Computer Science and Technology; Huawei Technologies Co. Ltd.

Source: Journal of Computer Science & Technology (计算机科学技术学报, English edition); indexed in SCIE, EI, CSCD; 2014, Issue 2, pp. 316-331 (16 pages)

Funding: Supported by the National High Technology Research and Development 863 Program of China under Grant No. 2012AA010901 and the National Natural Science Foundation of China under Grant No. 61103021.

Keywords: OpenMP, multi-core digital signal processor, data parallelism, Long Term Evolution

Classification: TP332 [Automation and Computer Technology: Computer System Architecture]
