Landing Stencil Code on Godson-T (cited by: 1)

Abstract: The advent of multi-core/many-core chip technology offers both an extraordinary opportunity and a profound challenge. In particular, computer architects and system software designers have a unique opportunity to introduce new architecture features together with adequate compiler technology; combined, they may have a profound impact. This paper presents a case study (using the 1-D Jacobi computation) of compiler-amenable performance optimization techniques on the many-core architecture Godson-T. The Godson-T architecture has several unique features that were chosen for this study: 1) chip-level globally addressable memory, in particular the scratchpad memories (SPM) local to the processing cores; 2) fine-grain memory-based synchronization (e.g., full-empty bits). Leveraging state-of-the-art performance optimization methods for 1-D stencil parallelization (e.g., timed tiling and variants), we developed and implemented a number of many-core-based optimizations for Godson-T. Our experimental study shows good performance in both execution-time speedup and scalability, validates the value of globally accessible SPM and the fine-grain synchronization mechanism (full-empty bits) of Godson-T, and provides some useful guidelines for future compiler technology for many-core chip architectures.
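For readers unfamiliar with the computation named in the abstract, the following is a minimal sketch of a 1-D Jacobi stencil and of one simple way to tile it in time. It is an illustration under stated assumptions, not the paper's implementation: the 3-point average, the fixed end points, the overlapped-halo scheme, and all function and parameter names (`jacobi_step`, `jacobi_naive`, `jacobi_time_tiled`, `tile`, `tblock`) are hypothetical. It does not model Godson-T's SPM or full-empty bits.

```python
def jacobi_step(a):
    """One 1-D Jacobi sweep: 3-point average, end points held fixed (assumed stencil)."""
    b = list(a)
    for i in range(1, len(a) - 1):
        b[i] = (a[i - 1] + a[i] + a[i + 1]) / 3.0
    return b


def jacobi_naive(a, steps):
    """Reference version: sweep the whole array once per time step."""
    for _ in range(steps):
        a = jacobi_step(a)
    return a


def jacobi_time_tiled(a, steps, tile=8, tblock=4):
    """Overlapped time tiling (illustrative scheme, not the paper's).

    Each space tile is advanced up to `tblock` time steps on a private copy
    that carries a halo of the same width on each side, so tiles within one
    time block are independent of each other.
    """
    n = len(a)
    done = 0
    while done < steps:
        t = min(tblock, steps - done)        # time steps in this block
        out = list(a)
        for lo in range(0, n, tile):
            hi = min(lo + tile, n)
            left = max(lo - t, 0)            # halo, clipped at the domain edge
            right = min(hi + t, n)
            local = a[left:right]            # private copy (think: scratchpad)
            for _ in range(t):
                local = jacobi_step(local)
            # Halo cells may be stale after t steps; only the tile interior is kept.
            out[lo:hi] = local[lo - left:hi - left]
        a = out
        done += t
    return a
```

In this sketch the halo cells are recomputed redundantly by neighbouring tiles, which trades extra arithmetic for fewer synchronization points. Other tiling variants avoid the redundancy at the cost of finer-grained synchronization between tiles.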

Authors: 崔慧敏 (Cui Huimin), 王蕾 (Wang Lei), 范东睿 (Fan Dongrui), 冯晓兵 (Feng Xiaobing)

Affiliations: Key Laboratory of Computer System and Architecture; Graduate University of Chinese Academy of Sciences

Source: Journal of Computer Science & Technology (计算机科学技术学报, English edition), indexed in SCIE, EI, CSCD; 2010, No. 4, pp. 886-894 (9 pages)

Funding: Supported by the National Basic Research 973 Program of China under Grant No. 2005CB321602, the National Natural Science Foundation of China under Grant No. 60736012, and the National High Technology Research and Development 863 Program of China under Grant Nos. 2007AA01Z110 and 2009AA01Z103.

Keywords: many-core, stencil, Jacobi, compiler, SPM, fine-grain synchronization

Classification: TP332 [Automation and Computer Technology: Computer System Architecture]; TG76 [Metallurgy and Metalworking: Cutting Tools and Dies]

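Citing literature (1)

1. 纪璎芮 (Ji Yingrui), 袁良 (Yuan Liang), 张云泉 (Zhang Yunquan). Parallelism and locality optimization of red-black Gauss-Seidel stencils (红黑Gauss-Seidel Stencil并行性和局部性优化) [J]. Computer Science (计算机科学), 2022, 49(5): 363-370.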