
Compiler Optimization for Parallel Programs on Shared-Memory Architectures (共享内存结构并行程序的编译器优化) Cited by: 1

Compiler Optimization Algorithm for Parallel Program on Shared Memory Machine
Abstract: Automatic parallelization for shared-memory machines usually exploits loop-level parallelism under the fork-join execution model, and the performance of the resulting programs leaves room for improvement. This paper combines the flexibility of the fork-join model with the power of the single program, multiple data (SPMD) model: during parallelizing compilation, parallel regions are merged and extended to produce a hybrid fork-join/SPMD execution model. Within each SPMD parallel region, a barrier synchronization optimization algorithm based on the cross-processor dependence graph eliminates redundant barriers. Analysis and verification show that these strategies reduce the number of parallel regions and barrier synchronizations and effectively improve the performance of the generated parallel programs.
Source: Computer Engineering and Applications (《计算机工程与应用》), CSCD, Peking University Core Journal, 2006, No. 1, pp. 13-16 (4 pages)
Funding: Key scientific research project funded by national ministries and commissions
Keywords: cross-processor dependence, barrier synchronization, SPMD parallel region, data dependence graph
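The barrier optimization summarized in the abstract can be illustrated with a small sketch. This is not the paper's algorithm: it is a simplified model that assumes every loop in an SPMD region runs over the same iteration space with the same block distribution, and that each array access has the form `array[i + offset]`. The names `Access`, `Loop`, `owner`, `cross_processor_edges`, and `place_barriers` are hypothetical. A barrier before a loop is kept only if some dependence from a loop executed since the last kept barrier connects iterations owned by different processors.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Access:
    array: str
    offset: int  # iteration i touches array[i + offset]


@dataclass
class Loop:
    name: str
    writes: list
    reads: list


def owner(i, n, p):
    """Processor that runs iteration i under a block distribution (assumed)."""
    chunk = -(-n // p)  # ceil(n / p)
    return i // chunk


def cross_processor_edges(src, dst, n, p):
    """Edges (producer proc, consumer proc) of the cross-processor dependence graph."""
    # Flow and output dependences: src writes, dst reads or writes.
    pairs = [(w, a) for w in src.writes for a in dst.reads + dst.writes]
    # Anti dependences: src reads, dst writes.
    pairs += [(r, w) for r in src.reads for w in dst.writes]
    edges = set()
    for a, b in pairs:
        if a.array != b.array:
            continue
        shift = a.offset - b.offset  # dst iteration j = i + shift hits the same element
        for i in range(n):
            j = i + shift
            if 0 <= j < n and owner(i, n, p) != owner(j, n, p):
                edges.add((owner(i, n, p), owner(j, n, p)))
    return edges


def place_barriers(loops, n, p):
    """result[k-1] is True if a barrier is needed before loops[k]."""
    kept = []
    first_unsynced = 0  # earliest loop not yet separated from the current one by a barrier
    for k in range(1, len(loops)):
        need = any(
            cross_processor_edges(loops[s], loops[k], n, p)
            for s in range(first_unsynced, k)
        )
        kept.append(need)
        if need:
            first_unsynced = k
    return kept


# Hypothetical SPMD region with three loops over 16 iterations on 4 processors.
loops = [
    Loop("init", writes=[Access("a", 0)], reads=[]),
    Loop("scale", writes=[Access("b", 0)], reads=[Access("a", 0)]),
    Loop("stencil", writes=[Access("c", 0)], reads=[Access("b", -1), Access("b", 1)]),
]
print(place_barriers(loops, n=16, p=4))  # [False, True]
```

In this example the barrier between `init` and `scale` is redundant because each processor only reads the elements of `a` it wrote itself, while the stencil loop reads neighbouring elements of `b` across block boundaries and therefore still needs synchronization.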
  • Related literature

References (6)

  • 1. Chau-Wen Tseng. Compiler Optimizations for Eliminating Barrier Synchronization [C]. In: the 5th ACM Symposium on Principles and Practice of Parallel Programming (PPOPP'95), Santa Barbara, CA, 1995.
  • 2. Géraud Krawezik, Franck Cappello. Performance Comparison of MPI and Three OpenMP Programming Styles on Shared Memory Multiprocessors [C]. In: SPAA'03, San Diego, California, USA, 2003.
  • 3. Zhang Ping, Zhao Rongcai, Li Qingbao. A dependence-based synchronization optimization algorithm (基于相关性的同步优化算法) [J]. Computer Engineering (计算机工程), 2005, 31(17): 68-70. Cited by: 5
  • 4. Robert P Wilson, Robert S French, Christopher S Wilson. SUIF: An Infrastructure for Research on Parallelizing and Optimizing Compilers. US: Computer Systems Laboratory, Stanford University, 1994.
  • 5. Hwansoo Han et al. Compiler-parallelized Codes of Software DSMs. US: University of Maryland.
  • 6. Amy W Lim, Monica Lam. Maximizing Parallelism and Minimizing Synchronization with Affine Transforms [C]. In: 24th Annual ACM SIGPLAN-SIGACT Symposium, 1997.

Secondary references (5)

  • 1. OpenMP C Application Program Interface (Version 2.0). http://www.openmp.org, 2000-11.
  • 2. Appelbe B, Doddapaneni S, Hardnett C. A New Algorithm for Global Optimization for Parallelism and Locality. 7th International Workshop on Languages and Compilers for Parallel Computing, 1994-08.
  • 3. Chen D K. Compiler Optimizations for Parallel Loops with Fine-Grained Synchronization (Ph.D. Dissertation). University of Illinois at Urbana-Champaign, 1994.
  • 4. Gupta M, Schonberg E. Static Analysis to Reduce Synchronization Costs in Data-parallel Programs. In: Proc. of Principles of Programming Languages, 1996-01.
  • 5. Tseng C W. Compiler Optimizations for Eliminating Barrier Synchronization. In: Proc. of Principles and Practice of Parallel Programming, 1995-08.

Co-citing documents (4)

Co-cited documents (12)

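  • 1. Sun Yuqiang, Liu Sanyang, Zhang Yingli, Ma Zhenghua. Design of parallel algorithms for FIRST and FOLLOW sets (FIRST和FOLLOW集合的并行算法设计) [J]. Computer Engineering (计算机工程), 2004, 30(21): 71-73. Cited by: 1
  • 2. Wei Hongmei, Yao Jianhua. Current status and development trends of parallel languages and compilation techniques (并行语言及编译技术现状和发展趋势) [J]. Computer Engineering (计算机工程), 2004, 30(B12): 97-98. Cited by: 5
  • 3. Zeng Shaohua, Wei Yan. Compilation and scheduling mechanisms for parallel computing on shared-memory multiprocessors (共享存储器多处理机并行计算编译及调度机制) [J]. Journal of Chongqing Normal University (Natural Science Edition) (重庆师范大学学报(自然科学版)), 2006, 23(1): 27-30. Cited by: 5
  • 4. Dong Chunli, Han Lin, Zhao Rongcai. A linear data and computation partitioning algorithm in parallelizing compilation (并行编译中一种线性数据和计算划分算法) [J]. Computer Engineering (计算机工程), 2006, 32(24): 26-28. Cited by: 5
  • 5. Shen Zhiyu, Hu Zi'ang. Parallel Compilation Methods (并行编译方法) [M]. Beijing: National Defense Industry Press (国防工业出版社), 2001.
  • 6. HUANG JIN-WOEI, CHU CHIH-PING. An Efficient Communication Scheduling Method for the Processor Mapping Technique Applied Data Redistribution [J]. Journal of Supercomputing, 2006, 37(3): 297-318.
  • 7. STOJMENOVIC I, SEDDIGH M, ZUNIC J. Dominating Sets and Neighbor Elimination-Based Broadcasting Algorithms in Wireless Networks [J]. IEEE Trans on Parallel and Distributed Systems, 2002, 13(1): 14-25.
  • 8. GÉRAUD KRAWEZIK, FRANCK CAPPELLO. Performance Comparison of MPI and Three OpenMP Programming Styles on Shared Memory Multiprocessors [C] // SPAA'03. San Diego, California, USA: ACM, 2003: 118-127.
  • 9. LI Jian-hui, ZANG Bin-yu, WU Rong, et al. Run-Time Data-Flow Analysis [J]. Journal of Computer Science and Technology, 2002, 17(4): 442-449.
  • 10. MATTHEW C CHIDESTER, ALAN D GEORGE, MATTHEW A, et al. Multiple-Path Execution for Chip Multiprocessors [J]. Journal of Systems Architecture: the EUROMICRO Journal, 2003, 49(1/2): 33-52.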

Citing documents (1)

Secondary citing documents (1)
