Design and Implementation of an Extended Collectives Library for Unified Parallel C

Abstract: Unified Parallel C (UPC) is a parallel extension of ANSI C based on the Partitioned Global Address Space (PGAS) programming model, which provides a shared memory view that simplifies code development while taking advantage of the scalability of distributed memory architectures. UPC therefore allows programmers to write parallel applications for hybrid shared/distributed memory architectures, such as multi-core clusters, in a more productive way, accessing remote memory by means of high-level language constructs such as assignments to shared variables or collective primitives. However, the standard UPC collectives library includes a reduced set of eight basic primitives with quite limited functionality. This work presents the design and implementation of extended UPC collective functions that overcome the limitations of the standard collectives library, allowing, for example, the use of a specific source and destination thread, or the definition of the amount of data transferred by each particular thread. This library fulfills the demands of the UPC developer community and implements portable algorithms, independent of the specific UPC compiler/runtime being used. A representative set of these extended collectives has been evaluated using two applications and four kernels as case studies. The results confirm the suitability of the new library for easier programming without trading off performance, thus achieving high productivity in parallel programming to harness the performance of hybrid shared/distributed memory architectures in high performance computing.
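To illustrate the limitation the abstract describes, the sketch below contrasts a standard UPC collective, whose signature fixes the number of bytes contributed per thread and leaves no choice of root thread, with a hypothetical extended variant in the spirit of the paper's proposal. This is UPC code and requires a UPC toolchain (e.g., Berkeley UPC or GCC UPC); the extended function `upc_all_gather_v` and its signature are illustrative assumptions only, not the library's actual API.

```c
#include <upc.h>
#include <upc_collective.h>

#define NELEMS 4

shared int A[NELEMS * THREADS];   /* source: NELEMS ints with affinity to each thread */
shared int B[NELEMS * THREADS];   /* destination area */

/* Hypothetical extended collective (illustrative, not the paper's API):
   each thread i contributes sizes[i] bytes instead of one fixed nbytes,
   and the gathered data lands on an explicitly chosen root thread. */
void upc_all_gather_v(shared void *dst, shared const void *src,
                      shared const size_t *sizes, int root,
                      upc_flag_t sync_mode);

int main(void) {
    /* Standard collective: every thread must contribute exactly the same
       number of bytes, and no destination thread can be selected -- the
       affinity layout of dst alone determines where the data ends up. */
    upc_all_gather(B, A, NELEMS * sizeof(int),
                   UPC_IN_ALLSYNC | UPC_OUT_ALLSYNC);
    return 0;
}
```

The `upc_all_gather` call and the `UPC_IN_ALLSYNC`/`UPC_OUT_ALLSYNC` synchronization flags are part of the standard UPC collectives specification; the per-thread `sizes` array and explicit `root` parameter in the sketched extension are exactly the kinds of flexibility the abstract says the standard eight primitives lack.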
Source: Journal of Computer Science & Technology (SCIE, EI, CSCD), 2013, No. 1, pp. 72-89 (18 pages).
Funding: Funded by Hewlett-Packard (Project "Improving UPC Usability and Performance in Constellation Systems: Implementation/Extensions of UPC Libraries"); partially supported by the Ministry of Science and Innovation of Spain under Project No. TIN2010-16735, and by the Galician Government (Consolidation of Competitive Research Groups, Xunta de Galicia ref. 2010/6).
Keywords: Unified Parallel C, collective operation, programmability, partitioned global address space, high performance computing
