
Research on Application-driven Parallel Program Performance Tuning
Abstract: Multi-core processors can execute many threads in parallel, which gives applications large potential for performance gains but also makes it very challenging to develop high-performance programs efficiently. Moreover, with traditional performance optimization workflows, scalability is difficult to guarantee. Starting from the application side, this article analyzes and categorizes the core computations of various applications (patterns and motifs) and optimizes them with a parallel computing model matched to the architecture of multi-core processor chips. The result is a reusable, high-performance, scalable software library that supports efficient development of new applications and preserves the scalability of program performance. Guided by the layered parallel computing model, the article first proposes a parallel algorithm design model for multi-core processor chip architectures. On this basis it analyzes and optimizes the parallel scan algorithm, obtaining a new algorithm, g-scan, with good scalability and high performance. It then studies in depth the sparse linear algebra motif, one of the 13 core computing motifs, and uses g-scan to design and implement a new sparse matrix-vector operation algorithm. Finally, this algorithm is applied to finite element analysis, which is widely used in structural engineering, through the finite element software OpenSeesSP, greatly improving its execution efficiency.
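To make the connection between a scan and a sparse matrix-vector product concrete, here is a minimal sketch. It is not the article's g-scan algorithm, whose details the abstract does not give. It shows a generic blocked (two-level) prefix scan and a CSR matrix-vector product built on it. The function names `inclusive_scan`, `blocked_scan`, and `spmv_csr_via_scan`, the `num_blocks` parameter, and the choice of CSR storage are all assumptions made for illustration.

```python
from itertools import accumulate
from operator import add


def inclusive_scan(values, op=add):
    """Sequential reference implementation of an inclusive prefix scan."""
    return list(accumulate(values, op))


def blocked_scan(values, num_blocks=4, op=add):
    """Two-level scan: scan each block, scan the block totals, apply offsets.

    The per-block work in phases 1 and 3 is independent, so each block could
    be given to its own core. This sketch runs the blocks sequentially.
    """
    n = len(values)
    if n == 0:
        return []
    size = -(-n // num_blocks)  # ceiling division
    blocks = [values[i:i + size] for i in range(0, n, size)]
    local = [list(accumulate(b, op)) for b in blocks]  # phase 1: local scans
    totals = [b[-1] for b in local]
    offsets = list(accumulate(totals, op))             # phase 2: scan of totals
    out = list(local[0])
    for k in range(1, len(local)):                     # phase 3: add offsets
        out.extend(op(offsets[k - 1], x) for x in local[k])
    return out


def spmv_csr_via_scan(row_ptr, col_idx, data, x):
    """Compute y = A @ x for a CSR matrix using a scan over the products.

    Each row sum is the difference of two prefix sums at the row boundaries.
    """
    products = [a * x[j] for a, j in zip(data, col_idx)]
    prefix = [0] + blocked_scan(products)
    return [prefix[row_ptr[i + 1]] - prefix[row_ptr[i]]
            for i in range(len(row_ptr) - 1)]


# Example: A = [[1, 0, 2], [0, 3, 0], [4, 0, 5]], x = [1, 2, 3]
row_ptr = [0, 2, 3, 5]
col_idx = [0, 2, 1, 0, 2]
data = [1, 2, 3, 4, 5]
y = spmv_csr_via_scan(row_ptr, col_idx, data, [1, 2, 3])
```

With floating-point data, taking differences of prefix sums can lose precision. A segmented scan that restarts at each row boundary avoids this, at the cost of a slightly more involved operator.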
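Source: Computer Science (《计算机科学》), CSCD, Peking University Core Journal, 2013, No. 1, pp. 49-53 (5 pages)
Funding: Supported by the China Education and Research Grid Phase II construction project (ChinaGrid 2)
Keywords: Multi-core processor; Computing motif; Scan algorithm; Finite element method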
