Multi-Level Pipelining Parallelism for Dataflow Programs on Multi-Core Cluster
Original title: 面向多核集群的数据流程序层次流水线并行优化方法
Cited by: 8
Abstract: Dataflow programming languages are domain-specific languages that separate computation from communication and thereby expose the parallelism in an application. The complexity of the underlying compute, storage, and communication resources in a multi-core cluster poses new challenges for the performance of dataflow programs. To address the low resource utilization and poor scalability of dataflow programs running on multi-core clusters, the paper proposes and implements a hierarchical (multi-level) pipeline parallelism optimization method for multi-core clusters, using the synchronous dataflow graph as the intermediate representation. The method comprises three phases: (1) task partitioning and scheduling, which exploits the data and task parallelism in the program to map tasks onto compute cores, achieving load balance and low communication and synchronization overhead; (2) hierarchical pipeline scheduling, which exploits the parallelism in the program to construct a low-latency pipeline schedule; and (3) data locality optimization, a memory-oriented optimization that targets the cache false sharing present in data accesses and judiciously repeats actor executions to eliminate it and improve locality. After these optimizations, the compiler generates MPI-based target code that can execute in parallel. The experiments use a cluster of x86 multi-core processors as the platform and typical algorithms from the media-processing domain as benchmarks to analyze the hierarchical pipeline optimization. The results indicate that the method is effective and that its scalability and performance are good.
Source: Chinese Journal of Computers (《计算机学报》), indexed in EI, CSCD, and the Peking University Core Journals list; 2014, No. 10, pp. 2071-2083 (13 pages)
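Funding: Supported by a Key Project of the National High Technology Research and Development Program of China (863 Program) (2012AA010902) and the Specialized Research Fund for the Doctoral Program of Higher Education (20120142110089)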
Keywords: multi-core cluster; dataflow programming; compilation; pipeline; COStream
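Illustrative note: the abstract names the synchronous dataflow (SDF) graph as the compiler's intermediate representation. A standard property of SDF graphs is that a periodic schedule exists only if the balance equations hold on every edge (firings of the producer times tokens produced equals firings of the consumer times tokens consumed). The sketch below computes the smallest such firing counts for a connected graph. It is a generic textbook calculation, not the paper's partitioning or scheduling algorithm; the function name `repetition_vector`, the edge tuple layout, and the actor names are assumptions made for illustration.

```python
from fractions import Fraction
from math import gcd


def repetition_vector(actors, edges):
    """Return the minimal firing count per actor for one steady-state period.

    actors: list of actor names (hypothetical identifiers).
    edges:  list of (src, dst, produced, consumed) tuples, where `produced`
            is the token count src emits per firing and `consumed` is the
            token count dst reads per firing.
    Raises ValueError if the graph is disconnected or its rates conflict.
    """
    # Fix the first actor's relative rate at 1 and propagate along edges.
    rate = {actors[0]: Fraction(1)}
    changed = True
    while changed:
        changed = False
        for src, dst, produced, consumed in edges:
            if src in rate and dst not in rate:
                rate[dst] = rate[src] * produced / consumed
                changed = True
            elif dst in rate and src not in rate:
                rate[src] = rate[dst] * consumed / produced
                changed = True

    if len(rate) != len(actors):
        raise ValueError("graph is not connected")

    # Every edge must satisfy its balance equation, including edges that
    # were not used during propagation.
    for src, dst, produced, consumed in edges:
        if rate[src] * produced != rate[dst] * consumed:
            raise ValueError("inconsistent rates: no periodic schedule")

    # Scale the fractional rates to the smallest positive integers.
    scale = 1
    for r in rate.values():
        scale = scale * r.denominator // gcd(scale, r.denominator)
    counts = {a: int(rate[a] * scale) for a in actors}
    common = 0
    for v in counts.values():
        common = gcd(common, v)
    return {a: v // common for a, v in counts.items()}


# Hypothetical three-actor pipeline: A emits 2 tokens, B reads 3 and emits 1,
# C reads 2.
pipeline = repetition_vector(
    ["A", "B", "C"],
    [("A", "B", 2, 3), ("B", "C", 1, 2)],
)
print(pipeline)
```

In this example A fires 3 times, B twice, and C once per period, so each edge carries as many tokens in as out (6 on the first edge, 2 on the second). A compiler could use such counts as per-actor workload estimates when balancing load across cores, though how the paper does this is not stated in the abstract.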