
A Parameterized Pipelined Template for Automatically Mapping Multiple Loop Programs onto Limited FPGA Resources (cited by: 2)

Pipelined Template for Mapping Multiple Loop Nests on FPGA with Restricted Resources
Abstract: FPGAs provide a flexible and efficient platform for accelerating compute-intensive, loop-intensive applications. Because on-chip resources are limited, however, the multiple loop nests of a large application sometimes have to be mapped onto the FPGA one at a time: after one loop nest finishes, the FPGA must be reconfigured before the next can run, and these reconfigurations take up a considerable share of total execution time. This paper designs a parameterized pipelined template together with a corresponding instruction allocation and scheduling strategy, which automatically map multiple loop nests, in sequence, onto the target FPGA system-on-chip without any FPGA reconfiguration when switching between loop nests. Experimental results show that, for each loop nest, the pipelined template achieves execution cycles comparable to those of dedicated special-purpose hardware while saving the reconfiguration time otherwise spent on switching.
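The cost trade-off in the abstract can be made concrete with a simple cycle model. This is a minimal sketch, not the paper's instruction allocation and scheduling strategy, which the abstract does not describe. It assumes each loop nest is software-pipelined with the usual latency model (N - 1) × II + depth, where the initiation interval II is bounded below by resource usage (the ResMII bound used in modulo scheduling, ignoring recurrences), and that switching between loop nests costs either a full FPGA reconfiguration or, with a shared template, a fixed instruction-loading cost. The names `LoopNest`, `res_mii`, `loop_cycles`, `total_with_reconfig`, `total_with_template` and every numeric value are hypothetical.

```python
from dataclasses import dataclass
from math import ceil


@dataclass
class LoopNest:
    """Hypothetical description of one loop nest's pipelined inner loop."""
    name: str
    trip_count: int   # iterations N of the pipelined loop
    op_counts: dict   # operations per iteration, keyed by functional-unit type


def res_mii(loop: LoopNest, units: dict) -> int:
    """Resource-constrained lower bound on the initiation interval:
    max over unit types of ceil(ops of that type / units of that type).
    Recurrence constraints (RecMII) are ignored in this sketch."""
    return max(ceil(n / units[kind]) for kind, n in loop.op_counts.items())


def loop_cycles(loop: LoopNest, units: dict, depth: int) -> int:
    """Cycles for a software-pipelined loop: (N - 1) * II + pipeline depth."""
    return (loop.trip_count - 1) * res_mii(loop, units) + depth


def total_with_reconfig(loops, dedicated_units, depth, t_reconfig):
    """Each loop runs on its own dedicated circuit; the FPGA is
    reconfigured between consecutive loops (assumed cost t_reconfig)."""
    run = sum(loop_cycles(l, u, depth) for l, u in zip(loops, dedicated_units))
    return run + (len(loops) - 1) * t_reconfig


def total_with_template(loops, template_units, depth, t_switch):
    """All loops share one fixed template; switching only loads new
    instructions/parameters (assumed cost t_switch, no reconfiguration)."""
    run = sum(loop_cycles(l, template_units, depth) for l in loops)
    return run + (len(loops) - 1) * t_switch


if __name__ == "__main__":
    # All numbers below are made up for illustration.
    a = LoopNest("A", 1000, {"add": 4, "mul": 2})
    b = LoopNest("B", 500, {"add": 2, "mul": 4})
    template = {"add": 4, "mul": 4}
    dedicated = [{"add": 4, "mul": 2}, {"add": 2, "mul": 4}]
    print(total_with_reconfig([a, b], dedicated, depth=10, t_reconfig=100_000))  # 101518
    print(total_with_template([a, b], template, depth=10, t_switch=50))          # 1568
```

With these made-up numbers both variants spend the same 1518 cycles inside the loops, because the template provides enough adders and multipliers for each loop to reach the same II as its dedicated circuit; the difference comes entirely from the switching cost. If the template offered fewer units of some type than a loop issues per iteration, `res_mii` would rise above the dedicated circuit's II, and the template would trade some per-loop throughput for avoiding reconfiguration.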
Source: Chinese Journal of Computers (计算机学报), 2009, No. 1, pp. 152-160 (9 pages). Indexed in EI, CSCD, and the Peking University Core Journals list.
Funding: Key Program of the National Natural Science Foundation of China (60633050); Youth Research Fund of Unit 91655.
Keywords: loop; FPGA; pipelined template; instruction schedule