Pipelined Template for Mapping Multiple Loop Nests on FPGA with Restricted Resources

Cited by: 2


Abstract: FPGAs provide a flexible and efficient platform for accelerating compute-intensive applications. However, on-chip resources are limited, so the multiple loop nests in a large-scale application sometimes have to be mapped onto the FPGA and executed one at a time. When one loop nest finishes, the FPGA must be reconfigured before the next one can run, and this reconfiguration takes up a considerable share of the total execution time. This paper designs a parameterized pipelined template and proposes a corresponding instruction allocation and scheduling strategy. Together they automatically map multiple loop nests, in sequence, onto the target FPGA system-on-chip, and no FPGA reconfiguration is needed when switching from one loop nest to the next. Experimental results show that, for each loop nest, the pipelined template achieves execution cycle counts comparable to those of a dedicated hardware structure, while saving the reconfiguration time otherwise spent at each program switch.
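The trade-off described in the abstract can be made concrete with a back-of-the-envelope cost model. The sketch below is an illustrative simplification, not the authors' model: it assumes each loop nest's cycle count is known in advance, that reconfiguring the FPGA costs a fixed number of cycles per switch, and that switching programs on a resident template costs only a small fixed overhead. The names `LoopNest`, `reconfig_cycles`, `switch_cycles`, and all numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class LoopNest:
    name: str
    cycles_dedicated: int  # hypothetical cycle count on a dedicated accelerator
    cycles_template: int   # hypothetical cycle count on the shared template


def time_with_reconfiguration(loops, reconfig_cycles):
    """Each loop nest has its own configuration; the FPGA is reconfigured
    before every loop nest after the first (initial configuration ignored)."""
    compute = sum(l.cycles_dedicated for l in loops)
    return compute + reconfig_cycles * max(len(loops) - 1, 0)


def time_with_template(loops, switch_cycles):
    """One template stays loaded; each switch only pays a small assumed
    overhead (e.g. loading the next loop's instructions/parameters)."""
    compute = sum(l.cycles_template for l in loops)
    return compute + switch_cycles * max(len(loops) - 1, 0)


if __name__ == "__main__":
    loops = [
        LoopNest("A", 1000, 1100),
        LoopNest("B", 2000, 2100),
        LoopNest("C", 1500, 1600),
    ]
    print(time_with_reconfiguration(loops, 50_000))  # 104500
    print(time_with_template(loops, 100))            # 5000
```

Under these assumed numbers the template is slightly slower per loop nest, but it wins overall because the per-switch reconfiguration cost dominates whenever it is much larger than the loops' own execution time.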

Authors: Dong Yazhuo, Dou Yong, Song Jian, Liu Mingzheng

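Affiliations: School of Computer Science, National University of Defense Technology, PLA; Application Office, Logistics Science Research Institute, PLA General Logistics Department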

Source: Chinese Journal of Computers (《计算机学报》), 2009, No. 1, pp. 152-160 (9 pages). Indexed in EI, CSCD, and the Peking University Core Journals list.

Funding: Key Program of the National Natural Science Foundation of China (60633050); Youth Research Fund of PLA Unit 91655

Keywords: loop; FPGA; pipelined template; instruction schedule

CLC number: TP302 [Automation and Computer Technology: Computer System Architecture]



