Abstract
To map algorithm programs automatically onto a reconfigurable multimedia processor and to raise the efficiency of their parallel execution, a task compiler front-end with automatic parallelization is proposed. The front-end unrolls kernel loops to increase the degree of parallelism and, once data dependence analysis has confirmed that the computation remains correct, applies scalar replacement to array accesses inside the loop body to reduce data transfer overhead. Experimental results show that the front-end effectively improves code parallelism and optimizes data transfer; compared with the front-end of the Garp C compiler, overall system performance improves by roughly 2 to 4 times.
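The two transformations named in the abstract can be illustrated on a small kernel. The following is a minimal sketch in C, not code from the paper: the 3-tap sliding sum, the function names `sum3_baseline` and `sum3_optimized`, and the unroll factor of 2 are all assumptions chosen for illustration. It also assumes that `in` and `out` do not overlap, which is the kind of fact a data dependence analysis would have to establish before the rewrite is legal.

```c
#include <stddef.h>

/* Baseline kernel (hypothetical example): each iteration loads
 * three elements of in[], so neighbouring iterations reload the
 * same two values. */
void sum3_baseline(const int *in, int *out, size_t n) {
    for (size_t i = 0; i + 2 < n; i++) {
        out[i] = in[i] + in[i + 1] + in[i + 2];
    }
}

/* Same kernel, unrolled by 2 with scalar replacement.
 * Assumes in[] and out[] do not overlap; otherwise keeping
 * in[] values in scalars could change the result. */
void sum3_optimized(const int *in, int *out, size_t n) {
    if (n < 3) {
        return;
    }
    size_t m = n - 2;   /* number of output elements */
    int a = in[0];      /* holds in[i]     across iterations */
    int b = in[1];      /* holds in[i + 1] across iterations */
    size_t i = 0;

    /* Two outputs per trip; only two new loads instead of six. */
    for (; i + 1 < m; i += 2) {
        int c = in[i + 2];
        int d = in[i + 3];
        out[i] = a + b + c;
        out[i + 1] = b + c + d;
        a = c;          /* reuse loaded values in the next trip */
        b = d;
    }

    /* Remainder iteration when the output count is odd. */
    if (i < m) {
        out[i] = a + b + in[i + 2];
    }
}
```

In this sketch the unrolled body exposes two independent additions chains that a parallel target could schedule together, while the scalars `a` and `b` carry reused array elements from one trip to the next so that each element of `in` is loaded only once.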
Source
《北京邮电大学学报》
EI
CAS
CSCD
PKU Core (Peking University Core Journals)
2011, No. 3, pp. 108-112, 126 (6 pages)
Journal of Beijing University of Posts and Telecommunications
Funding
National High Technology Research and Development Program of China (2009AA011702)
National Natural Science Foundation of China (60803018)
Keywords
reconfigurable computing
task compiler
loop unrolling
scalar replacement