Abstract
Some programs cannot supply enough parallelism to fill a vector register, either because a loop has too few iterations or because a basic block contains too few statements that can be executed in parallel as vector operations. To further improve execution efficiency in such cases, this paper studies insufficient vectorization. It examines how vector registers are used and, based on partial use of the vector registers, proposes two methods for implementing insufficient vectorization and analyzes the performance benefit of each. Experimental results show that the approach can effectively vectorize loops and basic blocks with insufficient parallelism and improve program execution efficiency.
Authors
Wang Qi (王琦)
Han Lin (韩林)
Yao Jinyang (姚金阳)
Tao Xiaohan (陶小涵)
State Key Laboratory of Mathematical Engineering and Advanced Computing, Zhengzhou 450001, Henan, China
Source
《计算机应用与软件》
Peking University Core Journal (北大核心)
2018, Issue 9, pp. 108-112 (5 pages)
Computer Applications and Software
Funding
National Key Research and Development Program of China, "High Performance Computing" Key Special Project (2016YFB0200503)