Abstract
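As one of the important program acceleration components in fields such as multimedia and scientific computing, SIMD extension units are now widely integrated into all kinds of processors. Automatic vectorization is currently an important means of generating SIMD vectorized programs. The superword level parallelism (SLP) method is now widely used in compilers and has become the main means of vectorizing code at the basic-block level. When evaluating profitability, SLP considers only the benefit of vectorizing the code segment as a whole. It does not take into account that fragments with negative vectorization benefit reduce the overall benefit, so SLP cannot achieve the best vectorization result. To address this, a throttling-based SLP vectorization method (throttling SLP, TSLP) is proposed. By searching for the optimal subgraph to vectorize, it removes the code segments whose vectorization benefit is negative and thus obtains a better vectorization result. Experimental results on standard benchmark programs show that, compared with the original SLP method, TSLP achieves an average performance improvement of 9%.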
SIMD vector units are widely adopted in modern general-purpose processors because they can boost performance and energy efficiency for media and scientific applications. Compiler-based automatic vectorization is one approach to generating code that makes efficient use of the SIMD units. The SLP vectorization algorithm is the best-known implementation of automatic vectorization. In SLP, choosing whether to vectorize is a one-off decision for the whole graph that has been generated. However, this is sub-optimal because the graph may contain code that is harmful to vectorization due to the need to move data from scalar registers into vectors. The decision does not consider the potential benefit of throttling the graph by removing this harmful code. To overcome this limitation, this paper proposes throttling SLP (TSLP), a novel vectorization algorithm that finds the optimal subgraph to vectorize. Experiments show that TSLP decreases execution time by 9% on average compared to SLP.
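The throttling idea in the abstract can be illustrated with a minimal Python sketch. It is not the paper's algorithm: it assumes the SLP graph is a simple tree, that each node carries a hypothetical `benefit` (scalar cost minus vector cost), and that cutting a subtree costs a hypothetical `gather_cost` for packing scalar values into a vector. The names `Node`, `whole_graph_benefit`, and `best_subgraph` are invented for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    benefit: int          # assumed: scalar cost minus vector cost of this group
    gather_cost: int = 1  # assumed: cost of packing scalars if the graph is cut here
    children: list = field(default_factory=list)


def whole_graph_benefit(node):
    """All-or-nothing view: sum the benefit of every node in the graph."""
    return node.benefit + sum(whole_graph_benefit(c) for c in node.children)


def best_subgraph(node):
    """Throttled view: return (net_benefit, kept_names) for the best
    connected subgraph that contains `node`."""
    total = node.benefit
    kept = [node.name]
    for child in node.children:
        child_total, child_kept = best_subgraph(child)
        if child_total > -child.gather_cost:
            # Keeping the child's subgraph is better than cutting it.
            total += child_total
            kept += child_kept
        else:
            # Cut here: the child stays scalar and we pay for the gather.
            total -= child.gather_cost
    return total, kept


# Hypothetical graph: a store fed by an add, whose operands are a cheap
# vector load and an expensive group that is harmful to vectorize.
root = Node("store", 2, children=[
    Node("add", 1, children=[
        Node("load_a", 1),
        Node("harmful_b", -4),
    ]),
])

print(whole_graph_benefit(root))  # 0: the one-off decision sees no gain
print(best_subgraph(root))        # (3, ['store', 'add', 'load_a'])
```

In this toy graph the all-or-nothing estimate is zero, so vectorization would be rejected, while cutting the harmful group leaves a subgraph with a positive net benefit. Real SLP graphs are DAGs with target-specific cost models, so the tree recursion above is only a simplification.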
Authors
李颖颖
奚慧兴
高伟
李伟
翟胜伟
Li Yingying; Xi Huixing; Gao Wei; Li Wei; Zhai Shengwei (Information Engineering University, Zhengzhou 450002, China; State Key Laboratory of Mathematical Engineering & Advanced Computing, Zhengzhou 450002, China; Anshan Normal University, Anshan Liaoning 114007, China; The 27th Research Institute, China Electronics Technology Group Corporation, Zhengzhou 450047, China)
Source
《计算机应用研究》
CSCD
Peking University Core Journals
2018, No. 9, pp. 2578-2582 (5 pages)
Application Research of Computers
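Funding
National Natural Science Foundation of China (61472447)
National "863" Program of China (2014AA01A300)
National "Core Electronic Devices, High-end Generic Chips and Basic Software" Major Project of China (2013ZX0102-8001-001-001)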