Tile Selection Algorithm Based on Data Locality (基于数据局部性的循环分块选择算法)

Abstract: Existing polyhedral compilation frameworks (such as Pluto, LLVM/Polly, and GCC/Graphite) use fixed tile sizes when performing loop tiling. A fixed size cannot fully exploit the cache characteristics of different hardware, which leads to large performance differences across platforms. Many loop tiling algorithms based on multi-level caches and data locality have been proposed in response, but they often optimize only specific loop programs or lack comprehensive consideration, which makes them unsuitable for porting into general-purpose compilers. This paper proposes a tile size selection algorithm based on data locality that accounts for both the effect of the cache replacement policy and load balancing in multi-core environments. The algorithm is implemented on top of the Polly module in LLVM and evaluated in single-core and multi-core tests on selected test cases from Pluto and PolyBench. The experimental results show that, in a single-core environment, the algorithm achieves average speedups of 2.03 and 2.05 over the default tiling method of LLVM/Polly on two hardware platforms, and that it has good parallel scalability in a multi-core environment.
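To make concrete why a fixed tile size cannot suit every cache, the following is a minimal sketch of loop tiling on matrix multiplication, with the tile size derived from a cache capacity. It is not the algorithm proposed in the paper: `pick_tile_size` is a hypothetical textbook-style capacity heuristic (fit three square tiles in the cache), and it ignores the cache replacement policy and multi-core load balancing that the paper considers. All function and parameter names here are assumptions made for illustration.

```python
def pick_tile_size(cache_bytes, elem_bytes=8, arrays=3):
    """Hypothetical heuristic: largest power-of-two T such that
    `arrays` tiles of T x T elements fit in `cache_bytes`."""
    t = 1
    # Double T while the next size would still fit in the cache.
    while arrays * (2 * t) ** 2 * elem_bytes <= cache_bytes:
        t *= 2
    return t


def matmul_tiled(a, b, n, tile):
    """Multiply two n x n matrices (lists of lists) using square tiles."""
    c = [[0.0] * n for _ in range(n)]
    # Outer loops walk over tiles; inner loops stay inside one tile,
    # so the same small blocks of a, b and c are reused while cached.
    for ii in range(0, n, tile):
        for kk in range(0, n, tile):
            for jj in range(0, n, tile):
                for i in range(ii, min(ii + tile, n)):
                    row_a, row_c = a[i], c[i]
                    for k in range(kk, min(kk + tile, n)):
                        aik = row_a[k]
                        row_b = b[k]
                        for j in range(jj, min(jj + tile, n)):
                            row_c[j] += aik * row_b[j]
    return c


if __name__ == "__main__":
    # Assumed example capacity: a 32 KiB L1 data cache, 8-byte elements.
    tile = pick_tile_size(32 * 1024)
    n = 6
    a = [[float(i + j) for j in range(n)] for i in range(n)]
    b = [[float(i * j + 1) for j in range(n)] for i in range(n)]
    print(tile, matmul_tiled(a, b, n, tile)[0][:3])
```

The result is the same for any tile size; only the memory access pattern changes, which is why a compiler is free to choose the size per target cache rather than hard-coding one.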

Authors: LIAO Qihua; NIE Kai; HAN Lin; CHEN Mengyao; XIE Wenbing

Affiliations: School of Computer and Artificial Intelligence, Zhengzhou University, Zhengzhou 450000, China; National Supercomputing Center in Zhengzhou, Zhengzhou University, Zhengzhou 450000, China; Wuxi Advanced Technology Research Institute, Wuxi, Jiangsu 214000, China

Source: Computer Science (《计算机科学》), indexed in CSCD and the Peking University Core Journals list, 2024, Issue 12, pp. 100-109 (10 pages)

Funding: 2022 Henan Province Major Science and Technology Project (221100210600); 2022 Qiushi Research Startup Fund (自) (32213247).

Keywords: Data locality; Polyhedral model; Loop tiling; Tile size; Load balancing

Classification: TP311 [Automation and Computer Technology: Computer Software and Theory]

