Efficient and transparent CPU-GPU data communication through partial page migration

Abstract: Despite increasing investment in research on integrated GPUs and next-generation interconnects, discrete GPUs connected via PCI Express still dominate the market, and the management of data communication between CPUs and GPUs continues to evolve. Initially, programmers explicitly controlled data transfers between the CPU and the GPU. To simplify programming, GPU vendors developed a programming model that provides a single virtual address space for "CPU+GPU" heterogeneous systems. The page migration mechanism in this model automatically migrates pages between the CPU and the GPU on demand. To meet the needs of high-performance workloads, page sizes tend to grow. Constrained by low-bandwidth, high-latency interconnects, migrating a larger page takes longer, which can reduce the overlap between computation and transfer and cause severe performance degradation. We propose a partial page migration mechanism that migrates only the requested part of a page, shortening migration latency and avoiding the performance degradation of whole-page migration as pages grow larger. Experiments show that, with a 2 MB page size and a PCI Express bandwidth of 16 GB/s, partial page migration largely hides the performance overhead of whole-page migration: relative to programmer-controlled data transfer, whole-page migration incurs an average slowdown of 98.62×, whereas partial page migration achieves an average speedup of 1.29×. In addition, we examine the impact of page size on the TLB miss rate and the impact of migration unit size on execution time, so that designers can make informed decisions based on this information.
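The latency argument in the abstract can be illustrated with a back-of-the-envelope sketch. The 2 MB page size and 16 GB/s bandwidth come from the abstract; the 4 KB migration unit, the `PartialPage` class, and the `transfer_time_us` helper are hypothetical names and values chosen for illustration, not the authors' design. The model counts only bandwidth-limited transfer time and ignores the fixed latency of the link.

```python
PAGE_SIZE = 2 * 1024 * 1024   # 2 MB page, as stated in the abstract
LINK_BANDWIDTH = 16e9         # 16 GB/s PCI Express, as stated in the abstract
UNIT_SIZE = 4 * 1024          # hypothetical migration unit (not given in the source)


def transfer_time_us(num_bytes, bandwidth=LINK_BANDWIDTH):
    """Bandwidth-limited transfer time in microseconds (fixed link latency ignored)."""
    return num_bytes / bandwidth * 1e6


class PartialPage:
    """Tracks which migration units of one page are already resident on the GPU."""

    def __init__(self, page_size=PAGE_SIZE, unit_size=UNIT_SIZE):
        self.unit_size = unit_size
        self.resident = [False] * (page_size // unit_size)

    def access(self, offset):
        """Return the number of bytes that must be migrated to serve this access."""
        unit = offset // self.unit_size
        if self.resident[unit]:
            return 0                  # unit already on the GPU: no transfer needed
        self.resident[unit] = True    # migrate only the touched unit
        return self.unit_size


whole_page_us = transfer_time_us(PAGE_SIZE)  # cost of migrating the full page
unit_us = transfer_time_us(UNIT_SIZE)        # cost of migrating one unit
```

Under these assumptions, a whole 2 MB page occupies the link for about 131 µs, while one 4 KB unit takes about 0.26 µs, which is why a faulting GPU thread can resume much sooner when only the requested part is moved. The trade-off is that later accesses to other parts of the same page each trigger their own migration.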

Authors: ZHANG Shi-qing; YANG Yao-hua; SHEN Li; WANG Zhi-ying (School of Computer, National University of Defense Technology, Changsha 410073, China)

Affiliation: School of Computer, National University of Defense Technology

Source: Computer Engineering & Science (《计算机工程与科学》), indexed in CSCD and the Peking University Core Journals list, 2019, No. 7, pp. 1168-1175 (8 pages)

Keywords: heterogeneous "CPU+GPU" system; data communication; page migration

Classification: TP392.02 [Automation and Computer Technology: Computer Application Technology]
