Survey on GPGPU and CUDA Unified Memory Research Status

Abstract: In the era of big data, rapid progress in fields such as scientific computing and artificial intelligence has driven an ever-growing demand for hardware computing power. The hardware architecture of the graphics processing unit (GPU) makes it well suited to highly parallel computation. In recent years, GPUs and fields such as artificial intelligence and scientific computing have advanced one another, refining GPU functionality and giving rise to the mature general-purpose graphics processing unit (GPGPU), which is now one of the most important coprocessors for the central processing unit (CPU). However, a GPU's hardware configuration is difficult to change after it leaves the factory and its device memory capacity is limited, so insufficient device memory significantly degrades computing performance when large datasets are processed. Compute Unified Device Architecture (CUDA) 6.0 introduced unified memory, which lets the GPGPU and the CPU share a virtual memory space, thereby simplifying heterogeneous programming and expanding the memory space accessible to the GPGPU. Unified memory offers a feasible solution for processing large datasets on GPGPUs and alleviates, to some extent, the problem of small GPU memory capacity. Its use also introduces performance problems, however, and effective memory management within unified memory becomes the key to improving performance. This article surveys the development and application of CUDA unified memory, covering its features, evolution, advantages, and limitations, its applications in artificial intelligence, big data processing systems, and other fields, and its future prospects, to provide a useful reference for future work on using and optimizing CUDA unified memory.

Authors: PANG Wenhao; WANG Jialun; WENG Chuliang (School of Data Science and Engineering, East China Normal University, Shanghai 200062, China; Research Institute of Interdisciplinary Innovation, Zhejiang Laboratory, Hangzhou 310000, Zhejiang, China)

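Affiliations: School of Data Science and Engineering, East China Normal University; Research Institute of Interdisciplinary Innovation, Zhejiang Laboratory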

Source: Computer Engineering (indexed in CAS, CSCD, and the Peking University Core Journals list), 2024, No. 12, pp. 1-15 (15 pages)

Funding: National Natural Science Foundation of China (62272171); "Pioneer" and "Leading Goose" R&D Program of Zhejiang Province (2022C04006).

Keywords: General-Purpose Graphics Processing Unit (GPGPU); unified memory; memory oversubscription; data management; heterogeneous system

Classification code: TP316 [Automation and Computer Technology: Computer Software and Theory]


