Compressed page walk cache
Authors: Dunbo ZHANG, Chaoyang JIA, Li SHEN. Frontiers of Computer Science (SCIE, EI, CSCD), 2022, Issue 3, pp. 41-52 (12 pages)
GPUs are widely used in modern high-performance computing systems. To reduce the burden on GPU programmers, operating systems and GPU hardware provide support for shared virtual memory, which enables the GPU and CPU to share the same virtual address space. Unfortunately, the current SIMT execution model of GPUs poses great challenges for virtual-to-physical address translation on the GPU side, mainly because of the huge number of virtual addresses generated simultaneously and the poor locality of these addresses. As a result, the excessive TLB accesses increase the TLB miss ratio. As an attractive solution, the Page Walk Cache (PWC) has received wide attention for its ability to reduce the memory accesses caused by TLB misses. However, the current PWC mechanism suffers from heavy redundancy, which significantly limits its efficiency. In this paper, we first investigate the factors leading to this issue by evaluating the performance of PWC with typical GPU benchmarks. We find that the repeated L4 and L3 indices of virtual addresses increase the redundancy in PWC, and that the low locality of L2 indices causes the low hit ratio in PWC. Based on these observations, we propose a new PWC structure, the Compressed Page Walk Cache (CPWC), to resolve the redundancy burden in the current PWC. CPWC can be organized in either direct-mapped mode or set-associative mode. Experimental results show that CPWC increases the number of page table entries by 3 times over TPC, increases the L2 index hit ratio by 38.3% over PWC, and reduces memory accesses to page tables by 26.9%. The average number of memory accesses caused by each TLB miss is reduced to 1.13. Overall, the average IPC improves by 25.3%.
Keywords: GPU; shared virtual memory; address translation; PWC
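
The abstract refers to the L4, L3 and L2 indices of a virtual address without giving the address layout. As a rough illustration only, the sketch below assumes an x86-64-style four-level page table (4 KB pages, 9 index bits per level); the function names and sample addresses are hypothetical and are not taken from the paper. It shows how addresses that differ in their L2 index can still share one (L4, L3) prefix, which is the kind of repetition the abstract identifies as redundancy in a PWC. It does not model the CPWC structure itself.

```python
from collections import Counter

# Assumed layout (not stated in the abstract): 4 KB pages and
# 512-entry tables at each of four levels, as in x86-64 paging.
PAGE_SHIFT = 12
INDEX_BITS = 9
INDEX_MASK = (1 << INDEX_BITS) - 1


def split_indices(vaddr):
    """Return the (L4, L3, L2, L1) page-table indices of a virtual address."""
    vpn = vaddr >> PAGE_SHIFT  # drop the page offset
    l1 = vpn & INDEX_MASK
    l2 = (vpn >> INDEX_BITS) & INDEX_MASK
    l3 = (vpn >> (2 * INDEX_BITS)) & INDEX_MASK
    l4 = (vpn >> (3 * INDEX_BITS)) & INDEX_MASK
    return l4, l3, l2, l1


def upper_prefix_counts(vaddrs):
    """Count how many addresses share each (L4, L3) index pair."""
    return Counter(split_indices(v)[:2] for v in vaddrs)


if __name__ == "__main__":
    # Hypothetical addresses spaced 2 MB apart, so each one
    # has a different L2 index but the same L4 and L3 indices.
    base = 0x7F00_0000_0000
    addrs = [base + i * (1 << 21) for i in range(8)]
    for a in addrs:
        print(hex(a), split_indices(a))
    print(upper_prefix_counts(addrs))
```

Under these assumptions, all eight sample addresses fall under a single (L4, L3) pair, so a cache that stores the upper-level indices separately for each entry would keep the same pair eight times.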