2 articles found
1. A Load-Balanced Memory Bank Partitioning Method for Multimedia SoCs (cited by: 1)
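Authors: Zhong Qi (钟祺), Wang Jing (王晶), Wang Keyi (王克义). Journal of Computer-Aided Design & Computer Graphics (《计算机辅助设计与图形学学报》), indexed in EI, CSCD, and the Peking University Core list. 2015, No. 3, pp. 514-522 (9 pages)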
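Abstract: As the number of memory-intensive devices in a multimedia SoC grows, the devices contend frequently for memory bank resources, which severely degrades memory access performance. To address this, the paper proposes a load-balanced memory bank partitioning method for multimedia SoCs. Using the operating system's management of physical memory, the data accessed by each device is mapped to separate banks so that heavily contending devices do not share a bank, which reduces inter-device access conflicts. The partitioning process analyzes the relationship between device access behavior and access conflicts in terms of data volume and latency, and uses that relationship to balance the access load across banks while improving the memory access performance of multiple devices at once. The method does not depend on special hardware and requires no changes to upper-layer applications, so it provides a transparent, software-only optimization. Experiments applying the method to a real multimedia SoC show that, compared with a bandwidth-first partitioning method, it improves bandwidth utilization while lowering memory access latency and raises the decoding frame rate by 8.4% to 12.3%. In addition, with quality of service still guaranteed, system power consumption can be reduced by further lowering the memory operating frequency.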
Keywords: memory allocation; system on chip; memory bank partitioning; memory access conflict
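The abstract above describes assigning devices to memory banks so that the access load is balanced. As a rough illustration only, the sketch below uses a generic greedy heuristic (heaviest device first, always into the least-loaded bank). It is not the authors' algorithm, which also accounts for latency and conflict behavior; the function names, device names, and load figures are all hypothetical.

```python
import heapq


def partition_devices(device_load, num_banks):
    """Greedily assign each device to the currently least-loaded bank.

    device_load: dict mapping a (hypothetical) device name to an estimated load.
    Returns a dict mapping device name to bank index.
    """
    # Min-heap of (accumulated load, bank index); ties resolve to the lower index.
    heap = [(0.0, bank) for bank in range(num_banks)]
    heapq.heapify(heap)
    assignment = {}
    # Place the heaviest devices first; sort by name on ties for determinism.
    for dev, load in sorted(device_load.items(), key=lambda kv: (-kv[1], kv[0])):
        total, bank = heapq.heappop(heap)
        assignment[dev] = bank
        heapq.heappush(heap, (total + load, bank))
    return assignment


def bank_loads(device_load, assignment, num_banks):
    """Sum the load that the assignment places on each bank."""
    loads = [0.0] * num_banks
    for dev, bank in assignment.items():
        loads[bank] += device_load[dev]
    return loads


# Hypothetical per-device loads in arbitrary units.
demo_load = {"decoder": 8, "display": 5, "gpu": 4, "camera": 3}
demo_assignment = partition_devices(demo_load, num_banks=2)
demo_bank_loads = bank_loads(demo_load, demo_assignment, num_banks=2)
```

With these made-up numbers the two banks end up with loads of 11 and 9 units. A real partitioner would work at the granularity of physical pages managed by the operating system, which this sketch does not model.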
2. CWLP: coordinated warp scheduling and locality-protected cache allocation on GPUs (cited by: 1)
Authors: Yang ZHANG, Zuo-cheng XING (+1 more author), Cang LIU, Chuan TANG. Frontiers of Information Technology & Electronic Engineering, indexed in SCIE, EI, and CSCD. 2018, No. 2, pp. 206-220 (15 pages)
Abstract: As we approach the exascale era in supercomputing, designing a balanced computer system with powerful computing ability and low power requirements has become increasingly important. The graphics processing unit (GPU) is an accelerator used widely in most recent supercomputers. It adopts a large number of threads to hide long latency with high energy efficiency. In contrast to their powerful computing ability, GPUs have only a few megabytes of fast on-chip memory storage per streaming multiprocessor (SM). The GPU cache is inefficient due to a mismatch between the throughput-oriented execution model and the cache hierarchy design. At the same time, current GPUs fail to handle burst-mode long-access latency because of their poor warp scheduling method. Thus, the benefits of the GPU's high computing ability are reduced dramatically by poor cache management and warp scheduling methods, which limit system performance and energy efficiency. In this paper, we put forward a coordinated warp scheduling and locality-protected (CWLP) cache allocation scheme to make full use of data locality and hide latency. We first present a locality-protected cache allocation method based on the instruction program counter (LPC) to promote cache performance. Specifically, we use a PC-based locality detector to collect the reuse information of each cache line and employ a prioritised cache allocation unit (PCAU), which coordinates the data reuse information with the time-stamp information to evict the lines with the least reuse possibility. Moreover, the locality information is used by the warp scheduler to create an intelligent warp reordering scheme to capture locality and hide latency. Simulation results show that CWLP provides a speedup of up to 19.8% and an average improvement of 8.8% over the baseline methods.
Keywords: locality; graphics processing unit (GPU); cache allocation; warp scheduling
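The abstract above says the PCAU combines reuse information with time-stamp information to evict the line least likely to be reused. One simple way to read that is sketched below: choose the line with the lowest reuse score, and break ties by the oldest time stamp. This is an assumed interpretation for illustration, not the paper's hardware design; `CacheLine`, its fields, and `pick_victim` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class CacheLine:
    tag: int
    reuse: int      # hypothetical reuse score, e.g. from a PC-based detector
    last_used: int  # time stamp of the most recent access


def pick_victim(lines):
    """Return the line to evict: lowest reuse score, then oldest time stamp."""
    return min(lines, key=lambda line: (line.reuse, line.last_used))


# Hypothetical contents of one cache set.
demo_set = [
    CacheLine(tag=0xA, reuse=2, last_used=10),
    CacheLine(tag=0xB, reuse=0, last_used=30),
    CacheLine(tag=0xC, reuse=0, last_used=20),
]
demo_victim = pick_victim(demo_set)
```

Here lines 0xB and 0xC both have a reuse score of zero, so the older of the two (0xC) is chosen, while the frequently reused line 0xA is protected even though it was accessed longest ago. A plain LRU policy would have evicted 0xA instead.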