Data Prefetching Algorithm for Globally Optimizing Heterogeneous Memory System

Cited by: 1
Abstract: Existing data prefetching algorithms cannot efficiently serve the novel heterogeneous memory that combines dynamic random access memory (DRAM) with non-volatile memory (NVM) in high-efficiency heterogeneous computing systems. To address this, a simulated annealing data prefetching algorithm (SADPA) for global optimization is proposed. Building on the heuristic search of simulated annealing, SADPA introduces a random factor so that the search avoids local optima, and it thereby determines a globally optimized threshold that sets the effective number of NVM pages to prefetch. Experimental results show that, compared with a static threshold adjustment algorithm, SADPA reduces the average access latency by 4% and increases the average instructions per cycle (IPC) by 10.1%. For the cactusADM application, SADPA also reduces system energy consumption by 3.4% compared with a cooperative hardware/software dynamic threshold adjustment algorithm.
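The abstract describes SADPA only at a high level: simulated annealing searches for a prefetch threshold, and a random factor lets the search accept worse candidates so it does not stall in a local optimum. The Python sketch below illustrates that general technique under stated assumptions and is not the authors' implementation. The function name `anneal_threshold`, the cooling parameters (`t0`, `t_min`, `alpha`, `iters_per_temp`), the neighbor move, and the `toy_latency` cost function are all hypothetical. In a real system the cost of a threshold would come from simulator measurements such as average access latency or energy.

```python
import math
import random
from typing import Callable, Tuple


def anneal_threshold(
    cost: Callable[[int], float],
    lo: int,
    hi: int,
    t0: float = 100.0,
    t_min: float = 1e-3,
    alpha: float = 0.95,
    iters_per_temp: int = 20,
    seed: int = 0,
) -> Tuple[int, float]:
    """Search for the integer threshold in [lo, hi] that minimizes cost(threshold).

    Hypothetical sketch: t0, t_min, alpha, iters_per_temp and the neighbor
    move are illustrative choices, not parameters taken from the paper.
    """
    rng = random.Random(seed)  # the random factor: drives proposals and uphill acceptance
    max_step = max(1, (hi - lo) // 10)

    current = rng.randint(lo, hi)
    current_cost = cost(current)
    best, best_cost = current, current_cost

    t = t0
    while t > t_min:
        for _ in range(iters_per_temp):
            # Propose a neighboring threshold, clamped to the valid range.
            candidate = min(hi, max(lo, current + rng.randint(-max_step, max_step)))
            cand_cost = cost(candidate)
            delta = cand_cost - current_cost
            # Metropolis rule: always accept improvements; accept a worse
            # threshold with probability exp(-delta / t) to escape local optima.
            if delta <= 0 or rng.random() < math.exp(-delta / t):
                current, current_cost = candidate, cand_cost
                if current_cost < best_cost:
                    best, best_cost = current, current_cost
        t *= alpha  # geometric cooling schedule
    return best, best_cost


if __name__ == "__main__":
    # Hypothetical stand-in for a measured metric such as average access
    # latency: a bowl around 60 with periodic bumps that create local minima.
    def toy_latency(threshold: int) -> float:
        return (threshold - 60) ** 2 / 200.0 + 2.0 * math.sin(threshold / 3.0)

    th, c = anneal_threshold(toy_latency, lo=0, hi=255)
    print(f"chosen threshold={th}, cost={c:.3f}")
```

Tracking `best` separately from `current` means the returned threshold is the lowest-cost one ever visited, even if the random walk later accepts a worse move. Early on, the high temperature makes uphill moves likely, which is what lets the search leave a local minimum; as the temperature cools, the search settles into greedy refinement.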
Authors: PEI Songwen, ZHAO Mengyi, JI Yanfei (School of Optical-Electrical and Computer Engineering, University of Shanghai for Science and Technology, Shanghai 200093, China; School of Management, Fudan University, Shanghai 200433, China)
Source: Journal of University of Shanghai for Science and Technology (《上海理工大学学报》), 2019, No. 1, pp. 22-29 (8 pages). Indexed in CAS, CSCD, and the Peking University Core Journals list.
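Funding: China Postdoctoral Science Foundation (2017M610230); National Natural Science Foundation of China (61775139, 61332009); Natural Science Foundation of Shanghai (15ZR1428600); Shanghai Pujiang Talent Program (PJ1407600)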
Keywords: heterogeneous memory system; data prefetching; simulated annealing algorithm; global optimum