
Data Object Scale Aware Rank-Level Memory Allocation (cited by: 1)
Abstract: Exploiting the multi-bank/rank/channel structure of main memory to uncover memory-access parallelism and locality is an important way to improve system performance. Previous work has used sub-ranking techniques to increase the memory resources that can operate in parallel, or has partitioned banks among concurrently running programs to isolate memory interference. These methods, however, do not consider interference among the data objects within a single program when bank and rank resources coexist. By analyzing how data is laid out in main memory, we find that a program's data tends to cluster in a single rank because of its limited working set, which leaves memory resources underutilized and degrades system performance. We propose DSRA (data object scale aware rank-level memory allocation), a software-only rank-level memory allocation method based on data object scale. DSRA spreads data objects with high interference cost across different ranks, using the additional bank/rank resources to improve memory access performance. DSRA works at the operating system level and analyzes the interference cost among data objects using information provided by the compiler and the operating system, so it requires no changes to application source code and no special underlying hardware. We implemented DSRA in the Linux 2.6.32 kernel and evaluated it on two real processors with memory-intensive benchmarks from the NAS Benchmark and SPEC CPU2000 suites. The results show that, with little effect on the cache miss rate, DSRA shortens program execution time by reducing the number of main memory access cycles. Compared with existing optimization techniques, it improves performance by 6.8% on average and by up to 16%.
Source: Journal of Computer Research and Development (《计算机研究与发展》), indexed in EI, CSCD, and the PKU Core Journals list, 2014, No. 3, pp. 672-680 (9 pages)
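Funding: National Science and Technology Major Project of China ("核高基": Core Electronic Devices, High-end Generic Chips and Basic Software), grant 2009ZX01029-001-002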
Keywords: memory interference; operating system; rank cluster; memory allocation; data object
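
The abstract describes spreading data objects with high interference cost across different ranks but does not give the cost model or the placement algorithm. The following is only a minimal user-level sketch of the general idea, assuming that object size serves as a stand-in for interference cost and that a largest-first greedy placement is acceptable. The names `DataObject`, `Rank`, and `spread_across_ranks` are hypothetical and do not come from the paper, whose implementation lives in the Linux kernel rather than in application code.

```python
from dataclasses import dataclass, field


@dataclass
class DataObject:
    name: str
    size_bytes: int  # assumption: size is used as a proxy for interference cost


@dataclass
class Rank:
    index: int
    objects: list = field(default_factory=list)
    load_bytes: int = 0


def spread_across_ranks(objects, num_ranks):
    """Place each object, largest first, on the currently least-loaded rank.

    This is a generic greedy balancing heuristic, not the DSRA algorithm.
    Ties between equally loaded ranks go to the lower rank index.
    """
    if num_ranks < 1:
        raise ValueError("num_ranks must be at least 1")
    ranks = [Rank(i) for i in range(num_ranks)]
    for obj in sorted(objects, key=lambda o: o.size_bytes, reverse=True):
        target = min(ranks, key=lambda r: (r.load_bytes, r.index))
        target.objects.append(obj.name)
        target.load_bytes += obj.size_bytes
    return ranks
```

Calling `spread_across_ranks` with a list of objects and the number of ranks returns one `Rank` record per rank, so that large objects end up on different ranks instead of clustering in one. A real allocator would also need the hardware-specific mapping from physical address bits to ranks, which the abstract does not specify.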