Reinventing Memory System Design for Many-Accelerator Architecture

Abstract: The many-accelerator architecture, composed mostly of general-purpose cores and accelerator-like function units (FUs), has become an attractive alternative to homogeneous chip multiprocessors (CMPs) because of its superior power efficiency. However, the emerging many-accelerator processor shows a much more complicated memory access pattern than general purpose processors (GPPs), because the abundant on-chip FUs tend to generate highly concurrent memory streams with distinct locality and bandwidth demands. The disordered memory streams issued by diverse accelerators interfere with one another and cannot be handled efficiently by the orthodox main memory interface, which provides an inflexible data fetching mode. Unlike traditional DRAM memory, our proposed Aggregation Memory System (AMS) adapts to the characteristic memory streams of different FUs: it provides the FUs with different data fetching sizes and protects the locality of their memory accesses by intelligently interleaving their data across memory devices through sub-rank binding. Moreover, with our optimized memory scheduling policy, AMS can batch requests that have no sub-rank conflict into a single read burst. Experimental results from trace-based simulation show that AMS brings both a conspicuous performance boost and energy savings.
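The batching idea in the abstract can be illustrated with a minimal sketch. It is not the scheduling policy from the paper, which the abstract does not specify. It assumes a hypothetical `Request` record that carries the sub-rank its FU is bound to, and uses a simple greedy rule: each batch takes at most one request per sub-rank, so the requests in a batch could in principle be served together without sub-rank conflict.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Request:
    """Hypothetical memory request; the field names are illustrative assumptions."""
    fu_id: int      # function unit that issued the request
    sub_rank: int   # sub-rank that this FU's data is assumed to be bound to
    address: int    # target address


def batch_without_conflict(queue: List[Request]) -> List[List[Request]]:
    """Greedily pack queued requests into batches with one request per sub-rank.

    Requests that target an already-used sub-rank are deferred to a later
    batch, which keeps requests to the same sub-rank in their arrival order.
    """
    batches: List[List[Request]] = []
    pending = list(queue)
    while pending:
        used_sub_ranks = set()
        batch: List[Request] = []
        deferred: List[Request] = []
        for req in pending:
            if req.sub_rank in used_sub_ranks:
                deferred.append(req)        # conflict: wait for the next batch
            else:
                used_sub_ranks.add(req.sub_rank)
                batch.append(req)
        batches.append(batch)
        pending = deferred
    return batches


if __name__ == "__main__":
    demo = [
        Request(fu_id=0, sub_rank=0, address=0x000),
        Request(fu_id=1, sub_rank=1, address=0x040),
        Request(fu_id=0, sub_rank=0, address=0x080),
        Request(fu_id=2, sub_rank=2, address=0x0C0),
    ]
    for i, batch in enumerate(batch_without_conflict(demo)):
        print(i, [hex(r.address) for r in batch])
```

A real memory controller would also weigh row-buffer state, timing constraints, and fairness between FUs; the sketch only shows the conflict-free grouping step.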
Source: Journal of Computer Science & Technology (计算机科学技术学报, English edition); indexed in SCIE, EI, CSCD; 2014, Issue 2, pp. 273-280 (8 pages)
Funding: Supported by the National Natural Science Foundation of China under Grant Nos. 61173006 and 60921002, the National Basic Research 973 Program of China under Grant No. 2011CB302503, and the Strategic Priority Research Program of the Chinese Academy of Sciences under Grant No. XDA06010403.
Keywords: many-accelerator, chip multiprocessor, memory, general purpose processor
