Reinventing Memory System Design for Many-Accelerator Architecture

Abstract: The many-accelerator architecture, composed mostly of general-purpose cores and accelerator-like function units (FUs), has become an attractive alternative to homogeneous chip multiprocessors (CMPs) because of its superior power efficiency. However, an emerging many-accelerator processor shows a much more complicated memory access pattern than general-purpose processors (GPPs), because the abundant on-chip FUs tend to generate highly concurrent memory streams with distinct locality and bandwidth demands. The disordered memory streams issued by diverse accelerators interfere with one another and cannot be handled efficiently by the conventional main memory interface, which provides an inflexible data fetching mode. Unlike traditional DRAM memory, our proposed Aggregation Memory System (AMS) adapts to the characterized memory streams from different FUs: it provides the FUs with different data fetching sizes and protects their memory access locality by intelligently interleaving their data across memory devices through sub-rank binding. Moreover, with our optimized memory scheduling policy, AMS can batch requests that have no sub-rank conflict into a single read burst. Experimental results from trace-based simulation show that AMS brings both a conspicuous performance boost and energy savings.
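The batching idea in the abstract can be illustrated with a small sketch. This is not the authors' scheduling policy, which the abstract does not specify; it is a greedy grouping under assumed details. The `Request` type, the FU names, the four sub-ranks, and the arrival-order greedy rule are all hypothetical, chosen only to show what "requests without sub-rank conflict" could mean.

```python
from dataclasses import dataclass
from typing import FrozenSet, List


@dataclass(frozen=True)
class Request:
    fu: str                    # hypothetical function-unit name
    subranks: FrozenSet[int]   # sub-ranks this FU's data is assumed to be bound to


def batch_requests(queue: List[Request]) -> List[List[Request]]:
    """Greedily group queued requests so that no two requests
    in the same batch touch the same sub-rank (assumed rule)."""
    batches: List[List[Request]] = []
    pending = list(queue)
    while pending:
        busy = set()           # sub-ranks already claimed by this batch
        batch, rest = [], []
        for req in pending:
            if busy.isdisjoint(req.subranks):
                batch.append(req)
                busy |= req.subranks
            else:
                rest.append(req)   # conflicts, so defer to a later batch
        batches.append(batch)
        pending = rest
    return batches


# Illustrative queue over four assumed sub-ranks (0..3).
queue = [
    Request("fft", frozenset({0, 1})),
    Request("aes", frozenset({2})),
    Request("dct", frozenset({1})),
    Request("cpu", frozenset({0, 1, 2, 3})),
]
batches = batch_requests(queue)
batch_names = [[r.fu for r in b] for b in batches]
```

In this sketch a request with a small fetch size occupies few sub-ranks, so several such requests can share one burst, while a full-width request (here the hypothetical `cpu` entry) must be served alone.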

Authors: Ying Wang (王颖), Lei Zhang (张磊), Yinhe Han (韩银和), Huawei Li (李华伟)

Affiliations: State Key Laboratory of Computer Architecture; University of Chinese Academy of Sciences

Source: Journal of Computer Science & Technology (计算机科学技术学报, English edition), 2014, No. 2, pp. 273-280 (8 pages). Indexed in SCIE, EI, CSCD.

Funding: Supported by the National Natural Science Foundation of China under Grant Nos. 61173006 and 60921002, the National Basic Research 973 Program of China under Grant No. 2011CB302503, and the Strategic Priority Research Program of the Chinese Academy of Sciences under Grant No. XDA06010403.

Keywords: many-accelerator, chip multiprocessor, memory, general purpose processor

Classification: TP333.1 [Automation and Computer Technology: Computer Architecture]
