
Simultaneous Accelerator Parallelization and Point-to-Point Interconnect Insertion for Bus-Based Embedded SoCs

Abstract: As performance requirements for bus-based embedded Systems-on-Chip (SoCs) increase, more and more on-chip application-specific hardware accelerators (e.g., filters, FFTs, JPEG encoders, GSMs, and AES encoders) are being integrated into their designs. These accelerators require system-level tradeoffs among performance, area, and scalability. Accelerator parallelization and Point-to-Point (P2P) interconnect insertion are two effective system-level adjustments. The former boosts computing performance at the cost of area, while the latter provides higher bandwidth at the cost of routability. Moreover, the two adjustments interact with each other. This paper proposes a design flow that optimizes accelerator parallelization and P2P interconnect insertion simultaneously. To explore the huge optimization space, we develop an effective algorithm whose goal is to reduce total SoC latency under constraints on SoC area and total P2P wire length. Experimental results show that the performance difference between our proposed algorithm and the optimal results is only 2.33% on average, while the running time of the algorithm is less than 17 s.
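The optimization problem stated in the abstract (minimize total latency subject to an area budget and a P2P wire-length budget) can be illustrated with a minimal brute-force sketch. This is not the paper's algorithm, which the abstract does not describe. The cost model is a simplifying assumption: computation time is taken to scale ideally with the number of accelerator copies, and each link is charged either a bus or a P2P transfer time. All field names (`compute`, `area`, `max_copies`, `bus_time`, `p2p_time`, `wire`) are hypothetical.

```python
from itertools import product


def best_configuration(accels, links, max_area, max_wire):
    """Exhaustively search a toy joint design space.

    accels: list of dicts with hypothetical keys
        'compute' (time of one copy), 'area' (per copy), 'max_copies'.
    links: list of dicts with hypothetical keys
        'bus_time', 'p2p_time', 'wire' (wire length if P2P is inserted).
    Returns (latency, degrees, use_p2p) or None if nothing is feasible.
    """
    best = None
    degree_ranges = [range(1, a["max_copies"] + 1) for a in accels]

    for degrees in product(*degree_ranges):
        # Area constraint: each extra copy of an accelerator costs its area.
        area = sum(a["area"] * d for a, d in zip(accels, degrees))
        if area > max_area:
            continue

        # Assumed ideal speedup: d copies divide the compute time by d.
        compute = sum(a["compute"] / d for a, d in zip(accels, degrees))

        for use_p2p in product((False, True), repeat=len(links)):
            # Wire-length constraint: only inserted P2P links consume wire.
            wire = sum(l["wire"] for l, on in zip(links, use_p2p) if on)
            if wire > max_wire:
                continue

            comm = sum(
                l["p2p_time"] if on else l["bus_time"]
                for l, on in zip(links, use_p2p)
            )
            latency = compute + comm
            if best is None or latency < best[0]:
                best = (latency, degrees, use_p2p)

    return best
```

Exhaustive enumeration grows exponentially with the number of accelerators and links, which is consistent with the abstract's remark that the optimization space is huge and motivates a faster heuristic.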
Source: Tsinghua Science and Technology (Journal of Tsinghua University, Natural Science Edition, English Edition), 2015, Issue 6, pp. 644-660 (17 pages). Indexed in SCIE, EI, CAS, CSCD.
Funding: Supported in part by the National Natural Science Foundation of China (No. 61271269), the National High-Tech Research and Development (863) Program (No. 2013AA01320), and the Importation and Development of High-Caliber Talents Project of Beijing Municipal Institutions (No. YETP0102).
Keywords: accelerator parallelization; point-to-point interco
