Simultaneous Accelerator Parallelization and Point-to-Point Interconnect Insertion for Bus-Based Embedded SoCs

Abstract: As performance requirements for bus-based embedded Systems-on-Chip (SoCs) increase, more and more on-chip application-specific hardware accelerators (e.g., filters, FFTs, JPEG encoders, GSMs, and AES encoders) are being integrated into their designs. These accelerators require system-level tradeoffs among performance, area, and scalability. Accelerator parallelization and Point-to-Point (P2P) interconnect insertion are two effective system-level adjustments. The former boosts computing performance at the cost of area, while the latter provides higher bandwidth at the cost of routability. Moreover, the two adjustments interact with each other. This paper proposes a design flow to optimize accelerator parallelization and P2P interconnect insertion simultaneously. To explore the huge optimization space, we develop an effective algorithm whose goal is to reduce total SoC latency under constraints on SoC area and total P2P wire length. Experimental results show that the performance difference between our proposed algorithm and the optimal results is only 2.33% on average, while the running time of the algorithm is less than 17 s.
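The optimization problem stated in the abstract (minimize total SoC latency subject to an area budget and a total P2P wire-length budget) can be illustrated with a small sketch. Everything below is an assumption made for illustration: the accelerator figures, the bandwidth values, the additive latency model, and the function names `evaluate` and `exhaustive_search` are hypothetical and are not taken from the paper. The sketch enumerates all configurations exhaustively, which corresponds to the "optimal results" baseline rather than to the authors' algorithm, whose details are not given here.

```python
from itertools import product

# Hypothetical toy data, not from the paper:
# name -> (compute_time, area_per_copy, traffic, p2p_wire_length)
ACCELERATORS = {
    "fft": (8.0, 2.0, 4.0, 3.0),
    "jpeg": (12.0, 3.0, 6.0, 5.0),
    "aes": (4.0, 1.0, 2.0, 2.0),
}
BUS_BW, P2P_BW = 1.0, 4.0   # assumed bandwidths (shared bus vs. P2P link)
DEGREES = (1, 2, 4)         # assumed allowed parallelization degrees


def evaluate(config):
    """Return (latency, area, wire) for a {name: (degree, use_p2p)} mapping."""
    latency = area = wire = 0.0
    for name, (degree, use_p2p) in config.items():
        compute, unit_area, traffic, wire_len = ACCELERATORS[name]
        bandwidth = P2P_BW if use_p2p else BUS_BW
        # Assumed model: compute time scales down with the number of copies,
        # transfer time scales down with the available bandwidth.
        latency += compute / degree + traffic / bandwidth
        area += unit_area * degree
        wire += wire_len if use_p2p else 0.0
    return latency, area, wire


def exhaustive_search(max_area, max_wire):
    """Try every configuration; return (latency, config) or None if infeasible."""
    names = list(ACCELERATORS)
    per_accel_choices = list(product(DEGREES, (False, True)))
    best = None
    for combo in product(per_accel_choices, repeat=len(names)):
        config = dict(zip(names, combo))
        latency, area, wire = evaluate(config)
        if area <= max_area and wire <= max_wire:
            if best is None or latency < best[0]:
                best = (latency, config)
    return best
```

In this toy model the two decisions are coupled only through the shared budgets. The search space still grows exponentially with the number of accelerators (6 choices per accelerator here), which is why exhaustive enumeration is only practical for small instances.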

Authors: Daming Zhang, Yongpan Liu, Shuangchen Li, Tongda Wu, Huazhong Yang

Affiliations: Department of Electronic Engineering; Department of Electronic and Computer Engineering

Source: Tsinghua Science and Technology (清华大学学报（自然科学版）（英文版）), indexed in SCIE, EI, CAS, CSCD; 2015, Issue 6, pp. 644-660 (17 pages)

Funding: Supported in part by the National Natural Science Foundation of China (No. 61271269), the National High-Tech Research and Development (863) Program (No. 2013AA01320), and the Importation and Development of High-Caliber Talents Project of Beijing Municipal Institutions (No. YETP0102).

Keywords: accelerator parallelization; point-to-point interco

Classification: TN47 [Electronics and Telecommunications: Microelectronics and Solid-State Electronics]
