Abstract
Power, one of the three PPA (performance, power, and area) metrics, is becoming increasingly important in large SoC designs, especially as mainstream design platforms move to process nodes below 7 nm. AI chips typically run high-performance computing tasks on many cores in parallel, which leads to very high power consumption, so power optimization is a top priority in AI chip design. Given a waveform for a typical power scenario, or a group of such waveforms, power optimization can be applied throughout the flow from RTL synthesis to GDS. The basic approach uses Joules-replay to convert an RTL activity file (time-based formats: VCD/FSDB/SHM/PHY) into the corresponding gate-level activity file. Synthesis in Genus has three stages: syn-gen, syn-map, and syn-opt. Joules-replay is added after each stage, and the replayed activity file, which matches the synthesized netlist, drives power optimization in the next stage, improving the accuracy of power estimation. Place and route in Innovus likewise has three main stages: place_opt, cts_opt, and route_opt. Joules-replay can be applied after each of them in the same way to generate the netlist activity needed for the next stage. Finally, at the timing signoff stage in Tempus, post-simulation activity is used for a further round of power optimization. With this full-flow power optimization, the design achieves a power reduction of more than 10%; combined with MBFF (multi-bit flip-flop) optimization, the total power reduction reaches 21%.
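The stage ordering described in the abstract can be sketched as a small planning routine. This is only an illustration of how activity files are handed from one stage to the next: the tool and stage names come from the abstract, while the `Step` class, the `plan_flow` function, and the file names are hypothetical and do not correspond to any Cadence command or API.

```python
from dataclasses import dataclass

# Tool and stage names are taken from the abstract; everything else
# (class, function, and file names) is a hypothetical illustration.
FLOW = [
    ("Genus", ["syn-gen", "syn-map", "syn-opt"]),
    ("Innovus", ["place_opt", "cts_opt", "route_opt"]),
]


@dataclass
class Step:
    tool: str
    stage: str
    activity_in: str   # activity that drives power optimization in this stage
    activity_out: str  # activity replayed onto this stage's netlist ("" if none)


def plan_flow(rtl_activity: str) -> list[Step]:
    """Return the ordered steps, chaining each replayed activity file
    into the next stage, starting from the RTL activity file."""
    steps = []
    current = rtl_activity
    for tool, stages in FLOW:
        for stage in stages:
            replayed = f"{stage}.replayed"  # placeholder name for the replay output
            steps.append(Step(tool, stage, current, replayed))
            current = replayed
    # The abstract states that signoff optimization uses post-simulation activity.
    steps.append(Step("Tempus", "timing signoff", "post-sim activity", ""))
    return steps


if __name__ == "__main__":
    for s in plan_flow("rtl.vcd"):
        print(f"{s.tool:8} {s.stage:15} in={s.activity_in} out={s.activity_out}")
```

Keeping the plan as plain data makes the key property of the flow explicit: every optimization stage after the first consumes activity that was replayed onto the netlist produced by the stage before it.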
Authors
Gu Donghua (顾东华)
Zeng Zhiyong (曾智勇)
Yu Jinjin (余金金)
Huang Xuhui (黄徐辉)
Zhu Jiajun (朱嘉骏)
He Xiangjun (何湘君)
Chen Zefa (陈泽发)
(Enflame Technology, Shanghai 200000, China; Cadence Design Systems, Inc., Shanghai 200000, China)
Source
《电子技术应用》
2022, No. 8, pp. 65-69 (5 pages)
Application of Electronic Technique