Abstract
As nanometer process technology advances, the effect of temperature on leakage power and on thermal conductance becomes increasingly significant. Accurate thermal analysis of 3D chips that accounts for the interaction among temperature, power and thermal conductance therefore requires an iterative solution. The nodal power density vector and the thermal conductance matrix are first used to solve for the nodal temperature vector, and the resulting temperature vector is then used to refresh the power density vector and the conductance matrix. To make this analysis more efficient, this work uses a uniform thermal conductance matrix evaluated at a preset temperature as the preconditioner. It first proposes TPG-FTCG (CG with a Fast Transform-based preconditioner), an efficient double-loop algorithm whose inner loop needs only a few iterations. Because the inner loop of TPG-FTCG converges very quickly, that inner loop is then removed, yielding TPG-Sli (single-loop iterative), a TPG solver with a single loop and fewer iterations. A GPU (graphics processing unit) accelerated version of TPG-Sli is then implemented and refined. Experimental results show that, within a sufficiently small error bound, the GPU-accelerated TPG-Sli achieves about a 120× speedup over TPG-ICCG, which performs 3D chip thermal analysis with the classical and efficient ICCG (incomplete Cholesky conjugate gradient) method.
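The abstract only outlines the iteration, so the following is a minimal NumPy sketch under stated assumptions, not the authors' implementation. It models the chip as a small 3D grid of nodes with temperature rise θ above ambient. It solves G(θ)θ = P(θ) by alternating a linear solve with a refresh of P and G, and it preconditions CG with the uniform conductance matrix at a preset temperature (here the ambient). That preconditioner is inverted exactly by diagonalising it with discrete sine transforms along each axis. The conductance law `cond`, the leakage model `power`, the all-faces-to-ambient boundary, and every function name (`apply_G`, `make_fast_preconditioner`, `pcg`, `coupled_solve`) are hypothetical. `single_loop=True` is only a guess at the spirit of TPG-Sli (one preconditioned step per refresh); the paper's exact update, its FFT-based transform and its GPU kernels are not reproduced. For readability the sine transform is applied as a dense matrix; a fast version would use an FFT-based transform such as `scipy.fft.dstn` with `type=1`.

```python
import numpy as np

def along_axis(S, x, ax):
    """Apply the square matrix S to array x along axis `ax`."""
    return np.moveaxis(np.tensordot(S, np.moveaxis(x, ax, 0), axes=1), 0, ax)

def dst_matrix(n):
    """Orthonormal DST-I matrix: eigenvectors of tridiag(-1, 2, -1); S @ S = I."""
    k = np.arange(1, n + 1)
    return np.sqrt(2.0 / (n + 1)) * np.sin(np.pi * np.outer(k, k) / (n + 1))

def apply_G(theta, x, cond):
    """Matrix-free product G(theta) @ x on a node grid (hypothetical model).

    Each edge conductance is cond() of the mean edge temperature rise, and
    every face is tied to an ambient ghost node whose rise is 0.
    """
    y = np.zeros_like(x)
    for ax in range(x.ndim):
        xa, ta = np.moveaxis(x, ax, 0), np.moveaxis(theta, ax, 0)
        ghost = np.zeros((1,) + xa.shape[1:])
        xp = np.concatenate([ghost, xa, ghost])
        tp = np.concatenate([ghost, ta, ghost])
        flux = cond(0.5 * (tp[:-1] + tp[1:])) * (xp[1:] - xp[:-1])
        y += np.moveaxis(flux[:-1] - flux[1:], 0, ax)
    return y

def make_fast_preconditioner(shape, g_ref):
    """Exact inverse of the uniform-conductance matrix g_ref * L via sine transforms."""
    mats = [dst_matrix(n) for n in shape]
    lam = np.zeros(shape)
    for ax, n in enumerate(shape):
        k = np.arange(1, n + 1)
        view = [1] * len(shape)
        view[ax] = n
        lam = lam + (2.0 - 2.0 * np.cos(np.pi * k / (n + 1))).reshape(view)

    def solve(r):
        z = r
        for ax, S in enumerate(mats):   # forward transform along every axis
            z = along_axis(S, z, ax)
        z = z / (g_ref * lam)           # divide by the eigenvalues of g_ref * L
        for ax, S in enumerate(mats):   # inverse transform (S is its own inverse)
            z = along_axis(S, z, ax)
        return z
    return solve

def pcg(apply_A, b, x0, precond, tol, maxiter):
    """Preconditioned conjugate gradient for SPD A; returns (x, iterations used)."""
    x = x0.copy()
    r = b - apply_A(x)
    bnorm = np.linalg.norm(b)
    if np.linalg.norm(r) <= tol * bnorm:
        return x, 0
    z = precond(r)
    p = z.copy()
    rz = np.vdot(r, z)
    for it in range(1, maxiter + 1):
        Ap = apply_A(p)
        alpha = rz / np.vdot(p, Ap)
        x += alpha * p
        r -= alpha * Ap
        if np.linalg.norm(r) <= tol * bnorm:
            return x, it
        z = precond(r)
        rz_new = np.vdot(r, z)
        p = z + (rz_new / rz) * p
        rz = rz_new
    return x, maxiter

def coupled_solve(power, cond, shape, single_loop=False, tol=1e-8, max_outer=200):
    """Alternate between solving G(theta) theta = P(theta) and refreshing P and G.

    single_loop=False: inner PCG solved to tolerance (double loop).
    single_loop=True:  one preconditioned CG step per refresh (hypothetical).
    Returns (theta, outer iterations, total inner CG iterations).
    """
    theta = np.zeros(shape)
    precond = make_fast_preconditioner(shape, cond(0.0))  # uniform G at a preset temperature
    inner_total = 0
    for outer in range(max_outer):
        b = power(theta)                                   # refresh power density
        A = lambda v, th=theta: apply_G(th, v, cond)       # refresh conductance
        if np.linalg.norm(b - A(theta)) <= tol * np.linalg.norm(b):
            return theta, outer, inner_total
        maxiter = 1 if single_loop else 1000
        theta, its = pcg(A, b, theta, precond, 0.1 * tol, maxiter)
        inner_total += its
    raise RuntimeError("coupled iteration did not converge")
```

A possible call, with made-up parameters, is `theta, n_outer, n_inner = coupled_solve(power, cond, (8, 8, 4))` using `cond = lambda t: 1.0 / (1.0 + 0.01 * t)` and `power = lambda t: 10.0 + np.exp(0.02 * t)`. In this model the eigenvalues of the preconditioned operator lie between the smallest and largest edge conductance divided by `g_ref`. They therefore stay close to 1 while conductance varies only mildly with temperature, which is why the sketch needs few CG steps per refresh.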
Source
《电子学报》
EI
CAS
CSCD
Peking University Core Journal (北大核心)
2016, No. 6, pp. 1300-1306 (7 pages)
Acta Electronica Sinica
Funding
National Natural Science Foundation of China (No. 51331002)
Keywords
algorithm
thermal analysis
fast Fourier transform
GPU parallel computing