SOR-Based P/G Solving Algorithm of Linear Parallelism for GPU Computing (cited by: 3)

Abstract: In recent years, electronic design automation (EDA) researchers have tried to use the high computing performance of the graphics processing unit (GPU) to accelerate IC parameter analysis. To enable fast GPU-based analysis of power/ground (P/G) networks, an efficient parallel analysis algorithm is designed on the basis of the classical successive over-relaxation (SOR) algorithm. Following the acceleration principles of GPU parallel computing, the algorithm makes the following improvements. 1) It adopts an odd/even-based red-black relaxation ordering. All nodes are divided into red and black classes so that every neighbor of a red node is black and every neighbor of a black node is red; red nodes and black nodes are relaxed alternately, which guarantees data consistency during GPU parallel computation. For a P/G network with N nodes, one red or black relaxation pass relaxes N/2 nodes simultaneously, so in theory N/2 parallel threads can be launched at once. 2) It optimizes the data structure to achieve coalesced data access, ensuring optimal access to GPU global memory. 3) It tallies the relaxation flags quickly by parallel reduction in shared memory and copies the compacted flags to host memory with the zero-copy technique, so that the CPU can quickly decide whether to launch the next round of relaxation kernels or end the relaxation. Extensive experiments show that, compared with a single-threaded CPU program, the speedup of the algorithm increases linearly with the number of physical threads the GPU provides, reaching a maximum of 242×. To the best of the authors' knowledge, this was the best GPU acceleration reported in EDA research at the time of publication.
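The red-black ordering in item 1) can be illustrated with a small serial sketch. It is not the authors' CUDA implementation: the uniform resistive grid, the function name `red_black_sor`, and all parameters (`pads`, `load`, `g`, `omega`, `tol`) are assumptions made for illustration, and the GPU-specific parts (coalesced access, shared-memory reduction, zero-copy) are not reproduced. The `changed` variable only stands in for the relaxation flag that the abstract says is reduced on the GPU.

```python
def red_black_sor(rows, cols, pads, load, g=1.0, vdd=1.0,
                  omega=1.5, tol=1e-9, max_sweeps=10000):
    """Solve node voltages of a uniform resistive grid with red-black SOR.

    pads: set of (r, c) nodes tied to vdd (hypothetical supply pads).
    load: current drawn at every non-pad node (hypothetical uniform load).
    Returns (voltages, sweeps_used).
    """
    v = [[vdd] * cols for _ in range(rows)]
    for sweep in range(1, max_sweeps + 1):
        changed = False  # stands in for the relaxation flag
        for color in (0, 1):  # 0 = red, 1 = black
            # A node of one color has neighbors of the other color only,
            # so all updates inside one color pass are independent and
            # could each be handled by a separate GPU thread.
            for r in range(rows):
                for c in range((r + color) % 2, cols, 2):
                    if (r, c) in pads:
                        continue
                    nbrs = [v[rr][cc]
                            for rr, cc in ((r - 1, c), (r + 1, c),
                                           (r, c - 1), (r, c + 1))
                            if 0 <= rr < rows and 0 <= cc < cols]
                    # Kirchhoff's current law at the node gives the
                    # Gauss-Seidel target; omega over-relaxes toward it.
                    target = (g * sum(nbrs) - load) / (g * len(nbrs))
                    new = v[r][c] + omega * (target - v[r][c])
                    if abs(new - v[r][c]) > tol:
                        changed = True
                    v[r][c] = new
        if not changed:
            return v, sweep
    return v, max_sweeps


if __name__ == "__main__":
    corner_pads = {(0, 0), (0, 4), (4, 0), (4, 4)}
    volts, sweeps = red_black_sor(5, 5, corner_pads, load=0.01)
    print("sweeps:", sweeps)
    print("center node voltage:", volts[2][2])
```

In this sketch each color pass touches about half of the nodes, which mirrors the N/2 degree of parallelism stated in the abstract; on a GPU the two inner loops would be replaced by one kernel launch per color.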

Authors: Tang Liang, Luo Zuying, Zhao Guoxing, Yang Xu

Affiliation: College of Information Science and Technology, Beijing Normal University, Beijing

Source: Journal of Computer Research and Development (《计算机研究与发展》), indexed in EI, CSCD, and the Peking University Core Journals list; 2013, No. 7, pp. 1491-1500 (10 pages)

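Funding: National High Technology Research and Development Program of China (863 Program) (2009AA01Z126); National Natural Science Foundation of China (61274033, 61271198, 61171014); Fundamental Research Funds for the Central Universities (2010GK182)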

Keywords: graphics processing unit (GPU); successive over-relaxation (SOR) algorithm; compute unified device architecture (CUDA); parallel computing; power/ground (P/G) network

Classification number: TP391.7 [Automation and Computer Technology: Computer Application Technology]
