期刊文献+

利于GPU计算具有线性并行度的P/G网SOR求解算法 被引量:3

SOR-Based P/G Solving Algorithm of Linear Parallelism for GPU Computing
下载PDF
导出
摘要 近年来电子设计自动化(EDA)研究人员尝试利用图形处理器(graphic processing unit,GPU)提供的高性能计算能力对IC参数分析进行加速研究.为了利用GPU进行电源线/地线网络(power/ground network,P/G网)快速分析,设计了一种基于经典的连续过松弛(successive over-relaxation,SOR)算法的高效P/G网分析并行算法.基于GPU并行计算加速原理,此算法进行了如下改进:1)采用红-黑次序的松弛策略.将所有的节点分为红黑两类,红色节点的所有邻点只有黑色节点、黑色节点的所有邻点只有红色节点,红色节点与黑色节点交替松弛,保证了GPU并行计算中的数据一致性.对于具有N个节点的P/G网而言,一次红色节点或黑色节点松弛可以同时对N/2个节点进行松弛操作,即理论上可以同时启动N?2个并行线程.2)优化数据结构.实现了对数据空间的合并访问,以保证对GPU全局存储空间的最优访问.3)在共享存储器内通过并行归约对松弛标记进行快速统计,同时利用zero-copy技术进行松弛标记的快速拷贝,以快速决定是否继续松弛.大量的实验结果表明:与单线程的CPU程序相比,此算法的加速倍数随GPU所提供物理线程的数目增加而线性增加,可以获得最大242倍的加速效果,是目前EDA研究领域中加速效果最好的GPU算法. Recently some EDA researchers try using the high computing performance of the graphic processing unit (GPU) to speed up IC parameter analysis. In order to apply GPU for efficient analyses of power/ground(P/G) networks, a novel efficient parallel analysis algorithm is proposed based on the traditional successive over-relaxation (SOR) algorithm. According to the accelerating principles of GPU parallel computing, the algorithm has made the following improvements. 1) Use odd/even based red/black strategy to classify all nodes so that all neighbors of a red node are black ones and a black node has all red neighbors. Red nodes and black nodes are relaxed alternatively for data consistency of GPU computing. As for a P/G network of N nodes, one relaxation process of red or black nodes will relax N/2 nodes, which means that the red/black SOR will implement N/2 threads at one time from the viewpoint of theory. 2) Optimize the data structure to implement data coalesced access. 3) Local shared memory is used to parallel reduce relaxation ending flags and the compacted flags are zero copied to the main memory of the computer system at high speed. In turn, CPU can decide fast whether to activate the next round of relaxation kernels or end the relaxation. A large number of experiments demonstrate that compared with the single CPU thread, the speedup times of our algorithm increase linearly as GPU physical threads increase, and the algorithm can provide 242X speedup at maximum. Thus, according to our best knowledge, the algorithm provides the best GPU accelerating efficiency among all present EDA researches.
出处 《计算机研究与发展》 EI CSCD 北大核心 2013年第7期1491-1500,共10页 Journal of Computer Research and Development
基金 国家“八六三”高技术研究发展计划基金项目(2009AA01Z126) 国家自然科学基金项目(61274033,61271198,61171014) 中央高校基本科研业务费专项资金项目(2010GK182)
关键词 图形处理器 连续过松弛算法 统一计算设备架构 并行算法 电源线 地线网络(P G网) graphic processing unit (GPU) successive over-relaxation (SOR) algorithm computeunified device architecture (CUDA) parallel computing power/ground network
  • 相关文献

参考文献16

  • 1Wilson L,Mangum S. International Technology Roadmap for Semiconductors (ITRS) [OL].[2011-08 -01]. http.-// public,itrs. net/.
  • 2骆祖莹.芯片功耗与工艺参数变化:下一代集成电路设计的两大挑战[J].计算机学报,2007,30(7):1054-1063. 被引量:17
  • 3骆祖莹.电热分析研究的现状与展望[J].计算机辅助设计与图形学学报,2009,21(9):1203-1211. 被引量:9
  • 4Zhong Y,Wong M D F. Fast algorithms for IR drop analysis in large power grid [C] //Proc of IEEE/ACM ICCAD 2005. Piscataway,NJ:IEEE,2005:351-357.
  • 5LUO Zuying,CAI Yici,Sheldon X.-D Tan,HONG Xianlong,WANG Xiaoyi,PAN Zhu,FU Jingjing.Time-domain analysis methodology for large-scale RLC circuits and its applications[J].Science in China(Series F),2006,49(5):665-680. 被引量:13
  • 6Luo Zuying,Tan S X D,Fan J. Single-node statistical 3D thermal analysis considering electro-thermal coupling [C] // Proc of 2009 IEEE Int Syrup on Circuit and System. Piscataway,NJ:IEEE,2009:1289-1292.
  • 7Shi Jin,Cai Yici,Hou Wenting,et al. GPU friendly fast poisson solver for structured power grid network analysis [C] //Proc of the 46th ACM/IEEE Design Automation Conf. New York:ACM,2009:178-183.
  • 8Feng Z,Li P. Parallel multigrid preconditioning on graphics processing units (GPUs) for robust power grid analysis [C] //Proc of the 47th ACM/IEEE Design Automation Conf. New York:ACM,2010:661-666.
  • 9Feng Z,I.i P. Fast thermal analysis on GPU for 3D-ICs with integrated micro channel cooling [C] //Proc of IEEE/ACM ICCAI) 2010. Piscataway,NJ:IEEE,2010:551-555.
  • 10Luo L J,Wong M,Hwu W M. An effective GPUimplementation of breadth-first search [C] //Proc of the 47thACM/IEEE Design Automation Conf. New York:ACM,2010:52-55.

二级参考文献8

共引文献25

同被引文献43

  • 1周海芳,赵进.基于GPU的遥感图像配准并行程序设计与存储优化[J].计算机研究与发展,2012,49(S1):281-286. 被引量:18
  • 2詹海生,王启户.一种自适应字长的中文词库的构建方法[J].计算机研究与发展,2011,48(S1):382-386. 被引量:1
  • 3刘杰,刘兴平,迟利华,胡庆丰.一种改进的适合并行计算的共轭剩余算法[J].计算机学报,2006,29(3):495-499. 被引量:5
  • 4Hung Che-Lun, Lin Yaw-Ling, Li Kuan-Ching, et al. Ef-ficient GPGPU-based parallel packet classification [ C ]// 2011 IEEE 10th International Conference on Trust, Securi- ty and Privacy in Computing and Communications. 2011 : 1367-1374.
  • 5Alastair Nottingham, Barry Irwin. GPU packet classifica- tion using OpenCL: A consideration of viable classification methods[ C ]// Proceedings of the 2009 Annual Research Conference of the South African Institute of Computer Sci- entists and Information Technologists. 2009:160-169.
  • 6Alastair Nottingham, Barry Irwin. Parallel packet classifi- cation using GPU co-processors [ C ].// Proceedings of the 2010 Annual Research Conference of the South African In- stitute of Computer Scientists and Information Technolo- gists. 2010:231-241.
  • 7Sangjin Han, Keon Jang, KyongSoo Park, et al. Packet- Shader : A GPU-accelerated software router[ C ]//Proceed- ing of the ACM SIGCOMM 2010 Conference. 2010: 195- 206.
  • 8Kang Kang, Yangdong Steve Deng. Scalable packet classi- fication via GPU metaprogramming[ C ]//Design, Automa- tion & Test in Europe Conference & Exhibition. 2011:1-4.
  • 9Shane Ryoo, Christopher I Rodrigues, Sam S Stone, et al. Program optimization space pruning for a multithreaded GPU[ C]//Proceedings of the 6th Annual IEEE/ACM In- ternational Symposium on Code Generation and Optimiza- tion. 2008 : 195-204.
  • 10刘胤,杨世平.基于RFC算法的快速多维数据包分类算法[J].计算机工程,2008,34(6):95-97. 被引量:7

引证文献3

二级引证文献6

相关作者

内容加载中请稍等...

相关机构

内容加载中请稍等...

相关主题

内容加载中请稍等...

浏览历史

内容加载中请稍等...
;
使用帮助 返回顶部