2 articles found
Convolutional neural network adaptation and optimization method in SIMT computing mode
1
Authors: Feng Zhenfu, Zhang Yaying, Yang Lele, Xing Lidong. The Journal of China Universities of Posts and Telecommunications (EI, CSCD), 2024, Issue 2, pp. 105-112 (8 pages)
For studying and optimizing the performance of general-purpose computing on graphics processing units (GPGPU) based on single instruction multiple threads (SIMT) processor about the neural network application, this work contributes a self-developed SIMT processor named Pomelo and correlated assembly program. The parallel mechanism of SIMT computing mode and self-developed Pomelo processor is briefly introduced. A common convolutional neural network (CNN) is built to verify the compatibility and functionality of the Pomelo processor. CNN computing flow with task level and hardware level optimization is adopted on the Pomelo processor. A specific algorithm for organizing a Z-shaped memory structure is developed, which addresses reducing memory access in mass data computing tasks. Performing the above-combined adaptation and optimization strategy, the experimental result demonstrates that reducing memory access in SIMT computing mode plays a crucial role in improving performance. A 6.52 times performance is achieved on the 4 processing elements case.
Keywords: parallel computing; single instruction multiple threads (SIMT); convolutional neural network (CNN); memory optimization
10-Million Atoms Simulation of First-Principle Package LS3DF
2
Authors: 严昱瑾, 李海波, 赵曈, 汪林望, 石林, 刘涛, 谭光明, 贾伟乐, 孙凝晖. Journal of Computer Science & Technology (SCIE, EI, CSCD), 2024, Issue 1, pp. 45-62 (18 pages)
The growing demand for semiconductor devices simulation poses a big challenge for large-scale electronic structure calculations. Among various methods, the linearly scaling three-dimensional fragment (LS3DF) method exhibits excellent scalability in large-scale simulations. Based on algorithmic and system-level optimizations, we propose a highly scalable and highly efficient implementation of LS3DF on a domestic heterogeneous supercomputer equipped with accelerators. In terms of algorithmic optimizations, the original all-band conjugate gradient algorithm is refined to achieve faster convergence, and mixed precision computing is adopted to increase overall efficiency. In terms of system-level optimizations, the original two-layer parallel structure is replaced by a coarse-grained parallel method. Optimization strategies such as multi-stream, kernel fusion, and redundant computation removal are proposed to increase further utilization of the computational power provided by the heterogeneous machines. As a result, our optimized LS3DF can scale to a 10-million silicon atoms system, attaining a peak performance of 34.8 PFLOPS (21.2% of the peak). All the improvements can be adapted to the next-generation supercomputers for larger simulations.
Keywords: single instruction multiple thread accelerator; electronic structure; high-performance computing; linearly scaling three-dimensional fragment (LS3DF)