4 articles found
Simultaneous Accelerator Parallelization and Point-to-Point Interconnect Insertion for Bus-Based Embedded SoCs
1
Authors: Daming Zhang, Yongpan Liu, Shuangchen Li, Tongda Wu, Huazhong Yang. Tsinghua Science and Technology (SCIE, EI, CAS, CSCD), 2015, No. 6, pp. 644-660 (17 pages)
As performance requirements for bus-based embedded Systems-on-Chip (SoCs) increase, more and more on-chip application-specific hardware accelerators (e.g., filters, FFTs, JPEG encoders, GSMs, and AES encoders) are being integrated into their designs. These accelerators require system-level tradeoffs among performance, area, and scalability. Accelerator parallelization and Point-to-Point (P2P) interconnect insertion are two effective system-level adjustments. The former boosts computing performance at the cost of area, while the latter provides higher bandwidth at the cost of routability. Moreover, the two adjustments interact with each other. This paper proposes a design flow to optimize accelerator parallelization and P2P interconnect insertion simultaneously. To explore the huge optimization space, we develop an effective algorithm whose goal is to reduce total SoC latency under constraints on SoC area and total P2P wire length. Experimental results show that the performance difference between our proposed algorithm and the optimal results is only 2.33% on average, while the running time of the algorithm is less than 17 s.
Keywords: accelerator parallelization; point-to-point interco
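The optimization problem described in the abstract above (minimize total SoC latency subject to an area budget and a P2P wire-length budget) can be illustrated with a toy exhaustive search. This is only a sketch under stated assumptions, not the paper's algorithm: the cost model, the `ACCELS` data, and the names `evaluate` and `best_config` are all hypothetical, and the assumption that compute time shrinks linearly with the number of parallel copies is a simplification.

```python
from itertools import product

# Hypothetical accelerator data (all numbers invented for illustration).
ACCELS = {
    "fft": {"compute": 8.0, "bus": 4.0, "p2p": 1.0, "area": 2.0, "wire": 3.0},
    "aes": {"compute": 6.0, "bus": 2.0, "p2p": 0.5, "area": 1.0, "wire": 2.0},
}


def evaluate(accels, copies, links):
    """Return (latency, area, wire length) for one candidate configuration."""
    latency = area = wire = 0.0
    for name, acc in accels.items():
        # Assumed model: compute time scales down linearly with parallel copies.
        latency += acc["compute"] / copies[name]
        # A P2P link replaces the slower shared-bus transfer for this accelerator.
        latency += acc["p2p"] if links[name] else acc["bus"]
        area += acc["area"] * copies[name]
        wire += acc["wire"] if links[name] else 0.0
    return latency, area, wire


def best_config(accels, max_area, max_wire, max_copies=4):
    """Exhaustively search copy counts and link choices under both budgets."""
    names = list(accels)
    best = None
    for counts in product(range(1, max_copies + 1), repeat=len(names)):
        for flags in product((False, True), repeat=len(names)):
            copies = dict(zip(names, counts))
            links = dict(zip(names, flags))
            latency, area, wire = evaluate(accels, copies, links)
            if area <= max_area and wire <= max_wire:
                if best is None or latency < best[0]:
                    best = (latency, copies, links)
    return best
```

Calling `best_config(ACCELS, max_area=5.0, max_wire=3.0)` returns the lowest-latency feasible configuration for this toy data. Exhaustive search grows exponentially with the number of accelerators, which is the kind of blow-up that motivates a heuristic in practice.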
PPAA: a parallel primitive assembly accelerator in graphics processor
2
Authors: Deng Junyong, Xie Xiaoyan, Liu Yang, Tian Pu. The Journal of China Universities of Posts and Telecommunications (EI, CSCD), 2020, No. 2, pp. 65-71 (7 pages)
Primitive assembly is an indispensable step of graphics rendering that prepares objects for the subsequent stages. However, conventional approaches suffer from several issues, such as missing surface attributes, color-mode mismatch for clipped primitives, and a performance bottleneck in the rendering pipeline. This paper takes all these issues into consideration and proposes a parallel primitive assembly accelerator (PPAA) that not only solves the functional problems but also improves shading performance. The register transfer level (RTL) circuit is designed and the detailed approach is presented. Prototype systems are implemented on a Xilinx field programmable gate array (FPGA) XC6VLX550T and an Altera FPGA EP2C70F896C6. The experimental results show that PPAA accomplishes the assembly tasks correctly, with 1.5x and 2.5x the performance of two previous implementations. For the most frequent independent primitives, PPAA efficiently enhances throughput by squeezing out pipeline bubbles and balancing the pipeline stages.
Keywords: primitive assembly; parallel accelerator; primitive characteristics; graphics processor
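For readers unfamiliar with the term, primitive assembly groups a stream of vertices into primitives such as triangles. The sketch below shows the generic software version of that grouping for triangle lists, strips, and fans. It is not the PPAA hardware design from the abstract above; the function name `assemble_triangles` and the mode strings are assumptions chosen for illustration.

```python
def assemble_triangles(indices, mode):
    """Group a vertex index stream into triangles.

    mode is one of "list", "strip", or "fan" (hypothetical names).
    Incomplete trailing vertices are ignored.
    """
    triangles = []
    if mode == "list":
        # Independent triangles: every three indices form one primitive.
        for i in range(0, len(indices) - 2, 3):
            triangles.append((indices[i], indices[i + 1], indices[i + 2]))
    elif mode == "strip":
        # Each new vertex forms a triangle with the previous two.
        # Odd triangles swap their first two vertices to keep the winding order.
        for i in range(len(indices) - 2):
            a, b, c = indices[i], indices[i + 1], indices[i + 2]
            triangles.append((a, b, c) if i % 2 == 0 else (b, a, c))
    elif mode == "fan":
        # All triangles share the first vertex.
        for i in range(1, len(indices) - 1):
            triangles.append((indices[0], indices[i], indices[i + 1]))
    else:
        raise ValueError(f"unknown mode: {mode}")
    return triangles
```

Independent triangles (the "list" mode) have no data dependence between primitives, which is why they are a natural target for parallel assembly.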
GPU-accelerated vector-form particle-element method for 3D elastoplastic contact of structures
3
Authors: Wei WANG, Yanfeng ZHENG, Jingzhe TANG, Chao YANG, Yaozhi LUO. Journal of Zhejiang University-Science A (Applied Physics & Engineering) (SCIE, EI, CAS, CSCD), 2023, No. 12, pp. 1120-1130 (11 pages)
A graphics processing unit (GPU)-accelerated vector-form particle-element method, i.e., the finite particle method (FPM), is proposed for 3D elastoplastic contact of structures involving strong nonlinearities and computationally expensive contact calculations. A hexahedral FPM element with reduced integration and anti-hourglass control is developed to model structural elastoplastic behaviors. The 3D space containing contact surfaces is decomposed into cubic cells, and the contact search is performed between adjacent cells to improve search efficiency. A connected list data structure is used for storing contact particles to facilitate the parallel contact search procedure. The contact constraints are enforced by explicitly applying normal and tangential contact forces to the contact particles. The proposed method is fully accelerated by GPU-based parallel computing. After verification, the performance of the proposed method is compared with the serial finite element code Abaqus/Explicit by testing two large-scale contact examples. The maximum speedup of the proposed method over Abaqus/Explicit is approximately 80 for the overall computation and 340 for contact calculations. Therefore, the proposed method is shown to be effective and efficient.
Keywords: Graphics processing unit (GPU); Parallel acceleration; Elastoplastic contact; Contact search; Finite particle method (FPM)
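The cell decomposition and linked-list storage mentioned in the abstract above resemble the classic cell-list neighbor search. The following serial sketch shows that general idea only. It is not the authors' GPU implementation: the names `build_cell_list` and `contact_pairs` are hypothetical, and it assumes the search radius does not exceed the cell size, so that any two particles within range lie in the same or adjacent cells.

```python
from itertools import product


def build_cell_list(points, cell):
    """Bin particles into cubic cells using a head/next linked list."""
    head = {}                   # cell index -> first particle stored in that cell
    nxt = [-1] * len(points)    # next particle in the same cell, -1 ends the list
    for i, p in enumerate(points):
        key = tuple(int(c // cell) for c in p)
        nxt[i] = head.get(key, -1)
        head[key] = i
    return head, nxt


def contact_pairs(points, cell, radius):
    """Return index pairs (i, j), i < j, closer than radius (needs radius <= cell)."""
    head, nxt = build_cell_list(points, cell)
    r2 = radius * radius
    pairs = set()
    for key in head:
        # Visit the cell itself and its 26 neighbors.
        for offset in product((-1, 0, 1), repeat=3):
            neighbor = tuple(k + o for k, o in zip(key, offset))
            i = head[key]
            while i != -1:
                j = head.get(neighbor, -1)
                while j != -1:
                    if i < j:
                        d2 = sum((a - b) ** 2 for a, b in zip(points[i], points[j]))
                        if d2 <= r2:
                            pairs.add((i, j))
                    j = nxt[j]
                i = nxt[i]
    return pairs
```

The flat `nxt` array is what makes this layout attractive for parallel hardware: each cell can be traversed independently, and no per-cell dynamic allocation is needed.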
PARALLEL QUASI-CHEBYSHEV ACCELERATION TO NONOVERLAPPING MULTISPLITTING ITERATIVE METHODS BASED ON OPTIMIZATION (Cited by: 2)
4
Authors: Ruiping Wen, Guoyan Meng, Chuanlong Wang. Journal of Computational Mathematics (SCIE, CSCD), 2014, No. 3, pp. 284-296 (13 pages)
In this paper, we present a parallel quasi-Chebyshev acceleration applied to the nonoverlapping multisplitting iterative method for linear systems whose coefficient matrix is either an H-matrix or a symmetric positive definite matrix. First, m parallel iterations are implemented on m different processors. Second, based on the l1-norm or l2-norm, the m optimization models are treated in parallel on m different processors. The convergence theories are established for the parallel quasi-Chebyshev accelerated method. Finally, the numerical examples show that the parallel quasi-Chebyshev technique can significantly accelerate the nonoverlapping multisplitting iterative method.
Keywords: Parallel quasi-Chebyshev acceleration; Nonoverlapping multisplitting iterative method; Convergence; Optimization
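As background for the abstract above, one simple instance of a nonoverlapping multisplitting iteration is block Jacobi: the unknowns are partitioned into disjoint blocks, each block is solved with the other blocks' values frozen, and the block results are combined. The sketch below shows only that base iteration. It does not include the quasi-Chebyshev acceleration or the optimization models, whose details the abstract does not give, and the names `solve`, `multisplitting_step`, and `multisplitting_solve` are hypothetical.

```python
def solve(a, b):
    """Solve a small dense system by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def multisplitting_step(a, b, x, blocks):
    """One sweep: every block is solved independently from the old iterate x."""
    new_x = x[:]
    for block in blocks:  # each block could run on its own processor
        inside = set(block)
        sub_a = [[a[i][j] for j in block] for i in block]
        sub_b = [
            b[i] - sum(a[i][j] * x[j] for j in range(len(x)) if j not in inside)
            for i in block
        ]
        for i, value in zip(block, solve(sub_a, sub_b)):
            new_x[i] = value
    return new_x


def multisplitting_solve(a, b, blocks, iters=50):
    """Run a fixed number of sweeps starting from the zero vector."""
    x = [0.0] * len(b)
    for _ in range(iters):
        x = multisplitting_step(a, b, x, blocks)
    return x
```

Because every block reads only the previous iterate, the loop over `blocks` has no dependence between its iterations, which is what allows the m block solves to run on m processors.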