Funding: This research was supported by the Natural Science Foundation of China under grant number 61872422 and the Natural Science Foundation of Zhejiang Province, China, under grant number LY19F020028.
Abstract: Concurrent L1-minimization (L1-min) problems often arise in real applications, so this paper investigates how to solve them in parallel on GPUs. First, we propose two new GPU kernels: a self-adaptive warp implementation of the matrix-vector multiplication Ax and a self-adaptive thread implementation of the transposed matrix-vector multiplication A^T x. Vector-operation and inner-product decision trees select the optimal vector-operation and inner-product kernels for vectors of any size. Second, building on these kernels, we use the iterative shrinkage-thresholding algorithm (ISTA) to construct two concurrent L1-min solvers on a single GPU, one organized around streams and one around thread blocks. We optimize their performance with newer GPU features such as the shuffle instruction and the read-only data cache. Finally, we design a concurrent L1-min solver for multiple GPUs. Experimental results validate the effectiveness and performance of the proposed methods.
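The solvers above are built on the iterative shrinkage-thresholding algorithm (ISTA). The pure-Python code below is a minimal sketch of that iteration only. It shows where the two products Ax and A^T r enter each step. It assumes the unconstrained form min_x lam*||x||_1 + 0.5*||Ax - b||_2^2, because the abstract does not state which L1-min formulation is used. It does not model the paper's self-adaptive warp/thread kernels, decision trees, or stream/thread-block scheduling. All function names are hypothetical.

```python
# Minimal ISTA sketch (pure Python, illustration only).
# Assumed formulation: min_x  lam*||x||_1 + 0.5*||A x - b||_2^2
# Function names are hypothetical; a GPU solver would replace matvec and
# matvec_t with tuned kernels for A x and A^T r.


def matvec(A, x):
    # y = A x: one inner product per row of A
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]


def matvec_t(A, r):
    # y = A^T r: accumulate r_i * (row i of A) into y
    y = [0.0] * len(A[0])
    for row, ri in zip(A, r):
        for j, a in enumerate(row):
            y[j] += a * ri
    return y


def soft_threshold(v, t):
    # Elementwise proximal operator of t*||.||_1
    return [(1.0 if vi >= 0 else -1.0) * max(abs(vi) - t, 0.0) for vi in v]


def ista(A, b, lam, step, iters=500):
    # Converges for 0 < step <= 1 / ||A||_2^2
    x = [0.0] * len(A[0])
    for _ in range(iters):
        r = [ax - bi for ax, bi in zip(matvec(A, x), b)]  # residual A x - b
        g = matvec_t(A, r)                                 # gradient A^T (A x - b)
        x = soft_threshold([xj - step * gj for xj, gj in zip(x, g)], step * lam)
    return x


def ista_batch(problems, lam, step, iters=500):
    # Concurrent L1-min: the problems are independent. On a GPU each could be
    # mapped to its own stream or thread block; here they simply run in sequence.
    return [ista(A, b, lam, step, iters) for A, b in problems]
```

For example, take A = diag(2, 1), b = (4, 1), lam = 0.5 and step = 0.25 (that is, 1/||A||_2^2). The iterates converge to (1.875, 0.5), which is the closed-form soft-thresholded solution for a diagonal A.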
Funding: The author is grateful to Professor Tao Tang and Dr. Zhonghua Qiao for many helpful and fruitful discussions, and would like to thank Professor Weiwei Sun for constructive suggestions.
Abstract: We study scattering from an open cavity filled with inhomogeneous media. The problem is discretized with a fourth-order finite difference scheme and the immersed interface method. This yields a linear system whose solution is high-order accurate over the whole computational domain. To solve the system, we design an efficient iterative solver based on the fast Fourier transform (FFT), which provides an ideal preconditioner for Krylov subspace methods. Numerical experiments demonstrate the capability of the proposed fast, high-order iterative solver.
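The abstract describes a fast Fourier-type solver used as a preconditioner for a Krylov subspace method. The sketch below illustrates only that general idea, in a much simpler substitute setting. It solves a 1D, real, symmetric positive definite model problem -u'' + c(x)u = f with second-order differences, using preconditioned conjugate gradient. This is not the fourth-order scheme, the immersed interface correction, or the cavity Helmholtz operator of the paper. The preconditioner replaces c(x) by its mean, so the discrete sine transform (DST-I) diagonalizes it. For readability, the transform is evaluated directly in O(n^2); a fast implementation would compute it with an FFT. All function names are hypothetical.

```python
import math

# Simplified model (an assumption, not the paper's problem):
#   -u'' + c(x) u = f on (0, 1), u(0) = u(1) = 0, c(x) >= 0,
# second-order finite differences, preconditioned conjugate gradient (PCG).
# Preconditioner: M = L + cbar*I (L = discrete -d^2/dx^2, cbar = mean of c),
# which the DST-I diagonalizes.


def apply_A(u, c, h):
    # (A u)_j = (-u_{j-1} + 2 u_j - u_{j+1}) / h^2 + c_j u_j, zero boundary values
    n = len(u)
    out = []
    for j in range(n):
        left = u[j - 1] if j > 0 else 0.0
        right = u[j + 1] if j < n - 1 else 0.0
        out.append((-left + 2.0 * u[j] - right) / h**2 + c[j] * u[j])
    return out


def dst1(v):
    # Naive O(n^2) DST-I; an FFT-based DST would bring this to O(n log n)
    n = len(v)
    return [sum(v[j] * math.sin(math.pi * (j + 1) * (k + 1) / (n + 1))
                for j in range(n))
            for k in range(n)]


def make_preconditioner(n, h, cbar):
    # Eigenvalues of M in the sine basis; applying DST-I twice gives (n+1)/2 * I
    lam = [4.0 / h**2 * math.sin(math.pi * (k + 1) / (2 * (n + 1)))**2 + cbar
           for k in range(n)]

    def solve(r):
        # z = M^{-1} r = (2/(n+1)) * S diag(1/lam) S r, with S the DST-I matrix
        rhat = dst1(r)
        z = dst1([a / l for a, l in zip(rhat, lam)])
        return [2.0 / (n + 1) * zj for zj in z]

    return solve


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def pcg(c, f, h, tol=1e-10, maxit=200):
    # Returns (u, iterations); stops when ||f - A u|| <= tol * ||f||
    n = len(f)
    m_inv = make_preconditioner(n, h, sum(c) / n)
    u = [0.0] * n
    r = list(f)                  # residual for the zero initial guess
    z = m_inv(r)
    p = list(z)
    rz = dot(r, z)
    fnorm = math.sqrt(dot(f, f))
    for it in range(1, maxit + 1):
        ap = apply_A(p, c, h)
        alpha = rz / dot(p, ap)
        u = [uj + alpha * pj for uj, pj in zip(u, p)]
        r = [rj - alpha * apj for rj, apj in zip(r, ap)]
        if math.sqrt(dot(r, r)) <= tol * fnorm:
            return u, it
        z = m_inv(r)
        rz_new = dot(r, z)
        beta = rz_new / rz
        p = [zj + beta * pj for zj, pj in zip(z, p)]
        rz = rz_new
    return u, maxit
```

M differs from A only in the zeroth-order term, so the preconditioned operator is close to the identity and PCG needs only a handful of iterations. When c is constant, M equals A and one iteration suffices (up to rounding). For indefinite, complex Helmholtz systems like those in the paper, such a preconditioner would typically be paired with a nonsymmetric Krylov method such as GMRES rather than CG.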