为了进一步加快JPEG2000的压缩速度,对JPEG2000压缩标准进行研究,分析得出JPEG2000核心算法离散小波变换(DWT)部分数据之间的独立性适合并行化处理。NVIDIA最新推出的CUDA(计算统一设备架构)是非常适合大规模数据并行计算的软硬件开发...为了进一步加快JPEG2000的压缩速度,对JPEG2000压缩标准进行研究,分析得出JPEG2000核心算法离散小波变换(DWT)部分数据之间的独立性适合并行化处理。NVIDIA最新推出的CUDA(计算统一设备架构)是非常适合大规模数据并行计算的软硬件开发平台。在通用计算图形处理器(General Purpose Graphic Process Unit,GPGPU)上使用CUDA技术实现DWT并行化加速,并针对GPGPU存储空间的特点进行优化。得出的实验结果表明,经过CUDA并行优化的方法能够有效地提高离散小波变换DWT的计算速度。展开更多
矩阵主特征向量(principal eigenvectors computing,PEC)的求解是科学与工程计算中的一个重要问题。随着图形处理单元通用计算(general-purpose computing on graphics pro cessing unit,GPGPU)的兴起,利用GPU来优化大规模稀疏矩阵的图...矩阵主特征向量(principal eigenvectors computing,PEC)的求解是科学与工程计算中的一个重要问题。随着图形处理单元通用计算(general-purpose computing on graphics pro cessing unit,GPGPU)的兴起,利用GPU来优化大规模稀疏矩阵的图形处理单元求解得到了广泛关注。分别从应用特征和GPU体系结构特征两方面分析了PEC运算的性能瓶颈,提出了一种面向GPU的稀疏矩阵存储格式——GPU-ELL和一个针对GPU的线程优化映射策略,并设计了相应的PEC优化执行算法。在ATI HD Radeon5850上的实验结果表明,相对于传统CPU,该方案获得了最多200倍左右的加速,相对于已有GPU上的实现,也获得了2倍的加速。展开更多
Numerical treatment of engineering application problems often eventually results in a solution of systems of linear or nonlinear equations.The solution process using digital computational devices usually takes tremend...Numerical treatment of engineering application problems often eventually results in a solution of systems of linear or nonlinear equations.The solution process using digital computational devices usually takes tremendous time due to the extremely large size encountered in most real-world engineering applications.So,practical solvers for systems of linear and nonlinear equations based on multi graphic process units(GPUs)are proposed in order to accelerate the solving process.In the linear and nonlinear solvers,the preconditioned bi-conjugate gradient stable(PBi-CGstab)method and the Inexact Newton method are used to achieve the fast and stable convergence behavior.Multi-GPUs are utilized to obtain more data storage that large size problems need.展开更多
文摘为了进一步加快JPEG2000的压缩速度,对JPEG2000压缩标准进行研究,分析得出JPEG2000核心算法离散小波变换(DWT)部分数据之间的独立性适合并行化处理。NVIDIA最新推出的CUDA(计算统一设备架构)是非常适合大规模数据并行计算的软硬件开发平台。在通用计算图形处理器(General Purpose Graphic Process Unit,GPGPU)上使用CUDA技术实现DWT并行化加速,并针对GPGPU存储空间的特点进行优化。得出的实验结果表明,经过CUDA并行优化的方法能够有效地提高离散小波变换DWT的计算速度。
文摘矩阵主特征向量(principal eigenvectors computing,PEC)的求解是科学与工程计算中的一个重要问题。随着图形处理单元通用计算(general-purpose computing on graphics pro cessing unit,GPGPU)的兴起,利用GPU来优化大规模稀疏矩阵的图形处理单元求解得到了广泛关注。分别从应用特征和GPU体系结构特征两方面分析了PEC运算的性能瓶颈,提出了一种面向GPU的稀疏矩阵存储格式——GPU-ELL和一个针对GPU的线程优化映射策略,并设计了相应的PEC优化执行算法。在ATI HD Radeon5850上的实验结果表明,相对于传统CPU,该方案获得了最多200倍左右的加速,相对于已有GPU上的实现,也获得了2倍的加速。
文摘Numerical treatment of engineering application problems often eventually results in a solution of systems of linear or nonlinear equations.The solution process using digital computational devices usually takes tremendous time due to the extremely large size encountered in most real-world engineering applications.So,practical solvers for systems of linear and nonlinear equations based on multi graphic process units(GPUs)are proposed in order to accelerate the solving process.In the linear and nonlinear solvers,the preconditioned bi-conjugate gradient stable(PBi-CGstab)method and the Inexact Newton method are used to achieve the fast and stable convergence behavior.Multi-GPUs are utilized to obtain more data storage that large size problems need.