Conventional gradient-based full waveform inversion (FWI) is a local optimization, which is highly dependent on the initial model and prone to trapping in local minima. Globally optimal FWI that can overcome this limi...Conventional gradient-based full waveform inversion (FWI) is a local optimization, which is highly dependent on the initial model and prone to trapping in local minima. Globally optimal FWI that can overcome this limitation is particularly attractive, but is currently limited by the huge amount of calculation. In this paper, we propose a globally optimal FWI framework based on GPU parallel computing, which greatly improves the efficiency, and is expected to make globally optimal FWI more widely used. In this framework, we simplify and recombine the model parameters, and optimize the model iteratively. Each iteration contains hundreds of individuals, each individual is independent of the other, and each individual contains forward modeling and cost function calculation. The framework is suitable for a variety of globally optimal algorithms, and we test the framework with particle swarm optimization algorithm for example. Both the synthetic and field examples achieve good results, indicating the effectiveness of the framework. .展开更多
Processing large-scale 3-D gravity data is an important topic in geophysics field. Many existing inversion methods lack the competence of processing massive data and practical application capacity. This study proposes...Processing large-scale 3-D gravity data is an important topic in geophysics field. Many existing inversion methods lack the competence of processing massive data and practical application capacity. This study proposes the application of GPU parallel processing technology to the focusing inversion method, aiming at improving the inversion accuracy while speeding up calculation and reducing the memory consumption, thus obtaining the fast and reliable inversion results for large complex model. In this paper, equivalent storage of geometric trellis is used to calculate the sensitivity matrix, and the inversion is based on GPU parallel computing technology. The parallel computing program that is optimized by reducing data transfer, access restrictions and instruction restrictions as well as latency hiding greatly reduces the memory usage, speeds up the calculation, and makes the fast inversion of large models possible. By comparing and analyzing the computing speed of traditional single thread CPU method and CUDA-based GPU parallel technology, the excellent acceleration performance of GPU parallel computing is verified, which provides ideas for practical application of some theoretical inversion methods restricted by computing speed and computer memory. The model test verifies that the focusing inversion method can overcome the problem of severe skin effect and ambiguity of geological body boundary. Moreover, the increase of the model cells and inversion data can more clearly depict the boundary position of the abnormal body and delineate its specific shape.展开更多
This paper aims to solve large-scale and complex isogeometric topology optimization problems that consumesignificant computational resources. A novel isogeometric topology optimization method with a hybrid parallelstr...This paper aims to solve large-scale and complex isogeometric topology optimization problems that consumesignificant computational resources. A novel isogeometric topology optimization method with a hybrid parallelstrategy of CPU/GPU is proposed, while the hybrid parallel strategies for stiffness matrix assembly, equationsolving, sensitivity analysis, and design variable update are discussed in detail. To ensure the high efficiency ofCPU/GPU computing, a workload balancing strategy is presented for optimally distributing the workload betweenCPU and GPU. To illustrate the advantages of the proposedmethod, three benchmark examples are tested to verifythe hybrid parallel strategy in this paper. The results show that the efficiency of the hybrid method is faster thanserial CPU and parallel GPU, while the speedups can be up to two orders of magnitude.展开更多
针对具有物理机制的分布式水文模型对大流域、长序列模拟计算时间长、模拟速度慢的问题,引入基于GPU的并行计算技术,实现分布式水文模型WEP-L(water and energy transfer processes in large river basins)产流过程的并行化。选择鄱阳...针对具有物理机制的分布式水文模型对大流域、长序列模拟计算时间长、模拟速度慢的问题,引入基于GPU的并行计算技术,实现分布式水文模型WEP-L(water and energy transfer processes in large river basins)产流过程的并行化。选择鄱阳湖流域为实验区,采用计算能力为8.6的NVIDIA RTX A4000对算法性能进行测试。研究表明:提出的基于GPU的分布式水文模型并行算法具有良好的加速效果,当线程总数越接近划分的子流域个数(计算任务量)时,并行性能越好,在实验流域WEP-L模型子流域单元为8712个时,加速比最大达到2.5左右;随着计算任务量的增加,加速比逐渐增大,当实验流域WEP-L模型子流域单元增加到24897个时,加速比能达到3.5,表明GPU并行算法在大尺度流域分布式水文模型计算中具有良好的发展潜力。展开更多
【应用背景】快速射电暴(Fast Radio Burst,FRB)搜寻是500米口径球面射电望远镜(FAST)的重要科学目标之一,其计算复杂度高,数据量大,当前算法GPU利用率偏低,数据处理需较多的人工介入操作。【目的】在不修改算法实现的前提下,实现进程级...【应用背景】快速射电暴(Fast Radio Burst,FRB)搜寻是500米口径球面射电望远镜(FAST)的重要科学目标之一,其计算复杂度高,数据量大,当前算法GPU利用率偏低,数据处理需较多的人工介入操作。【目的】在不修改算法实现的前提下,实现进程级GPU并行优化,提高GPU整体资源利用率,简化算法运行调度,支持利用自动化脚本驱动计算过程。【方法】利用容器化封装FRB搜寻算法,结合GPU聚合技术实现多个FRB搜寻计算容器的多进程并行,支持GPU闲时复用。通过容器化封装屏蔽了GPU调用、依赖库管理等技术细节,减少人工介入操作。【结果】算法实验结果表明,在不修改原始算法、不增加GPU资源的前提下,将单GPU绑定6个计算进程,并行优化可实现FRB搜寻算法的加速比达到5.3,并行效率达到0.88,取得良好的并行效果。【结论】基于容器化封装及进程级GPU聚合的并行优化,可实现GPU利用率及计算效率的提升,有效支持自动化处理。该方法还具有良好的通用性,可适用于类似应用的并行优化。展开更多
In recent years, the widespread adoption of parallel computing, especially in multi-core processors and high-performance computing environments, ushered in a new era of efficiency and speed. This trend was particularl...In recent years, the widespread adoption of parallel computing, especially in multi-core processors and high-performance computing environments, ushered in a new era of efficiency and speed. This trend was particularly noteworthy in the field of image processing, which witnessed significant advancements. This parallel computing project explored the field of parallel image processing, with a focus on the grayscale conversion of colorful images. Our approach involved integrating OpenMP into our framework for parallelization to execute a critical image processing task: grayscale conversion. By using OpenMP, we strategically enhanced the overall performance of the conversion process by distributing the workload across multiple threads. The primary objectives of our project revolved around optimizing computation time and improving overall efficiency, particularly in the task of grayscale conversion of colorful images. Utilizing OpenMP for concurrent processing across multiple cores significantly reduced execution times through the effective distribution of tasks among these cores. The speedup values for various image sizes highlighted the efficacy of parallel processing, especially for large images. However, a detailed examination revealed a potential decline in parallelization efficiency with an increasing number of cores. This underscored the importance of a carefully optimized parallelization strategy, considering factors like load balancing and minimizing communication overhead. Despite challenges, the overall scalability and efficiency achieved with parallel image processing underscored OpenMP’s effectiveness in accelerating image manipulation tasks.展开更多
针对深度学习图像分类场景中多GPU并行后传输效率低的问题,提出一种低时间复杂度的Ring All Reduce改进算法。通过分节点间隔配对原则优化数据传输流程,缓解传统参数服务器并行结构的带宽损耗。基于数据并行难以支撑大规模网络参数及加...针对深度学习图像分类场景中多GPU并行后传输效率低的问题,提出一种低时间复杂度的Ring All Reduce改进算法。通过分节点间隔配对原则优化数据传输流程,缓解传统参数服务器并行结构的带宽损耗。基于数据并行难以支撑大规模网络参数及加速延缓的问题,根据深度学习主干网络所包含的权重参数低于全连接层权重参数、同步开销小、全连接层权重大与梯度传输开销过高等特点,提出GPU混合并行优化算法,将主干网络进行数据并行,全连接层进行模型并行,并通过改进的Ring All Reduce算法实现各节点之间的并行后数据通信,用于基于深度学习模型的图像分类。在Cifar10和mini ImageNet两个公共数据集上的实验结果表明,该算法在保持分类精度不变的情况下可以获得更好的加速效果,相比数据并行方法,可达到近45%的提升效果。展开更多
文摘Conventional gradient-based full waveform inversion (FWI) is a local optimization, which is highly dependent on the initial model and prone to trapping in local minima. Globally optimal FWI that can overcome this limitation is particularly attractive, but is currently limited by the huge amount of calculation. In this paper, we propose a globally optimal FWI framework based on GPU parallel computing, which greatly improves the efficiency, and is expected to make globally optimal FWI more widely used. In this framework, we simplify and recombine the model parameters, and optimize the model iteratively. Each iteration contains hundreds of individuals, each individual is independent of the other, and each individual contains forward modeling and cost function calculation. The framework is suitable for a variety of globally optimal algorithms, and we test the framework with particle swarm optimization algorithm for example. Both the synthetic and field examples achieve good results, indicating the effectiveness of the framework. .
基金Supported by Project of National Natural Science Foundation(No.41874134)
文摘Processing large-scale 3-D gravity data is an important topic in geophysics field. Many existing inversion methods lack the competence of processing massive data and practical application capacity. This study proposes the application of GPU parallel processing technology to the focusing inversion method, aiming at improving the inversion accuracy while speeding up calculation and reducing the memory consumption, thus obtaining the fast and reliable inversion results for large complex model. In this paper, equivalent storage of geometric trellis is used to calculate the sensitivity matrix, and the inversion is based on GPU parallel computing technology. The parallel computing program that is optimized by reducing data transfer, access restrictions and instruction restrictions as well as latency hiding greatly reduces the memory usage, speeds up the calculation, and makes the fast inversion of large models possible. By comparing and analyzing the computing speed of traditional single thread CPU method and CUDA-based GPU parallel technology, the excellent acceleration performance of GPU parallel computing is verified, which provides ideas for practical application of some theoretical inversion methods restricted by computing speed and computer memory. The model test verifies that the focusing inversion method can overcome the problem of severe skin effect and ambiguity of geological body boundary. Moreover, the increase of the model cells and inversion data can more clearly depict the boundary position of the abnormal body and delineate its specific shape.
基金the National Key R&D Program of China(2020YFB1708300)the National Natural Science Foundation of China(52005192)the Project of Ministry of Industry and Information Technology(TC210804R-3).
文摘This paper aims to solve large-scale and complex isogeometric topology optimization problems that consumesignificant computational resources. A novel isogeometric topology optimization method with a hybrid parallelstrategy of CPU/GPU is proposed, while the hybrid parallel strategies for stiffness matrix assembly, equationsolving, sensitivity analysis, and design variable update are discussed in detail. To ensure the high efficiency ofCPU/GPU computing, a workload balancing strategy is presented for optimally distributing the workload betweenCPU and GPU. To illustrate the advantages of the proposedmethod, three benchmark examples are tested to verifythe hybrid parallel strategy in this paper. The results show that the efficiency of the hybrid method is faster thanserial CPU and parallel GPU, while the speedups can be up to two orders of magnitude.
文摘针对具有物理机制的分布式水文模型对大流域、长序列模拟计算时间长、模拟速度慢的问题,引入基于GPU的并行计算技术,实现分布式水文模型WEP-L(water and energy transfer processes in large river basins)产流过程的并行化。选择鄱阳湖流域为实验区,采用计算能力为8.6的NVIDIA RTX A4000对算法性能进行测试。研究表明:提出的基于GPU的分布式水文模型并行算法具有良好的加速效果,当线程总数越接近划分的子流域个数(计算任务量)时,并行性能越好,在实验流域WEP-L模型子流域单元为8712个时,加速比最大达到2.5左右;随着计算任务量的增加,加速比逐渐增大,当实验流域WEP-L模型子流域单元增加到24897个时,加速比能达到3.5,表明GPU并行算法在大尺度流域分布式水文模型计算中具有良好的发展潜力。
文摘【应用背景】快速射电暴(Fast Radio Burst,FRB)搜寻是500米口径球面射电望远镜(FAST)的重要科学目标之一,其计算复杂度高,数据量大,当前算法GPU利用率偏低,数据处理需较多的人工介入操作。【目的】在不修改算法实现的前提下,实现进程级GPU并行优化,提高GPU整体资源利用率,简化算法运行调度,支持利用自动化脚本驱动计算过程。【方法】利用容器化封装FRB搜寻算法,结合GPU聚合技术实现多个FRB搜寻计算容器的多进程并行,支持GPU闲时复用。通过容器化封装屏蔽了GPU调用、依赖库管理等技术细节,减少人工介入操作。【结果】算法实验结果表明,在不修改原始算法、不增加GPU资源的前提下,将单GPU绑定6个计算进程,并行优化可实现FRB搜寻算法的加速比达到5.3,并行效率达到0.88,取得良好的并行效果。【结论】基于容器化封装及进程级GPU聚合的并行优化,可实现GPU利用率及计算效率的提升,有效支持自动化处理。该方法还具有良好的通用性,可适用于类似应用的并行优化。
文摘In recent years, the widespread adoption of parallel computing, especially in multi-core processors and high-performance computing environments, ushered in a new era of efficiency and speed. This trend was particularly noteworthy in the field of image processing, which witnessed significant advancements. This parallel computing project explored the field of parallel image processing, with a focus on the grayscale conversion of colorful images. Our approach involved integrating OpenMP into our framework for parallelization to execute a critical image processing task: grayscale conversion. By using OpenMP, we strategically enhanced the overall performance of the conversion process by distributing the workload across multiple threads. The primary objectives of our project revolved around optimizing computation time and improving overall efficiency, particularly in the task of grayscale conversion of colorful images. Utilizing OpenMP for concurrent processing across multiple cores significantly reduced execution times through the effective distribution of tasks among these cores. The speedup values for various image sizes highlighted the efficacy of parallel processing, especially for large images. However, a detailed examination revealed a potential decline in parallelization efficiency with an increasing number of cores. This underscored the importance of a carefully optimized parallelization strategy, considering factors like load balancing and minimizing communication overhead. Despite challenges, the overall scalability and efficiency achieved with parallel image processing underscored OpenMP’s effectiveness in accelerating image manipulation tasks.
文摘针对深度学习图像分类场景中多GPU并行后传输效率低的问题,提出一种低时间复杂度的Ring All Reduce改进算法。通过分节点间隔配对原则优化数据传输流程,缓解传统参数服务器并行结构的带宽损耗。基于数据并行难以支撑大规模网络参数及加速延缓的问题,根据深度学习主干网络所包含的权重参数低于全连接层权重参数、同步开销小、全连接层权重大与梯度传输开销过高等特点,提出GPU混合并行优化算法,将主干网络进行数据并行,全连接层进行模型并行,并通过改进的Ring All Reduce算法实现各节点之间的并行后数据通信,用于基于深度学习模型的图像分类。在Cifar10和mini ImageNet两个公共数据集上的实验结果表明,该算法在保持分类精度不变的情况下可以获得更好的加速效果,相比数据并行方法,可达到近45%的提升效果。
基金Supported by the National Science and Technology Major Project of the Ministry of Science and Technology of China (No. 2013ZX06002001- 007), the National Key Scientific Instrument and Equipment Development Projects, China (No. 2012YQ180118) and the National Natural Science Foundation of China (Nos. 11275110, 11075091 and 11105081).