MicroMagnetic.jl is an open-source Julia package for micromagnetic and atomistic simulations.Using the features of the Julia programming language,MicroMagnetic.jl supports CPU and various GPU platforms,including NVIDI...MicroMagnetic.jl is an open-source Julia package for micromagnetic and atomistic simulations.Using the features of the Julia programming language,MicroMagnetic.jl supports CPU and various GPU platforms,including NVIDIA,AMD,Intel,and Apple GPUs.Moreover,MicroMagnetic.jl supports Monte Carlo simulations for atomistic models and implements the nudged-elastic-band method for energy barrier computations.With built-in support for double and single precision modes and a design allowing easy extensibility to add new features,MicroMagnetic.jl provides a versatile toolset for researchers in micromagnetics and atomistic simulations.展开更多
Aiming to solve the bottleneck problem of electromagnetic scattering simulation in the scenes of extremely large-scale seas and ships,a high-frequency method by using graphics processing unit(GPU)parallel acceleration...Aiming to solve the bottleneck problem of electromagnetic scattering simulation in the scenes of extremely large-scale seas and ships,a high-frequency method by using graphics processing unit(GPU)parallel acceleration technique is proposed.For the implementation of different electromagnetic methods of physical optics(PO),shooting and bouncing ray(SBR),and physical theory of diffraction(PTD),a parallel computing scheme based on the CPU-GPU parallel computing scheme is realized to balance computing tasks.Finally,a multi-GPU framework is further proposed to solve the computational difficulty caused by the massive number of ray tubes in the ray tracing process.By using the established simulation platform,signals of ships at different seas are simulated and their images are achieved as well.It is shown that the higher sea states degrade the averaged peak signal-to-noise ratio(PSNR)of radar image.展开更多
Evolutionary algorithms(EAs)have been used in high utility itemset mining(HUIM)to address the problem of discover-ing high utility itemsets(HUIs)in the exponential search space.EAs have good running and mining perform...Evolutionary algorithms(EAs)have been used in high utility itemset mining(HUIM)to address the problem of discover-ing high utility itemsets(HUIs)in the exponential search space.EAs have good running and mining performance,but they still require huge computational resource and may miss many HUIs.Due to the good combination of EA and graphics processing unit(GPU),we propose a parallel genetic algorithm(GA)based on the platform of GPU for mining HUIM(PHUI-GA).The evolution steps with improvements are performed in central processing unit(CPU)and the CPU intensive steps are sent to GPU to eva-luate with multi-threaded processors.Experiments show that the mining performance of PHUI-GA outperforms the existing EAs.When mining 90%HUIs,the PHUI-GA is up to 188 times better than the existing EAs and up to 36 times better than the CPU parallel approach.展开更多
In recent years, the widespread adoption of parallel computing, especially in multi-core processors and high-performance computing environments, ushered in a new era of efficiency and speed. This trend was particularl...In recent years, the widespread adoption of parallel computing, especially in multi-core processors and high-performance computing environments, ushered in a new era of efficiency and speed. This trend was particularly noteworthy in the field of image processing, which witnessed significant advancements. This parallel computing project explored the field of parallel image processing, with a focus on the grayscale conversion of colorful images. Our approach involved integrating OpenMP into our framework for parallelization to execute a critical image processing task: grayscale conversion. By using OpenMP, we strategically enhanced the overall performance of the conversion process by distributing the workload across multiple threads. The primary objectives of our project revolved around optimizing computation time and improving overall efficiency, particularly in the task of grayscale conversion of colorful images. Utilizing OpenMP for concurrent processing across multiple cores significantly reduced execution times through the effective distribution of tasks among these cores. The speedup values for various image sizes highlighted the efficacy of parallel processing, especially for large images. However, a detailed examination revealed a potential decline in parallelization efficiency with an increasing number of cores. This underscored the importance of a carefully optimized parallelization strategy, considering factors like load balancing and minimizing communication overhead. Despite challenges, the overall scalability and efficiency achieved with parallel image processing underscored OpenMP’s effectiveness in accelerating image manipulation tasks.展开更多
分子动力学(MD)模拟是研究硅纳米薄膜热力学性质的主要方法,但存在数据处理量大、计算密集、原子间作用模型复杂等问题,限制了MD模拟的深入应用。针对晶硅分子动力学模拟算法中数据访问不连续和大量分支判断造成并行资源浪费、线程等待...分子动力学(MD)模拟是研究硅纳米薄膜热力学性质的主要方法,但存在数据处理量大、计算密集、原子间作用模型复杂等问题,限制了MD模拟的深入应用。针对晶硅分子动力学模拟算法中数据访问不连续和大量分支判断造成并行资源浪费、线程等待等问题,结合Nvidia Tesla V100 GPU硬件体系结构特点,对晶硅MD模拟算法进行设计。通过全局内存的合并访存、循环展开、原子操作等优化方法,利用GPU强大并行计算和浮点运算能力,减少显存访问及算法执行过程中的分支冲突和判断指令,提升算法整体计算性能。测试结果表明,优化后的晶硅MD模拟算法的计算速度相比于优化前提升了1.69~1.97倍,相比于国际上主流的GPU加速MD模拟软件HOOMDblue和LAMMPS分别提升了3.20~3.47倍和17.40~38.04倍,具有较好的模拟加速效果。展开更多
The signal processing speed of spectral domain optical coherence tomography(SD-OCT)has become a bottleneck in a lot of medical applications.Recently,a time-domain interpolation method was proposed.This method can get ...The signal processing speed of spectral domain optical coherence tomography(SD-OCT)has become a bottleneck in a lot of medical applications.Recently,a time-domain interpolation method was proposed.This method can get better signal-to-noise ratio(SNR)but much-reduced signal processing time in SD-OCT data processing as compared with the commonly used zeropadding interpolation method.Additionally,the resampled data can be obtained by a few data and coefficients in the cutoff window.Thus,a lot of interpolations can be performed simultaneously.So,this interpolation method is suitable for parallel computing.By using graphics processing unit(GPU)and the compute unified device architecture(CUDA)program model,time-domain interpolation can be accelerated significantly.The computing capability can be achieved more than 250,000 A-lines,200,000 A-lines,and 160,000 A-lines in a second for 2,048 pixel OCT when the cutoff length is L=11,L=21,and L=31,respectively.A frame SD-OCT data(400A-lines×2,048 pixel per line)is acquired and processed on GPU in real time.The results show that signal processing time of SD-OCT can befinished in 6.223 ms when the cutoff length L=21,which is much faster than that on central processing unit(CPU).Real-time signal processing of acquired data can be realized.展开更多
基金supported by the National Key R&D Program of China(Grant No.2022YFA1403603)the Strategic Priority Research Program of Chinese Academy of Sciences(Grant No.XDB33030100)+2 种基金the National Natural Science Fund for Distinguished Young Scholar(Grant No.52325105)the National Natural Science Foundation of China(Grant Nos.12374098,11974021,and 12241406)the CAS Project for Young Scientists in Basic Research(Grant No.YSBR-084).
文摘MicroMagnetic.jl is an open-source Julia package for micromagnetic and atomistic simulations.Using the features of the Julia programming language,MicroMagnetic.jl supports CPU and various GPU platforms,including NVIDIA,AMD,Intel,and Apple GPUs.Moreover,MicroMagnetic.jl supports Monte Carlo simulations for atomistic models and implements the nudged-elastic-band method for energy barrier computations.With built-in support for double and single precision modes and a design allowing easy extensibility to add new features,MicroMagnetic.jl provides a versatile toolset for researchers in micromagnetics and atomistic simulations.
基金supported by the Opening Foundation of the Agile and Intelligence Computing Key Laboratory of Sichuan Province under Grant No.H23004the Chengdu Municipal Science and Technology Bureau Technological Innovation R&D Project(Key Project)under Grant No.2024-YF08-00106-GX.
文摘Aiming to solve the bottleneck problem of electromagnetic scattering simulation in the scenes of extremely large-scale seas and ships,a high-frequency method by using graphics processing unit(GPU)parallel acceleration technique is proposed.For the implementation of different electromagnetic methods of physical optics(PO),shooting and bouncing ray(SBR),and physical theory of diffraction(PTD),a parallel computing scheme based on the CPU-GPU parallel computing scheme is realized to balance computing tasks.Finally,a multi-GPU framework is further proposed to solve the computational difficulty caused by the massive number of ray tubes in the ray tracing process.By using the established simulation platform,signals of ships at different seas are simulated and their images are achieved as well.It is shown that the higher sea states degrade the averaged peak signal-to-noise ratio(PSNR)of radar image.
基金This work was supported by the National Natural Science Foundation of China(62073155,62002137,62106088,62206113)the High-End Foreign Expert Recruitment Plan(G2023144007L)the Fundamental Research Funds for the Central Universities(JUSRP221028).
文摘Evolutionary algorithms(EAs)have been used in high utility itemset mining(HUIM)to address the problem of discover-ing high utility itemsets(HUIs)in the exponential search space.EAs have good running and mining performance,but they still require huge computational resource and may miss many HUIs.Due to the good combination of EA and graphics processing unit(GPU),we propose a parallel genetic algorithm(GA)based on the platform of GPU for mining HUIM(PHUI-GA).The evolution steps with improvements are performed in central processing unit(CPU)and the CPU intensive steps are sent to GPU to eva-luate with multi-threaded processors.Experiments show that the mining performance of PHUI-GA outperforms the existing EAs.When mining 90%HUIs,the PHUI-GA is up to 188 times better than the existing EAs and up to 36 times better than the CPU parallel approach.
文摘In recent years, the widespread adoption of parallel computing, especially in multi-core processors and high-performance computing environments, ushered in a new era of efficiency and speed. This trend was particularly noteworthy in the field of image processing, which witnessed significant advancements. This parallel computing project explored the field of parallel image processing, with a focus on the grayscale conversion of colorful images. Our approach involved integrating OpenMP into our framework for parallelization to execute a critical image processing task: grayscale conversion. By using OpenMP, we strategically enhanced the overall performance of the conversion process by distributing the workload across multiple threads. The primary objectives of our project revolved around optimizing computation time and improving overall efficiency, particularly in the task of grayscale conversion of colorful images. Utilizing OpenMP for concurrent processing across multiple cores significantly reduced execution times through the effective distribution of tasks among these cores. The speedup values for various image sizes highlighted the efficacy of parallel processing, especially for large images. However, a detailed examination revealed a potential decline in parallelization efficiency with an increasing number of cores. This underscored the importance of a carefully optimized parallelization strategy, considering factors like load balancing and minimizing communication overhead. Despite challenges, the overall scalability and efficiency achieved with parallel image processing underscored OpenMP’s effectiveness in accelerating image manipulation tasks.
文摘分子动力学(MD)模拟是研究硅纳米薄膜热力学性质的主要方法,但存在数据处理量大、计算密集、原子间作用模型复杂等问题,限制了MD模拟的深入应用。针对晶硅分子动力学模拟算法中数据访问不连续和大量分支判断造成并行资源浪费、线程等待等问题,结合Nvidia Tesla V100 GPU硬件体系结构特点,对晶硅MD模拟算法进行设计。通过全局内存的合并访存、循环展开、原子操作等优化方法,利用GPU强大并行计算和浮点运算能力,减少显存访问及算法执行过程中的分支冲突和判断指令,提升算法整体计算性能。测试结果表明,优化后的晶硅MD模拟算法的计算速度相比于优化前提升了1.69~1.97倍,相比于国际上主流的GPU加速MD模拟软件HOOMDblue和LAMMPS分别提升了3.20~3.47倍和17.40~38.04倍,具有较好的模拟加速效果。
基金supported by National High Technology R&D project of China(2008AA02Z422)The Instrument Developing Project of The Chinese Academy of Sciences,Institute of Optics and Electronic,Chinese Academy of Sciences.
文摘The signal processing speed of spectral domain optical coherence tomography(SD-OCT)has become a bottleneck in a lot of medical applications.Recently,a time-domain interpolation method was proposed.This method can get better signal-to-noise ratio(SNR)but much-reduced signal processing time in SD-OCT data processing as compared with the commonly used zeropadding interpolation method.Additionally,the resampled data can be obtained by a few data and coefficients in the cutoff window.Thus,a lot of interpolations can be performed simultaneously.So,this interpolation method is suitable for parallel computing.By using graphics processing unit(GPU)and the compute unified device architecture(CUDA)program model,time-domain interpolation can be accelerated significantly.The computing capability can be achieved more than 250,000 A-lines,200,000 A-lines,and 160,000 A-lines in a second for 2,048 pixel OCT when the cutoff length is L=11,L=21,and L=31,respectively.A frame SD-OCT data(400A-lines×2,048 pixel per line)is acquired and processed on GPU in real time.The results show that signal processing time of SD-OCT can befinished in 6.223 ms when the cutoff length L=21,which is much faster than that on central processing unit(CPU).Real-time signal processing of acquired data can be realized.