Funding: The National Natural Science Foundation of China under contract No. 42076214.
Abstract: Storm surge is often the marine disaster that poses the greatest threat to life and property in coastal areas. Accurate and timely storm surge warnings, which allow appropriate countermeasures to be taken, are an important means of reducing storm surge-related losses. Numerical models are important for storm surge forecasting. To further improve the performance of storm surge forecast models, we developed a numerical storm surge forecast model based on an unstructured spherical centroidal Voronoi tessellation (SCVT) grid. The model is based on the shallow water equations in vector-invariant form and is discretized on an Arakawa C grid. The SCVT grid not only describes the coastline better but also avoids rigid transitions, and it offers better global consistency by generating high-resolution grids in key areas through transition refinement. In addition, the model is accelerated with OpenACC-based GPU acceleration to meet the timeliness requirements of operational ensemble forecasting. It takes only 37 s to simulate one day in the coastal waters of China. The newly developed model was applied to simulate typhoon-induced storm surges in the coastal waters of China. Hindcast experiments on selected representative typhoon-induced storm surge events indicate that the model reasonably simulates the distribution characteristics of storm surges. The simulated maximum storm surges and their occurrence times are consistent with the observations at representative tide gauge stations, with mean absolute errors of 3.5 cm and 0.6 h, respectively, showing high accuracy and good application prospects.
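For reference, the shallow water equations in vector-invariant form mentioned in the abstract are commonly written as below. This is the textbook form, not necessarily the exact system the authors solve; the forcing term $\mathbf{F}$ is an assumed placeholder that lumps together wind stress, the atmospheric pressure gradient, and bottom friction.

```latex
\begin{aligned}
\frac{\partial \mathbf{u}}{\partial t} + (\zeta + f)\,\mathbf{k}\times\mathbf{u}
  &= -\nabla\left(g\eta + K\right) + \mathbf{F}, \\
\frac{\partial h}{\partial t} + \nabla\cdot\left(h\,\mathbf{u}\right) &= 0,
\end{aligned}
\qquad
\zeta = \mathbf{k}\cdot\left(\nabla\times\mathbf{u}\right),
\quad
K = \tfrac{1}{2}\,|\mathbf{u}|^{2}
```

Here $\mathbf{u}$ is the depth-averaged horizontal velocity, $\eta$ the surface elevation, $h$ the total water depth, $f$ the Coriolis parameter, $\zeta$ the relative vorticity, and $K$ the kinetic energy per unit mass. Writing advection through $\zeta$ and $K$ suits a C-grid staggering, where normal velocities sit on cell edges and vorticity on cell vertices.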
Funding: Supported in part by the Major Science and Technology Demonstration Project of the Jiangsu Provincial Key R&D Program under Grant No. BE2023025, in part by the National Natural Science Foundation of China under Grant No. 62302238, in part by the Natural Science Foundation of Jiangsu Province under Grant No. BK20220388, in part by the Natural Science Research Project of Colleges and Universities in Jiangsu Province under Grant No. 22KJB520004, and in part by the China Postdoctoral Science Foundation under Grant No. 2022M711689.
Abstract: This paper explores the integration of the Internet of Things (IoT), big data analysis, cloud computing, and Artificial Intelligence (AI), which has led to an unprecedented era of connectivity. We examine the emerging trend of machine learning on embedded devices, which enables tasks in resource-limited environments. However, the widespread adoption of machine learning raises significant privacy concerns, necessitating privacy-preserving techniques. One such technique, secure multi-party computation (MPC), allows collaborative computation without exposing private inputs. Despite its potential, complex protocols and communication interactions hinder performance, especially on resource-constrained devices. Efforts to improve efficiency have been made, but scalability remains a challenge. Given the success of GPUs in deep learning, leveraging embedded GPUs, such as those offered by NVIDIA, emerges as a promising solution. We therefore propose an Embedded GPU-based Secure Two-party Computation (EG-STC) framework for AI systems. To the best of our knowledge, this work is the first to fully implement machine learning model training based on secure two-party computation on an embedded GPU platform. Our experimental results demonstrate the effectiveness of EG-STC. On an embedded GPU with a power draw of 5 W, our implementation achieved a secure two-party matrix multiplication throughput of 5881.5 kilo-operations per millisecond (kops/ms), with an energy efficiency ratio of 1176.3 kops/ms/W. Furthermore, with the EG-STC framework we achieved an overall speedup of 5 to 6 times over solutions running on server-grade CPUs. Our solution also required only 60% to 70% of the runtime of the previously best-known methods on the same platform. In summary, our research advances secure and efficient machine learning on resource-constrained embedded devices, paving the way for broader adoption of AI technologies in various applications.
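The abstract does not say which protocol EG-STC uses for secure two-party matrix multiplication. A minimal sketch of one standard approach, additive secret sharing with a Beaver triple over the ring Z_{2^32}, is shown below. The ring size, the function names, and the in-process "trusted dealer" are all assumptions made for illustration, and everything runs on the CPU in plain Python.

```python
import secrets

MOD = 1 << 32  # assumed ring Z_{2^32}; the paper's choice is not stated


def rand_matrix(rows, cols):
    return [[secrets.randbelow(MOD) for _ in range(cols)] for _ in range(rows)]


def add(a, b):
    return [[(x + y) % MOD for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def sub(a, b):
    return [[(x - y) % MOD for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def matmul(a, b):
    inner, cols = len(b), len(b[0])
    return [[sum(row[k] * b[k][j] for k in range(inner)) % MOD
             for j in range(cols)] for row in a]


def share(m):
    """Split m into two additive shares with s0 + s1 = m (mod MOD)."""
    s0 = rand_matrix(len(m), len(m[0]))
    return s0, sub(m, s0)


def reconstruct(shares):
    return add(shares[0], shares[1])


def beaver_triple(n, k, m):
    """Hypothetical trusted dealer: shares of random A, B and C = A @ B."""
    a, b = rand_matrix(n, k), rand_matrix(k, m)
    return share(a), share(b), share(matmul(a, b))


def secure_matmul(x_sh, y_sh):
    """Return shares of X @ Y given shares of X (n x k) and Y (k x m)."""
    n, k, m = len(x_sh[0]), len(x_sh[0][0]), len(y_sh[0][0])
    (a0, a1), (b0, b1), (c0, c1) = beaver_triple(n, k, m)
    # Each party masks its shares locally; only E = X - A and F = Y - B are opened.
    e = add(sub(x_sh[0], a0), sub(x_sh[1], a1))
    f = add(sub(y_sh[0], b0), sub(y_sh[1], b1))
    # X @ Y = (E + A)(F + B) = E F + E B + A F + C; party 0 also adds the public E F.
    z0 = add(add(matmul(e, f), matmul(e, b0)), add(matmul(a0, f), c0))
    z1 = add(matmul(e, b1), add(matmul(a1, f), c1))
    return z0, z1
```

Calling `reconstruct(secure_matmul(share(X), share(Y)))` recovers the product of `X` and `Y` modulo 2^32. The local `matmul` calls are the part that a GPU implementation would parallelize.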
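Abstract: Extracting customary ship routes with density clustering, similarity measurement, and related algorithms on central processing unit (CPU) hardware suffers from high complexity and long computation times. To address this, the authors propose using graphics processing unit (GPU) high-performance computing and GPU-optimized algorithms to speed up ship trajectory similarity measurement and clustering, substantially reducing the time cost of ship trajectory feature extraction. The method is validated with dynamic ship trajectory data from the Automatic Identification System (AIS) in the confluence waters of the South Passage (Nancao) of the Yangtze River. Comparison with the traditional CPU-based method confirms that the proposed GPU-based similarity measurement and clustering algorithm achieves better speed performance, providing a new theoretical basis for rapidly extracting ship features in the study waters.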
Funding: Supported by the National High-tech R&D Program of China (2015AA7056045, 2015AA8017032P) and the National Natural Science Foundation of China (61401504).
Abstract: The particle filter (PF) algorithm is one of the most commonly used algorithms for maneuvering target tracking. The traditional PF maps multi-dimensional information to one-dimensional information during particle weight calculation. This lossy transmission of information causes a mismatch between the particle prediction information and the weight information; in essence, it reduces the information entropy of the useful information. To solve this problem, a dual-channel independent filtering method is proposed based on the idea of equalization mapping. First, the particle prediction performance is described by particle manipulations of different dimensions, which improves the accuracy of particle prediction. The algorithm's improvement of particle degradation is analyzed in terms of particle weight and effective particle number. Second, to address the lack of particle samples, new particles are generated from the filtering results, which increases particle diversity. Finally, a graphics processing unit (GPU) parallel computing platform is introduced, and "channel-level" and "particle-level" parallel programs are designed to accelerate the algorithm. The simulation results show that, compared with the traditional algorithm on the CPU platform, the proposed algorithm offers better filtering precision, higher particle efficiency, and faster computation.
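The "effective particle number" used above to assess particle degradation is usually estimated as N_eff = 1 / sum(w_i^2) over the normalized weights. A minimal sketch follows, paired with systematic resampling as one common remedy. The resampling step is an assumption for illustration: the abstract's own remedy generates new particles from the filtering results and is not reproduced here.

```python
import random


def normalize(weights):
    total = sum(weights)
    return [w / total for w in weights]


def effective_particle_number(weights):
    """N_eff = 1 / sum(w_i^2); equals N for uniform weights, 1 when one particle dominates."""
    w = normalize(weights)
    return 1.0 / sum(x * x for x in w)


def systematic_resample(weights, rng=random):
    """Return indices of the particles to keep, using one shared random offset."""
    w = normalize(weights)
    n = len(w)
    u0 = rng.random() / n  # single draw in [0, 1/n)
    indices = []
    i, cumulative = 0, w[0]
    for j in range(n):
        u = u0 + j / n
        # Advance to the first particle whose cumulative weight exceeds u.
        while u >= cumulative and i < n - 1:
            i += 1
            cumulative += w[i]
        indices.append(i)
    return indices
```

A typical rule of thumb, also an assumption here, is to resample only when `effective_particle_number(weights)` drops below some fraction of the particle count.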
Funding: This work is supported by the National Natural Science Foundation of China (Nos. 51875493, 51975503, 11802261). The financial support to the first author is gratefully acknowledged.
Abstract: We propose an improved graphics processing unit (GPU) acceleration approach for three-dimensional structural topology optimization using the element-free Galerkin (EFG) method. The approach effectively eliminates race conditions under parallelization. We established a structural topology optimization model by combining the EFG method with the solid isotropic microstructures with penalization model. We examined in detail the GPU parallel algorithms for assembling the stiffness matrix, solving the discrete equations, analyzing sensitivity, and updating the design variables. We also propose a node pair-wise method for assembling the stiffness matrix and a node-wise method for sensitivity analysis to eliminate race conditions during parallelization. Furthermore, we investigated the effects of the thread block size, the number of degrees of freedom, and the convergence error of the preconditioned conjugate gradient (PCG) method on GPU computing performance. Finally, the results of three numerical examples demonstrate the validity of the proposed approach and show a significant acceleration of structural topology optimization. To reduce the cost of the optimization calculation, we recommend an appropriate thread block size and convergence error for the PCG method.
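The race condition mentioned above typically arises when several threads add contributions to the same nodal entry at once. A node-wise scheme avoids this by letting each node gather its own contributions, so every output entry has exactly one writer. The sketch below shows the gather pattern in serial Python. It is not the authors' kernel: the data layout (`cell_nodes`, `cell_values`) and all function names are hypothetical, and the outer loop stands in for one GPU thread per node.

```python
from collections import defaultdict


def build_node_to_cells(cell_nodes):
    """Invert the cell -> nodes map into node -> [(cell, local_index), ...]."""
    node_to_cells = defaultdict(list)
    for cell, nodes in enumerate(cell_nodes):
        for local, node in enumerate(nodes):
            node_to_cells[node].append((cell, local))
    return node_to_cells


def gather_node_wise(cell_nodes, cell_values, num_nodes):
    """Sum per-cell contributions into nodal values with one writer per node."""
    node_to_cells = build_node_to_cells(cell_nodes)
    result = [0.0] * num_nodes
    for node in range(num_nodes):  # on a GPU: one thread per node
        total = 0.0
        for cell, local in node_to_cells.get(node, []):
            total += cell_values[cell][local]
        result[node] = total  # the only write to result[node]
    return result
```

The trade-off is the extra inverse map, which has to be built once before the parallel loop. The scatter alternative needs atomic additions instead.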
Funding: Financially supported by the National Natural Science Foundation of China (No. 41174085).
Abstract: Organic reefs, the targets of deep-water petroleum exploration, are widely developed in the Xisha area. However, concealed igneous rocks lie beneath the seabed, and their wave impedance is nearly equal to that of the organic reefs. Because of their similar seismic reflection characteristics, the igneous rocks interfere with future exploration. The density and magnetism of organic reefs, however, differ markedly from those of igneous rocks, so gravity and magnetic data offer clear advantages for distinguishing the two. First, frequency decomposition was applied to the free-air gravity anomaly in the Xisha area to obtain a 2D subdivision of the gravity and magnetic anomalies in the vertical direction. The horizontal distribution of igneous rocks can then be determined from the high-frequency field, the low-frequency field, and the physical properties. Next, 3D forward modeling of the gravitational field was carried out to establish a density model of the area, with reference to rock physical properties from previous research. Furthermore, 3D inversion of the gravity anomaly was performed in the Xisha target area using a genetic algorithm with graphics processing unit (GPU) parallel processing, yielding the 3D density structure of the area. In this way, the igneous rocks can be confined to a certain depth according to their density. Frequency decomposition combined with GPU-parallel genetic algorithm 3D inversion of the gravity anomaly proved to be a useful method for locating igneous rocks in their 3D geological position. Organic reefs and igneous rocks can thus be distinguished, which provides forward-looking information for further exploration.
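Abstract: Molecular dynamics (MD) simulation is the main method for studying the thermodynamic properties of silicon nanofilms, but it involves large data volumes, intensive computation, and complex interatomic interaction models, which limit its further application. Non-contiguous data access and extensive branching in the crystalline silicon MD simulation algorithm waste parallel resources and cause thread waiting. To address these problems, the algorithm is designed around the hardware architecture of the Nvidia Tesla V100 GPU. Optimizations such as coalesced global memory access, loop unrolling, and atomic operations exploit the GPU's parallel computing and floating-point capabilities, reducing device memory accesses as well as branch conflicts and conditional instructions during execution, and improving the overall computing performance. Test results show that the optimized crystalline silicon MD simulation algorithm improves computation speed by a factor of 1.69 to 1.97 over the unoptimized version, and by factors of 3.20 to 3.47 and 17.40 to 38.04 over the mainstream GPU-accelerated MD packages HOOMD-blue and LAMMPS, respectively, demonstrating a good acceleration effect.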