The growing development of the Internet of Things (IoT) is accelerating the emergence and growth of new IoT services and applications, which will result in massive amounts of data being generated, transmitted and processed in wireless communication networks. Mobile Edge Computing (MEC) is a desired paradigm to timely process the data from IoT for value maximization. In MEC, a number of computing-capable devices are deployed at the network edge near data sources to support edge computing, such that the long network transmission delay of the cloud computing paradigm can be avoided. Since an edge device might not always have sufficient resources to process massive amounts of data, computation offloading is significantly important considering the cooperation among edge devices. However, the dynamic traffic characteristics and heterogeneous computing capabilities of edge devices challenge the offloading. In addition, different scheduling schemes might provide different computation delays to the offloaded tasks. Thus, offloading in mobile nodes and scheduling in the MEC server are coupled in determining service delay. This paper seeks to guarantee low delay for computation-intensive applications by jointly optimizing offloading and scheduling in such an MEC system. We propose a Delay-Greedy Computation Offloading (DGCO) algorithm to make offloading decisions for new tasks in distributed computing-enabled mobile devices. A Reinforcement Learning-based Parallel Scheduling (RLPS) algorithm is further designed to schedule offloaded tasks in the multi-core MEC server. With an offloading delay broadcast mechanism, DGCO and RLPS cooperate to achieve the goal of delay-guarantee-ratio maximization. Finally, the simulation results show that our proposal can bound the end-to-end delay of various tasks. Even under slightly heavy task load, the delay-guarantee-ratio given by DGCO-RLPS can still approximate 95%, while that given by the benchmark algorithms drops to an intolerable value. The simulation results demonstrate the effectiveness of DGCO-RLPS for delay guarantee in MEC.
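The abstract does not spell out the DGCO decision rule; as a rough illustration, a delay-greedy offloading choice could compare an estimated local execution delay against transmission-plus-edge delays and pick the minimum. All names and the delay model below are assumptions, not the paper's implementation.

```python
# Hypothetical sketch of a delay-greedy offloading decision. DGCO's internals
# are not given in the abstract; names and the delay model are assumptions.

def local_delay(task_cycles, local_cpu_hz):
    """Delay if the task is executed on the mobile device itself."""
    return task_cycles / local_cpu_hz

def offload_delay(task_bits, task_cycles, uplink_bps, edge_cpu_hz, queue_wait_s):
    """Transmission delay plus queueing and execution at the edge server."""
    return task_bits / uplink_bps + queue_wait_s + task_cycles / edge_cpu_hz

def dgco_decide(task, device, edges):
    """Greedily pick the target (local device or an edge server) with the
    smallest estimated delay, e.g. as advertised by a delay broadcast."""
    best = ("local", local_delay(task["cycles"], device["cpu_hz"]))
    for e in edges:
        d = offload_delay(task["bits"], task["cycles"],
                          e["uplink_bps"], e["cpu_hz"], e["queue_wait_s"])
        if d < best[1]:
            best = (e["id"], d)
    return best  # (target, estimated delay in seconds)

task = {"bits": 2e6, "cycles": 5e8}
device = {"cpu_hz": 1e9}
edges = [{"id": "edge-1", "uplink_bps": 2e7, "cpu_hz": 8e9, "queue_wait_s": 0.05}]
print(dgco_decide(task, device, edges))
```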
The fingerprinting-based approach using the wireless local area network (WLAN) is widely used for indoor localization. However, the construction of the fingerprint database is quite time-consuming, especially when the position of an access point (AP) or a wall changes, which makes updating the fingerprint database in real time difficult. An indoor localization approach that has a low implementation cost, excellent real-time performance, and high localization accuracy, and that fully considers complex indoor environment factors, is preferred in location-based services (LBS) applications. In this paper, we propose a fine-grained grid computing (FGGC) model to achieve decimeter-level localization accuracy. Reference points (RPs) are generated in the grid by the FGGC model. Then, the received signal strength (RSS) values at each RP are calculated with attenuation factors such as the frequency band, the three-dimensional propagation distance, and walls in complex environments. As a result, the fingerprint database can be established automatically without manual measurement, and the efficiency and cost of building it with the FGGC model are superior to previous methods. The proposed approach, which estimates the position step by step from the approximate grid location to the fine-grained location, achieves higher real-time performance and localization accuracy simultaneously. The mean error of the proposed model is 0.36 m, far lower than that of previous approaches. Thus, the proposed model is feasible for improving the efficiency and accuracy of Wi-Fi indoor localization. It also shows high accuracy with fast running speed even under a large-size grid. The results indicate that the proposed method is also suitable for precise marketing, indoor navigation, and emergency rescue.
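The attenuation model is summarized only at a high level in the abstract; a minimal sketch of generating one RSS fingerprint entry with a log-distance path-loss model plus per-wall attenuation (textbook formula and constants, used here as assumptions):

```python
import math

# Sketch of a synthetic RSS fingerprint at a reference point. The paper's
# exact propagation model is not given; this is the common log-distance
# path-loss form with a fixed per-wall loss, used as an assumption.

def rss_at_rp(tx_dbm, ap_xyz, rp_xyz, n_walls, f_mhz=2400.0,
              path_loss_exp=3.0, wall_loss_db=5.0):
    d = max(math.dist(ap_xyz, rp_xyz), 1.0)   # 3-D propagation distance (m)
    pl0 = 20 * math.log10(f_mhz) - 27.55      # free-space loss at 1 m reference
    pl = pl0 + 10 * path_loss_exp * math.log10(d) + n_walls * wall_loss_db
    return tx_dbm - pl

# Fill one fingerprint entry for a grid reference point and two APs.
rp = (3.0, 4.5, 1.2)
aps = [((0.0, 0.0, 2.5), 1), ((10.0, 2.0, 2.5), 0)]  # (AP position, walls between)
print([round(rss_at_rp(20.0, ap, rp, walls), 1) for ap, walls in aps])
```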
This paper aims to solve large-scale and complex isogeometric topology optimization problems that consume significant computational resources. A novel isogeometric topology optimization method with a hybrid CPU/GPU parallel strategy is proposed, and the hybrid parallel strategies for stiffness matrix assembly, equation solving, sensitivity analysis, and design variable update are discussed in detail. To ensure the high efficiency of CPU/GPU computing, a workload balancing strategy is presented for optimally distributing the workload between the CPU and the GPU. To illustrate the advantages of the proposed method, three benchmark examples are tested to verify the hybrid parallel strategy. The results show that the hybrid method is faster than both the serial CPU and the parallel GPU versions, with speedups of up to two orders of magnitude.
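The abstract leaves the balancing strategy unspecified; one standard heuristic, shown below as an assumption, splits work in proportion to measured device throughput so that the CPU and GPU finish at roughly the same time.

```python
# Sketch of a throughput-proportional CPU/GPU workload split. The paper's
# balancing strategy is not detailed in the abstract; this proportional
# heuristic is one common choice, shown as an assumption.

def split_workload(n_items, cpu_rate, gpu_rate):
    """Assign items so both devices finish together:
    n_gpu / gpu_rate == n_cpu / cpu_rate."""
    n_gpu = round(n_items * gpu_rate / (cpu_rate + gpu_rate))
    return n_items - n_gpu, n_gpu  # (n_cpu, n_gpu)

# E.g., stiffness-matrix element batches with a GPU 40x faster than one core:
n_cpu, n_gpu = split_workload(1_000_000, cpu_rate=1.0, gpu_rate=40.0)
print(n_cpu, n_gpu)  # 24390 on CPU, 975610 on GPU
```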
This study embarks on a comprehensive examination of optimization techniques within GPU-based parallel programming models, pivotal for advancing high-performance computing (HPC). Emphasizing the transition of GPUs from graphics-centric processors to versatile computing units, it delves into the nuanced optimization of memory access, thread management, algorithmic design, and data structures. These optimizations are critical for exploiting the parallel processing capabilities of GPUs, addressing both the theoretical frameworks and practical implementations. By integrating advanced strategies such as memory coalescing, dynamic scheduling, and parallel algorithmic transformations, this research aims to significantly elevate computational efficiency and throughput. The findings underscore the potential of optimized GPU programming to revolutionize computational tasks across various domains, highlighting a pathway towards achieving unparalleled processing power and efficiency in HPC environments. The paper not only contributes to the academic discourse on GPU optimization but also provides actionable insights for developers, fostering advancements in computational sciences and technology.
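Memory coalescing itself requires GPU code, but the underlying locality principle, adjacent work items touching adjacent addresses, has a CPU analogue that is easy to demonstrate; the sketch below illustrates the principle only and is not GPU code.

```python
import time
import numpy as np

# GPU memory coalescing means adjacent threads reading adjacent addresses.
# A CPU-side analogue of the same locality effect: copying 16M doubles from
# contiguous memory versus through a strided (transposed) view.

x = np.random.rand(4096, 4096)

t0 = time.perf_counter()
contig = x.ravel(order="C").copy()   # sequential reads, cache-friendly
t1 = time.perf_counter()
strided = np.ascontiguousarray(x.T)  # every read jumps 4096 * 8 bytes
t2 = time.perf_counter()

print(f"contiguous copy: {t1 - t0:.3f}s, strided copy: {t2 - t1:.3f}s")
```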
To efficiently complete a complex computation task in edge computing, the task should be decomposed into sub-computation tasks that run in parallel. A Wireless Sensor Network (WSN) is a typical application of parallel computation. To achieve highly reliable parallel computation in a wireless sensor network, the network's lifetime needs to be extended; therefore, a proper task allocation strategy is needed to reduce the energy consumption and balance the load of the network. This paper proposes a task model and a cluster-based WSN model in edge computing. In our model, different tasks require different types of resources and different sensors provide different types of resources, so the model is heterogeneous, which makes it more practical. We then propose a task allocation algorithm that combines the Genetic Algorithm (GA) and the Ant Colony Optimization (ACO) algorithm. The algorithm concentrates on energy conservation and load balancing so that the lifetime of the network can be extended. The experimental results show the algorithm's effectiveness and advantages in energy conservation and load balancing.
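The abstract does not detail the GA/ACO hybrid, so the sketch below shows only the GA half, with a fitness that trades total energy against load imbalance; the encoding, weights, and cost matrix are assumptions.

```python
import random

# Minimal GA sketch for assigning tasks to sensor nodes. The fitness combines
# energy (total cost) and load balance (max - min node load); the weight 2.0,
# the encoding, and the random cost matrix are illustrative assumptions.

N_TASKS, N_NODES = 20, 5
cost = [[random.uniform(1, 5) for _ in range(N_NODES)] for _ in range(N_TASKS)]

def fitness(assign):  # assign[i] = node running task i; lower is better
    load = [0.0] * N_NODES
    for t, n in enumerate(assign):
        load[n] += cost[t][n]
    energy = sum(load)
    imbalance = max(load) - min(load)
    return energy + 2.0 * imbalance

def evolve(pop_size=40, gens=200, mut=0.1):
    pop = [[random.randrange(N_NODES) for _ in range(N_TASKS)]
           for _ in range(pop_size)]
    for _ in range(gens):
        pop.sort(key=fitness)
        parents = pop[:pop_size // 2]          # elitist selection
        children = []
        for _ in range(pop_size - len(parents)):
            a, b = random.sample(parents, 2)
            cut = random.randrange(1, N_TASKS)
            child = a[:cut] + b[cut:]          # one-point crossover
            if random.random() < mut:          # random-reset mutation
                child[random.randrange(N_TASKS)] = random.randrange(N_NODES)
            children.append(child)
        pop = parents + children
    return min(pop, key=fitness)

best = evolve()
print("best fitness:", round(fitness(best), 2))
```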
A comparison between deep learning and standalone models in predicting the compaction parameters of soil is presented in this research. A total of 190 and 53 soil samples were randomly picked from 243 soil samples to create the training and validation datasets, respectively. The performance and accuracy of the models were measured by root mean square error (RMSE), coefficient of determination (R2), Pearson product-moment correlation coefficient (r), mean absolute error (MAE), variance accounted for (VAF), mean absolute percentage error (MAPE), weighted mean absolute percentage error (WMAPE), a20-index, index of scatter (IOS), and index of agreement (IOA). Comparisons between the standalone models demonstrate that model MD 29 in Gaussian process regression (GPR) and model MD 101 in support vector machine (SVM) achieve over 96% accuracy in predicting the optimum moisture content (OMC) and maximum dry density (MDD) of soil, outperforming the other standalone models. The comparison between the deep learning models shows that models MD 46 and MD 146 in long short-term memory (LSTM) predict OMC and MDD with higher accuracy than the ANN models. Moreover, the LSTM models outperformed the GPR models in predicting the compaction parameters. The sensitivity analysis illustrates that fine content (FC), specific gravity (SG), and liquid limit (LL) highly influence the prediction of compaction parameters.
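The listed metrics are standard; for concreteness, a few of them in code (the formulas follow the common definitions, which the paper may vary slightly):

```python
import numpy as np

# Sketch of several of the listed evaluation metrics, using their standard
# textbook definitions (the paper's exact variants are not given).

def metrics(y, yhat):
    y, yhat = np.asarray(y, float), np.asarray(yhat, float)
    err = y - yhat
    rmse = np.sqrt(np.mean(err**2))
    mae = np.mean(np.abs(err))
    r2 = 1 - np.sum(err**2) / np.sum((y - y.mean())**2)
    vaf = 100 * (1 - np.var(err) / np.var(y))      # variance accounted for
    mape = 100 * np.mean(np.abs(err / y))
    ratio = yhat / y                               # a20: predictions within +/-20%
    a20 = np.mean((ratio >= 0.8) & (ratio <= 1.2))
    return dict(RMSE=rmse, MAE=mae, R2=r2, VAF=vaf, MAPE=mape, A20=a20)

print(metrics([10.2, 12.5, 9.8, 11.0], [10.0, 12.9, 9.5, 11.4]))
```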
Accurate three-dimensional (3-D) reconstruction technology for nondestructive testing based on digital radiography (DR) is of great importance for alleviating the drawbacks of the existing computed tomography (CT)-based method. The commonly used Monte Carlo simulation method ensures well-performing imaging results for DR; however, for 3-D reconstruction it is limited by its high time consumption. To solve this problem, this study proposes a parallel computing method to accelerate the Monte Carlo simulation of projection images, with a parallel interface and a specific DR application. The images are utilized for 3-D reconstruction of the test model. We verify the accuracy of parallel computing for DR and evaluate the performance of two parallel computing modes, multithreaded applications (G4-MT) and message-passing interfaces (G4-MPI), by assessing parallel speedup and efficiency. This study also explores the scalability of the hybrid G4-MPI and G4-MT mode. The results show that the two parallel computing modes can significantly reduce the Monte Carlo simulation time, because the parallel speedup of the simulations grows approximately linearly and the parallel efficiency is maintained at a high level. The hybrid mode has strong scalability: the overall run time of the 180 simulations using 320 threads is 15.35 h with 10 billion particles emitted, and the parallel speedup reaches up to 151.36. The 3-D reconstruction of the model is achieved with the filtered back projection (FBP) algorithm using 180 projection images obtained with the hybrid G4-MPI and G4-MT mode. The quality of the reconstructed slice images is satisfactory, as they reflect the internal structure of the test model. The method is also applied to a complex model, and the quality of the reconstructed images is evaluated.
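The parallel pattern being benchmarked is easy to state: split the particle budget across workers, run independent histories, and reduce the tallies. A minimal mpi4py sketch follows (toy physics as a stand-in, not Geant4):

```python
from mpi4py import MPI
import random

# Sketch of splitting a Monte Carlo particle budget across MPI ranks and
# reducing the tallies, the same pattern G4-MPI applies at a far larger
# scale. The "physics" here is a trivial stand-in.

comm = MPI.COMM_WORLD
rank, size = comm.Get_rank(), comm.Get_size()

TOTAL_PARTICLES = 1_000_000
local_n = TOTAL_PARTICLES // size + (rank < TOTAL_PARTICLES % size)

rng = random.Random(1234 + rank)          # independent stream per rank
local_hits = sum(rng.random() < 0.3 for _ in range(local_n))  # toy detector

hits = comm.reduce(local_hits, op=MPI.SUM, root=0)
if rank == 0:
    print(f"hit fraction: {hits / TOTAL_PARTICLES:.4f}")
# Speedup S(p) = T(1) / T(p); parallel efficiency E(p) = S(p) / p.
```

Run with, e.g., `mpiexec -n 4 python mc_split.py`.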
The discrete fracture network model is a powerful tool for fluid flow simulations in fractured rock masses and supports safety assessments of coal mine hazards such as water inrush. Intersection analysis, which identifies all pairs of intersecting fractures (the basic components composing the connectivity of a network), is one of its crucial procedures. This paper attempts to improve intersection analysis through parallel computing. Considering seamless interfacing with the other procedures in modeling, two algorithms are designed and presented: one is a completely independent parallel procedure with some redundant computation, and the other is an optimized version with reduced redundancy. A numerical study indicates that both algorithms are practical and can significantly improve the computational performance of intersection analysis for large-scale simulations. Moreover, the preferred application conditions for the two algorithms are discussed.
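A minimal sketch of the brute-force parallel pattern: enumerate the fracture pairs, give each worker a disjoint chunk, and collect the intersecting pairs. The 2-D segment test below is a simplified stand-in for 3-D fracture geometry, and the chunking only loosely mirrors the paper's two algorithms.

```python
from itertools import combinations
from multiprocessing import Pool

# Sketch of parallel pairwise intersection analysis on 2-D segment
# "fractures". Real DFN models use 3-D polygonal fractures; this shows
# only the embarrassingly parallel structure of the pair checks.

def cross(o, a, b):
    return (a[0]-o[0])*(b[1]-o[1]) - (a[1]-o[1])*(b[0]-o[0])

def segments_intersect(s1, s2):
    p1, p2 = s1; p3, p4 = s2
    d1, d2 = cross(p3, p4, p1), cross(p3, p4, p2)
    d3, d4 = cross(p1, p2, p3), cross(p1, p2, p4)
    return d1*d2 < 0 and d3*d4 < 0   # proper crossings only

FRACTURES = [((0,0),(4,4)), ((0,4),(4,0)), ((5,5),(6,6)), ((1,0),(1,5))]

def check(pair):
    i, j = pair
    return (i, j) if segments_intersect(FRACTURES[i], FRACTURES[j]) else None

if __name__ == "__main__":
    pairs = list(combinations(range(len(FRACTURES)), 2))
    with Pool(4) as pool:                  # workers check disjoint chunks
        hits = [p for p in pool.map(check, pairs, chunksize=2) if p]
    print("intersecting pairs:", hits)
```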
Core, thin section, conventional and image logs are used to provide insights into the distribution of fractures in the fine-grained sedimentary rocks of the Permian Lucaogou Formation in the Jimusar Sag. Bedding-parallel fractures are common in fine-grained sedimentary rocks, which are characterized by layered structures. Core and thin section analyses reveal that fractures in the Lucaogou Formation include tectonic inclined fractures, bedding-parallel fractures, and abnormal high-pressure fractures. Bedding-parallel fractures are abundant, but only a minor fraction of them remain open; most are partly to fully sealed by carbonate minerals (calcite) and bitumen. Bedding-parallel fractures cause a rapid decrease in resistivity, and they are recognized on image logs as extending along bedding planes with discontinuous surfaces, due to partly to fully filled resistive carbonate minerals as well as late-stage dissolution. A comprehensive interpretation of the distribution of bedding-parallel fractures is performed, with green, red, yellow, and blue lines representing bedding planes, induced fractures, resistive fractures, and open (bedding and inclined) fractures, respectively. The strike of bedding-parallel fractures coincides with the bedding planes. Bedding-parallel fractures are closely associated with the number of bedding planes, and a high density of bedding planes favors their formation. Alternating dark and bright layers show the most abundant bedding-parallel fractures on the image logs, and the bedding-parallel fractures are always associated with low-resistivity zones. These results may help optimize sweet spots in fine-grained sedimentary rocks, improve future fracturing designs, and optimize well spacing.
A computational fluid dynamics (CFD) solver for a GPU/CPU heterogeneous-architecture parallel computing platform is developed to simulate incompressible flows on billion-level grid points. To solve the Poisson equation, the conjugate gradient method is used as the basic solver, and a Chebyshev method in combination with a Jacobi sub-preconditioner is used as the preconditioner. The developed CFD solver shows good parallel efficiency, exceeding 90% in the weak-scalability test when the number of grid points allocated to each GPU card is greater than 208³. In the acceleration test, a simulation with 1040³ grid points on 125 GPU cards achieves a 203.6x speedup over the same number of CPU cores. The developed solver is then tested in the context of a two-dimensional lid-driven cavity flow and a three-dimensional Taylor-Green vortex flow. The results are consistent with previous results in the literature.
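For reference, the basic solver named in the abstract, conjugate gradient with a Jacobi (diagonal) preconditioner, looks as follows; the Chebyshev outer preconditioner and the GPU implementation are not reproduced here.

```python
import numpy as np

# Minimal preconditioned conjugate gradient with a Jacobi preconditioner,
# applied to a small 1-D Poisson matrix as a self-contained test case.

def pcg(A, b, tol=1e-10, max_iter=500):
    x = np.zeros_like(b)
    r = b - A @ x
    Minv = 1.0 / np.diag(A)          # Jacobi preconditioner M^-1
    z = Minv * r
    p = z.copy()
    rz = r @ z
    for _ in range(max_iter):
        Ap = A @ p
        alpha = rz / (p @ Ap)
        x += alpha * p
        r -= alpha * Ap
        if np.linalg.norm(r) < tol:
            break
        z = Minv * r
        rz_new = r @ z
        p = z + (rz_new / rz) * p
        rz = rz_new
    return x

# 1-D Poisson test matrix: tridiagonal(-1, 2, -1).
n = 64
A = 2*np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
b = np.ones(n)
x = pcg(A, b)
print("residual:", np.linalg.norm(b - A @ x))
```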
The construction of new power systems places higher requirements on Power Internet of Things (PIoT) technology. The “source-grid-load-storage” architecture of a new power system requires PIoT to have stronger multi-source heterogeneous data fusion abilities. Native graph databases have great advantages in dealing with multi-source heterogeneous data, which makes them suitable for an increasing number of analytical computing tasks. However, few existing graph database products natively support matrix-operation-related interfaces or functions, resulting in low efficiency when handling the matrix calculations commonly encountered in power grids. In this paper, the matrix computation process is expressed by a strategy called graph description, which relies on the natural connection between a matrix and the structure of a graph. Based on this, we implement matrix operations on a graph database, including matrix multiplication, matrix decomposition, etc. Specifically, only the nodes relevant to the computation and their neighbors are involved in the process, which prunes the influence of zero elements in the matrix and avoids useless iterations compared to conventional matrix computation. Based on the graph description, a series of power grid computations can be implemented on a graph database, which reduces redundant data import and export operations while leveraging the parallel computing capability of the graph database. This promotes the efficiency of PIoT when handling multi-source heterogeneous data. A comprehensive experimental study on two power system datasets of different scales compares the proposed method with Python and MATLAB baselines. The results reveal the superior performance of our proposed method in both power flow and N-1 contingency computations.
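The graph-description idea can be miniaturized: store only a matrix's nonzeros as weighted edges, and walk neighbors during multiplication so that zero elements never cost anything. A dict-of-dicts stands in for the graph database's adjacency storage (an assumption; the paper targets a native graph database, not Python):

```python
# Sketch of the graph-description idea: edge i -> j with weight A[i][j],
# so a matrix-vector product only visits existing edges (nonzeros).

A_graph = {            # node -> {neighbor: weight}, i.e., nonzeros of row i
    0: {0: 4.0, 1: -1.0},
    1: {0: -1.0, 1: 4.0, 2: -1.0},
    2: {1: -1.0, 2: 4.0},
}

def graph_matvec(graph, x):
    """y = A @ x touching only stored edges; zero entries cost nothing."""
    y = [0.0] * len(x)
    for i, row in graph.items():
        for j, w in row.items():       # neighbors of node i only
            y[i] += w * x[j]
    return y

print(graph_matvec(A_graph, [1.0, 2.0, 3.0]))  # [2.0, 4.0, 10.0]
```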
Due to the inherently insecure nature of the Internet, it is crucial to ensure the secure transmission of image data over this network. Additionally, given the limitations of computers, it becomes even more important to employ efficient and fast image encryption techniques. While 1D chaotic maps offer a practical approach to real-time image encryption, their limited flexibility and increased vulnerability restrict their practical application. In this research, we utilize a 3D Hindmarsh-Rose model to construct a secure cryptosystem. The randomness of the chaotic map is assessed through standard analysis. The proposed system enhances security by incorporating an increased number of system parameters and a wide range of chaotic parameters, as well as by ensuring a uniform distribution of the chaotic signals across the entire value space. Additionally, a fast image encryption technique utilizing the new chaotic system is proposed. The novelty of the approach is confirmed through time complexity analysis. To further strengthen the resistance against cryptanalysis and differential attacks, the SHA-256 algorithm is employed for secure key generation. Experimental results across a number of parameters demonstrate the strong cryptographic performance of the proposed image encryption approach, highlighting its exceptional suitability for secure communication. Moreover, the security of the proposed scheme has been compared with state-of-the-art image encryption schemes, and all comparison metrics indicate the superior performance of the proposed scheme.
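The exact key schedule and diffusion steps are not given in the abstract; the sketch below shows only the general shape of a chaos-driven stream cipher: SHA-256 seeds the Hindmarsh-Rose state, the integrated trajectory is quantized into a keystream, and the bytes are XORed with it. The model equations and typical parameter values are the standard ones; the seeding, step size, and byte extraction are illustrative choices, not the paper's scheme.

```python
import hashlib

# Sketch of a Hindmarsh-Rose-driven XOR keystream (illustrative only).

def hr_keystream(key: bytes, n_bytes: int, dt=0.01, burn_in=1000):
    h = hashlib.sha256(key).digest()             # SHA-256 key expansion
    x, y, z = 0.1 + h[0]/512, 0.2 + h[1]/512, 3.0 + h[2]/512
    a, b, c, d, r, s, x_r, I = 1.0, 3.0, 1.0, 5.0, 0.006, 4.0, -1.6, 3.0
    out = bytearray()
    for i in range(burn_in + n_bytes):
        dx = y + b*x*x - a*x**3 - z + I          # Euler step of the 3-D model
        dy = c - d*x*x - y
        dz = r*(s*(x - x_r) - z)
        x, y, z = x + dt*dx, y + dt*dy, z + dt*dz
        if i >= burn_in:
            out.append(int(abs(x) * 1e6) % 256)  # quantize state to a byte
    return bytes(out)

plain = b"example image bytes"
ks = hr_keystream(b"secret key", len(plain))
cipher = bytes(p ^ k for p, k in zip(plain, ks))
assert bytes(c ^ k for c, k in zip(cipher, ks)) == plain  # XOR is symmetric
```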
Derived from a proposed universal mathematical expression, this paper investigates a novel algorithm for parallel Cyclic Redundancy Check (CRC) computation. It is an iterative algorithm that updates the check-bit sequence step by step and suits various argument selections of CRC computation. The proposed algorithm is well suited to hardware implementation. Simulation, implementation, and performance analysis suggest that it can efficiently speed up the computation compared with conventional approaches. The algorithm has been implemented in hardware at rates as high as 21 Gbps, implying its usefulness in high-speed CRC computations such as Asynchronous Transfer Mode (ATM) networks and 10G Ethernet.
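For orientation, the widely used table-driven CRC is the simplest "parallel" CRC in this sense: it advances the check sequence a byte at a time instead of a bit at a time. The paper's algorithm generalizes this to wider words in hardware; the sketch below is the standard software form, not the proposed algorithm.

```python
import zlib

# Byte-parallel (table-driven) CRC-32: one table lookup per byte replaces
# eight bit-serial steps. The table folds the 8 bit-steps in advance.

POLY = 0xEDB88320                      # reflected CRC-32 polynomial

TABLE = []
for byte in range(256):
    crc = byte
    for _ in range(8):                 # precompute 8 bit-steps per byte
        crc = (crc >> 1) ^ POLY if crc & 1 else crc >> 1
    TABLE.append(crc)

def crc32(data: bytes) -> int:
    crc = 0xFFFFFFFF
    for b in data:                     # one lookup advances a whole byte
        crc = (crc >> 8) ^ TABLE[(crc ^ b) & 0xFF]
    return crc ^ 0xFFFFFFFF

msg = b"123456789"
assert crc32(msg) == zlib.crc32(msg)   # matches the standard CRC-32
print(hex(crc32(msg)))                 # 0xcbf43926 for "123456789"
```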
In this work, we treat scattering objects, water, surface, and bottom in a truly unified manner in a parallel finite-difference time-domain (FDTD) scheme, which is suitable for distributed parallel computing in a message passing interface (MPI) programming environment. The algorithm is implemented on a cluster-based high-performance computer system. Parallel computation is performed with different division methods in 2D and 3D situations. Based on an analysis of the main factors affecting the speedup rate and parallel efficiency, data communication is reduced by selecting a suitable scheme of task division. A desirable scheme is recommended, giving a higher speedup rate and better efficiency. The results indicate that the unified parallel FDTD algorithm provides a solution to the numerical computation of acoustic scattering.
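The core of the MPI division schemes the abstract compares is halo exchange between neighboring subdomains. A minimal 1-D sketch with mpi4py follows (a toy stencil stands in for the acoustic FDTD update; all names are ours):

```python
from mpi4py import MPI
import numpy as np

# Sketch of the communication pattern behind MPI-parallel FDTD: split the
# domain into strips and exchange one-cell halos every time step.

comm = MPI.COMM_WORLD
rank, size = comm.Get_rank(), comm.Get_size()

N_LOCAL = 100
u = np.zeros(N_LOCAL + 2)              # interior cells plus 2 ghost cells
u[1:-1] = rank                          # arbitrary initial field

left = rank - 1 if rank > 0 else MPI.PROC_NULL
right = rank + 1 if rank < size - 1 else MPI.PROC_NULL

for step in range(10):
    # Halo exchange: send edge cells to neighbors, receive into ghosts.
    comm.Sendrecv(u[1:2],   dest=left,  recvbuf=u[-1:], source=right)
    comm.Sendrecv(u[-2:-1], dest=right, recvbuf=u[0:1], source=left)
    u[1:-1] = 0.5 * (u[:-2] + u[2:])   # toy diffusion-like stencil update

print(f"rank {rank}: mean field {u[1:-1].mean():.3f}")
```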
Conventional gradient-based full waveform inversion (FWI) is a local optimization, which is highly dependent on the initial model and prone to becoming trapped in local minima. Globally optimal FWI, which can overcome this limitation, is particularly attractive, but it is currently limited by the huge amount of computation required. In this paper, we propose a globally optimal FWI framework based on GPU parallel computing, which greatly improves the efficiency and is expected to make globally optimal FWI more widely used. In this framework, we simplify and recombine the model parameters, and optimize the model iteratively. Each iteration contains hundreds of individuals, each individual is independent of the others, and each individual involves forward modeling and a cost function calculation. The framework is suitable for a variety of globally optimal algorithms, and we test it with the particle swarm optimization algorithm as an example. Both the synthetic and field examples achieve good results, indicating the effectiveness of the framework.
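The framework's outer loop, with PSO as the tested example, could look like the following sketch; a cheap quadratic stands in for the GPU forward-modeling misfit, and the coefficient values are common defaults, not the paper's settings.

```python
import numpy as np

# Minimal particle swarm optimization sketch. In the paper, each cost
# evaluation is an independent GPU forward-modeling run; here a quadratic
# keeps the example self-contained.

def misfit(m):                        # stand-in for the waveform misfit
    return np.sum((m - 3.0) ** 2, axis=1)

rng = np.random.default_rng(0)
n_particles, n_dims = 200, 8          # "hundreds of individuals" per iteration
x = rng.uniform(-10, 10, (n_particles, n_dims))
v = np.zeros_like(x)
pbest, pbest_f = x.copy(), misfit(x)
gbest = pbest[pbest_f.argmin()].copy()

w, c1, c2 = 0.7, 1.5, 1.5             # inertia and acceleration coefficients
for it in range(100):
    r1, r2 = rng.random(x.shape), rng.random(x.shape)
    v = w*v + c1*r1*(pbest - x) + c2*r2*(gbest - x)
    x += v
    f = misfit(x)                     # independent: trivially parallel on GPU
    better = f < pbest_f
    pbest[better], pbest_f[better] = x[better], f[better]
    gbest = pbest[pbest_f.argmin()].copy()

print("best misfit:", pbest_f.min())
```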
The Message Passing Interface (MPI) is a widely accepted standard for parallel computing on distributed memory systems. However, MPI implementations can contain defects that impact the reliability and performance of parallel applications. Detecting and correcting these defects is crucial, yet there is a lack of published models specifically designed for correcting MPI defects. To address this, we propose a model for detecting and correcting MPI defects (DC_MPI), which aims to detect and correct defects in various types of MPI communication, including blocking point-to-point (BPTP), nonblocking point-to-point (NBPTP), and collective communication (CC). The defects addressed by the DC_MPI model include illegal MPI calls, deadlocks (DL), race conditions (RC), and message mismatches (MM). To assess the effectiveness of the DC_MPI model, we performed experiments on a dataset consisting of 40 MPI codes. The results indicate that the model achieved a detection rate of 37 out of 40 codes, resulting in an overall detection accuracy of 92.5%. Additionally, the execution duration of the DC_MPI model ranged from 0.81 to 1.36 s. These findings show that the DC_MPI model is useful in detecting and correcting defects in MPI implementations, thereby enhancing the reliability and performance of parallel applications. The DC_MPI model fills an important research gap and provides a valuable tool for improving the quality of MPI-based parallel computing systems.
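Of the defect classes listed, a deadlock is the easiest to show in a few lines. The sketch below (using mpi4py for brevity) shows a classic send-send deadlock and its standard correction; it is our own illustration, not a sample from the paper's 40-code dataset.

```python
from mpi4py import MPI

# A send-send deadlock (DL) between two ranks, and the standard fix.
# Run with exactly 2 ranks: mpiexec -n 2 python dl_demo.py

comm = MPI.COMM_WORLD
rank = comm.Get_rank()
peer = 1 - rank

payload = {"from": rank}

# Deadlock-prone version: both ranks may block in send at the same time.
#   comm.send(payload, dest=peer)
#   data = comm.recv(source=peer)

# Corrected version: the combined send/receive cannot deadlock.
data = comm.sendrecv(payload, dest=peer, source=peer)
print(f"rank {rank} received {data}")
```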
In this research, we present pure open multi-processing (OpenMP), pure message passing interface (MPI), and hybrid MPI/OpenMP parallel solvers within a dynamic explicit central difference algorithm for the coining process, to address the challenge of capturing fine relief features of approximately 50 microns. Achieving such precision demands at least 7 million tetrahedral elements, surpassing the capabilities of the traditional serial programs developed previously. To mitigate data races when calculating internal forces, intermediate arrays are introduced within the OpenMP directive; this ensures proper synchronization and avoids conflicts during parallel execution. Additionally, in the MPI implementation, the coins are partitioned into the desired number of regions, which allows for efficient distribution of computational tasks across multiple processes. Numerical simulation examples are conducted to compare the three solvers with the serial programs, evaluating correctness, acceleration ratio, and parallel efficiency. The results reveal a relative error of approximately 0.3% in forming force between the parallel and serial solvers, while the predicted insufficient-material zones align with experimental observations. Additionally, the speedup ratio and parallel efficiency are assessed for the coining process simulation. The pure MPI parallel solver achieves a maximum acceleration of 9.5 on a single computer (utilizing 12 cores), and the hybrid solver exhibits a speedup ratio of 136 on a cluster (using 6 compute nodes and 12 cores per compute node), showing the strong scalability of the hybrid MPI/OpenMP programming model. This approach effectively meets the simulation requirements for commemorative coins with intricate relief patterns.
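The intermediate-array idea translates directly: each worker scatters element forces into its own private array, and the partials are summed once at the end, so no two workers ever write the same nodal entry concurrently. The sketch below reproduces only that pattern (in Python with processes rather than OpenMP threads); the element "force" is a dummy.

```python
import numpy as np
from concurrent.futures import ProcessPoolExecutor

# Race-avoidance pattern from the abstract: private intermediate arrays per
# worker for the internal-force scatter, reduced afterwards.

N_NODES = 1000
ELEMENTS = [(i % N_NODES, (i * 7) % N_NODES, 1.0) for i in range(20000)]

def partial_forces(chunk):
    f = np.zeros(N_NODES)              # private intermediate array
    for n1, n2, fe in chunk:
        f[n1] += fe                    # scatter element force to its nodes
        f[n2] -= fe
    return f

if __name__ == "__main__":
    n_workers = 4
    chunks = [ELEMENTS[i::n_workers] for i in range(n_workers)]
    with ProcessPoolExecutor(n_workers) as ex:
        forces = sum(ex.map(partial_forces, chunks))   # reduce the partials
    print("net force (should be 0):", forces.sum())
```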
A method for the parallel computation of the linear quadratic non-cooperative dynamic games problem is proposed. A Lyapunov function is introduced, through which a form of the open-loop Nash equilibrium strategies adapted to parallel computation is given.
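For reference, the standard two-player linear quadratic differential game that this class of methods addresses can be stated as follows (textbook form, assumed; the paper's exact formulation and notation may differ):

```latex
% Two-player LQ differential game: shared linear dynamics, quadratic costs.
\begin{aligned}
\dot{x}(t) &= A\,x(t) + B_1 u_1(t) + B_2 u_2(t), \qquad x(0) = x_0,\\
J_i(u_1, u_2) &= \int_0^{T} \left( x^{\top} Q_i\, x + u_i^{\top} R_i\, u_i \right) dt,
\qquad i = 1, 2.
\end{aligned}
% Open-loop Nash equilibrium: a pair (u_1^*, u_2^*) with
% J_1(u_1^*, u_2^*) \le J_1(u_1, u_2^*) and
% J_2(u_1^*, u_2^*) \le J_2(u_1^*, u_2) for all admissible u_1, u_2.
```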
In this paper, based on implicit Runge-Kutta (IRK) methods, we derive a class of parallel schemes that can be implemented efficiently on parallel computers with Ns processors (N is a positive even number), and discuss the iterative B-convergence of the Newton iterative process for solving the algebraic equations of the scheme. We then present a strategy for providing initial values for the iterative process in parallel. Finally, some numerical results show that our parallel scheme is highly efficient when N is not too large.
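As an illustration of the per-step work such schemes parallelize, the sketch below takes one step of a two-stage Gauss IRK method, solving the stage equations iteratively; the paper analyzes a Newton process, while plain fixed-point iteration is used here for brevity.

```python
import numpy as np

# One step of the 2-stage Gauss IRK method (order 4). Within each sweep the
# two stage evaluations use only the previous iterate, so they are
# independent and could run on separate processors.

SQRT3 = np.sqrt(3.0)
A = np.array([[0.25, 0.25 - SQRT3/6],
              [0.25 + SQRT3/6, 0.25]])   # Gauss-Legendre Butcher matrix
b = np.array([0.5, 0.5])

def irk_step(f, t, y, h, iters=50):
    k = np.array([f(t, y), f(t, y)])     # initial stage guesses
    for _ in range(iters):               # fixed-point sweeps over the stages
        k = np.array([f(t, y + h * (A[i] @ k)) for i in range(2)])
    return y + h * (b @ k)

# Test on y' = -y, y(0) = 1; exact solution exp(-t).
f = lambda t, y: -y
y, h = np.array([1.0]), 0.1
for n in range(10):
    y = irk_step(f, n * h, y, h)
print(y[0], np.exp(-1.0))                # close agreement
```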