Abstract: This study embarks on a comprehensive examination of optimization techniques within GPU-based parallel programming models, pivotal for advancing high-performance computing (HPC). Emphasizing the transition of GPUs from graphics-centric processors to versatile computing units, it delves into the nuanced optimization of memory access, thread management, algorithmic design, and data structures. These optimizations are critical for exploiting the parallel processing capabilities of GPUs, addressing both theoretical frameworks and practical implementations. By integrating advanced strategies such as memory coalescing, dynamic scheduling, and parallel algorithmic transformations, this research aims to significantly elevate computational efficiency and throughput. The findings underscore the potential of optimized GPU programming to revolutionize computational tasks across various domains, highlighting a pathway towards unparalleled processing power and efficiency in HPC environments. The paper not only contributes to the academic discourse on GPU optimization but also provides actionable insights for developers, fostering advancements in computational sciences and technology.
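The abstract names memory coalescing as a key strategy. As an illustrative sketch only (not code from the paper), the contrast between coalesced and strided global-memory access can be written as two Numba CUDA kernels; the kernel names, array size, and stride value are hypothetical, and a CUDA-capable GPU is assumed.

# Illustrative sketch (not from the paper): coalesced vs. strided global-memory
# access written as two Numba CUDA kernels.
import numpy as np
from numba import cuda

@cuda.jit
def copy_coalesced(src, dst):
    i = cuda.grid(1)              # consecutive threads touch consecutive elements
    if i < src.size:
        dst[i] = src[i]

@cuda.jit
def copy_strided(src, dst, stride):
    i = cuda.grid(1)              # consecutive threads touch elements `stride` apart
    j = i * stride
    if j < src.size:
        dst[j] = src[j]

n = 1 << 22
src = cuda.to_device(np.random.rand(n).astype(np.float32))
dst = cuda.device_array_like(src)
threads = 256
blocks = (n + threads - 1) // threads
copy_coalesced[blocks, threads](src, dst)    # roughly one memory transaction per warp
copy_strided[blocks, threads](src, dst, 32)  # up to one transaction per thread in a warp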
Abstract: Factors that affect concrete creep include mixture composition, curing conditions, ambient exposure conditions, and element geometry. Considering the influence of the concrete mixture, and in order to improve the prediction of prestress loss in important structures, an experimental test under laboratory conditions was carried out to investigate the compression creep of two high-performance concrete mixtures used for prestressed members in one bridge. Based on the experimental results, a power-exponent function of creep degree was used to model the creep degree of the two HPCs for structural numerical analysis, and two sets of parameters of this function, one for each HPC, were calculated with an evolution-program optimization method. The experimental data were compared with the CEB-FIP 90 and ACI 209(92) models; both code models overestimated the creep degrees of the two HPCs. It is therefore recommended that the power-exponent function be used in the structural analysis of this bridge.
Abstract: In order to investigate the compression creep of two kinds of high-performance concrete mixtures used for prestressed members in a bridge, an experimental test under laboratory conditions was carried out. Based on the experimental results, a power-exponent function was used to model the creep degree of these high-performance concretes (HPCs) for structural numerical analysis, and two sets of parameters of this function for the HPCs were obtained with an evolution-program optimization method. The experimental data were compared with the CEB-FIP 90 and ACI 209(92) models. The results show that both code models overestimate the creep degree of the two HPCs, so it is recommended that the power-exponent function be used for the creep analysis of the bridge structure.
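Neither abstract gives the exact form of the power-exponent creep-degree function or its fitted parameters. The sketch below therefore assumes a generic form C(t) = a(1 - exp(-b*t^c)) and synthetic measurements, and fits the parameters with an evolutionary optimizer in the spirit of the evolution-program method mentioned; all numbers are illustrative.

# Illustrative sketch only: fitting an assumed power-exponent creep-degree
# function C(t) = a * (1 - exp(-b * t**c)) to measured creep data with an
# evolutionary optimizer. The functional form, bounds, and synthetic data
# below are assumptions; the papers' actual data are not reproduced here.
import numpy as np
from scipy.optimize import differential_evolution

def creep_degree(t, a, b, c):
    return a * (1.0 - np.exp(-b * t**c))

# Hypothetical measurements: age under load in days vs. measured creep degree.
t_obs = np.array([1, 3, 7, 14, 28, 60, 90, 180], dtype=float)
c_obs = np.array([8, 15, 22, 28, 34, 40, 43, 47], dtype=float)

def sum_squared_error(params):
    a, b, c = params
    return np.sum((creep_degree(t_obs, a, b, c) - c_obs) ** 2)

bounds = [(1.0, 100.0), (1e-3, 1.0), (0.1, 1.0)]  # assumed search ranges
result = differential_evolution(sum_squared_error, bounds, seed=0)
print("fitted (a, b, c):", result.x, "residual:", result.fun)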
Funding: Deanship of Scientific Research (DSR), King Abdulaziz University, Grant/Award Number: D-139-137-1441.
Abstract: Owing to recent advances in technology, molecular databases have grown exponentially, demanding faster and more efficient methods that can handle such huge amounts of data. Multi-processing CPU technology, including physical and logical processors (hyper-threading), can therefore be used to significantly increase computational performance. Sequence comparison and pairwise alignment both contribute significantly to calculating the resemblance between sequences when constructing optimal alignments. This research uses the Hash Table-NGram-Hirschberg (HT-NGH) algorithm, which exploits hashing, to perform the pairwise alignment. The authors propose a parallel shared-memory architecture with hyper-threading to improve the performance of protein pairwise alignment on molecular datasets. The proposed parallel hyper-threading method decomposes the datasets at the sequence level for efficient utilization of the processing units, that is, it reduces situations in which processing units sit idle. Combining hyper-threading with multicore processing on shared memory yielded an average speedup of 24.8%, with a highest boost of 34.4%. The improvement preserves acceptable accuracy, reaching speedups of 2.08, 2.88, and 3.87 and efficiencies of 1.04, 0.96, and 0.97 using 2, 3, and 4 cores, respectively.
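The reported efficiencies follow directly from the reported speedups divided by the core counts; a short check using only the numbers quoted above:

# Parallel efficiency = speedup / number of cores, checked against the
# speedups and core counts quoted in the abstract.
speedups = {2: 2.08, 3: 2.88, 4: 3.87}
for cores, speedup in speedups.items():
    efficiency = speedup / cores
    print(f"{cores} cores: speedup {speedup:.2f}, efficiency {efficiency:.2f}")
# -> efficiencies 1.04, 0.96, 0.97, matching the values reported above.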
Abstract: As the Internet of Things (IoT) and mobile devices have rapidly proliferated, their computationally intensive applications have developed into complex, concurrent IoT-based workflows involving multiple interdependent tasks. By exploiting its low latency and high bandwidth, mobile edge computing (MEC) has emerged to achieve high-performance computation offloading of these applications and satisfy the quality-of-service requirements of workflows and devices. In this study, we propose an offloading strategy for IoT-based workflows in a high-performance MEC environment. The proposed task-based offloading strategy is formulated as an optimization problem that includes task dependency, communication costs, workflow constraints, device energy consumption, and the heterogeneous characteristics of the edge environment. The optimal placement of workflow tasks is then found using a discrete teaching-learning-based optimization (DTLBO) metaheuristic. Extensive experimental evaluations demonstrate that the proposed offloading strategy is effective at minimizing the energy consumption of mobile devices and reducing the execution times of workflows compared to offloading strategies based on other metaheuristics, including particle swarm optimization and ant colony optimization.
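The abstract names a discrete teaching-learning-based optimization metaheuristic. The following is a minimal generic sketch of such a loop for a task-to-device placement vector, not the paper's algorithm: the cost matrix, problem sizes, and rounding-based discretization are assumptions, and the real objective additionally models dependencies, communication, energy, and deadlines.

# Minimal sketch of a discrete teaching-learning-based optimization (DTLBO)
# loop for assigning workflow tasks to edge devices. All parameters and the
# cost function are illustrative placeholders.
import numpy as np

rng = np.random.default_rng(0)
n_tasks, n_devices, pop_size, iterations = 12, 4, 20, 50

# Hypothetical per-task cost of running task i on device j (e.g. energy + delay).
cost_matrix = rng.uniform(1.0, 10.0, size=(n_tasks, n_devices))

def cost(placement):
    return cost_matrix[np.arange(n_tasks), placement].sum()

def discretize(x):
    return np.clip(np.rint(x), 0, n_devices - 1).astype(int)

pop = rng.integers(0, n_devices, size=(pop_size, n_tasks))
for _ in range(iterations):
    fitness = np.array([cost(p) for p in pop])
    teacher = pop[fitness.argmin()]
    mean = pop.mean(axis=0)
    # Teacher phase: move each learner towards the best placement found so far.
    for i in range(pop_size):
        tf = rng.integers(1, 3)  # teaching factor in {1, 2}
        candidate = discretize(pop[i] + rng.random(n_tasks) * (teacher - tf * mean))
        if cost(candidate) < fitness[i]:
            pop[i], fitness[i] = candidate, cost(candidate)
    # Learner phase: learn from a randomly chosen peer.
    for i in range(pop_size):
        j = rng.integers(pop_size)
        direction = 1 if fitness[j] < fitness[i] else -1
        candidate = discretize(pop[i] + direction * rng.random(n_tasks) * (pop[j] - pop[i]))
        if cost(candidate) < fitness[i]:
            pop[i], fitness[i] = candidate, cost(candidate)

best = pop[np.array([cost(p) for p in pop]).argmin()]
print("best placement (task -> device):", best, "cost:", cost(best))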
基金"This paper is an extended version of "SpotMPl: a framework for auction-based HPC computing using amazon spot instances" published in the International Symposium on Advances of Distributed Computing and Networking (ADCN 2011).Acknowledgment This research is supported in part by the National Science Foundation grant CNS 0958854 and educational resource grants from Amazon.com.
Abstract: Cloud computing is expanding widely in the world of IT infrastructure. This is due partly to the cost-saving effect of economies of scale. Fair market conditions can, in theory, provide a healthy environment that reflects the most reasonable costs of computation. While fixed cloud pricing provides an attractive low entry barrier for compute-intensive applications, both the consumer and the supplier of computing resources can see high efficiency for their investments by participating in auction-based exchanges. There are huge incentives for cloud providers to offer auctioned resources; from the consumer perspective, however, using these resources is a sparsely discussed challenge. This paper reports a methodology and framework designed to address the challenges of running HPC (high-performance computing) applications on auction-based cloud clusters. The authors focus on HPC applications and describe a method for determining bid-aware checkpointing intervals, extending a theoretical model for determining checkpoint intervals with statistical analysis of pricing histories. The latest developments in the SpotHPC framework, which aims to facilitate the managed execution of real MPI applications in auction-based cloud environments, are also introduced. The authors use their model to simulate a set of algorithms with different computing and communication densities. The results show the complex interactions between optimal bidding strategies and parallel application performance.
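The abstract refers to extending a theoretical checkpoint-interval model with pricing-history statistics. The sketch below shows only the classical first-order (Young-style) interval formula that such models commonly build on, with the mean time between failures replaced by an estimated time until the spot price exceeds the bid; it is not the paper's bid-aware model, and all numbers are hypothetical.

# Illustrative baseline only (not the paper's bid-aware model): the classical
# first-order approximation of the optimal checkpoint interval.
import math

def young_checkpoint_interval(checkpoint_cost_s, mean_time_to_interrupt_s):
    """Approximately optimal interval between checkpoints, in seconds."""
    return math.sqrt(2.0 * checkpoint_cost_s * mean_time_to_interrupt_s)

checkpoint_cost = 120.0           # seconds to write one checkpoint (assumed)
mean_time_to_outbid = 6 * 3600.0  # estimated seconds until the bid is exceeded (assumed)
interval = young_checkpoint_interval(checkpoint_cost, mean_time_to_outbid)
print(f"checkpoint every ~{interval / 60:.1f} minutes")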
Abstract: We present multi-threading and SIMD optimizations of the short-range potential calculation kernel in molecular dynamics (MD). For the multi-threading optimization, we design a partition-and-two-steps (PTS) method to avoid the write conflicts caused by using Newton's third law. Our method eliminates the serialization bottleneck without extra memory, and we implement it using OpenMP. We then discuss the influence of the cutoff if-statement on the performance of vectorization in MD simulations and propose a pre-searching neighbors method, which makes about 70% of atoms pass the cutoff check and thereby removes a large amount of redundant calculation. The experimental results show that our PTS method is scalable and efficient. In double precision, our 256-bit SIMD implementation is about 3× faster than the scalar version.
Funding: The Deanship of Scientific Research at King Abdulaziz University, Jeddah, Saudi Arabia, under Grant No. RG-12-611-43.
Abstract: The Message Passing Interface (MPI) is a widely accepted standard for parallel computing on distributed memory systems. However, MPI implementations can contain defects that impact the reliability and performance of parallel applications. Detecting and correcting these defects is crucial, yet there is a lack of published models specifically designed for correcting MPI defects. To address this, we propose a model for detecting and correcting MPI defects (DC_MPI), which aims to detect and correct defects in various types of MPI communication, including blocking point-to-point (BPTP), nonblocking point-to-point (NBPTP), and collective communication (CC). The defects addressed by the DC_MPI model include illegal MPI calls, deadlocks (DL), race conditions (RC), and message mismatches (MM). To assess the effectiveness of the DC_MPI model, we performed experiments on a dataset consisting of 40 MPI codes. The results indicate that the model detected the defects in 37 of the 40 codes, giving an overall detection accuracy of 92.5%. Additionally, the execution duration of the DC_MPI model ranged from 0.81 to 1.36 s. These findings show that the DC_MPI model is useful in detecting and correcting defects in MPI implementations, thereby enhancing the reliability and performance of parallel applications. The DC_MPI model fills an important research gap and provides a valuable tool for improving the quality of MPI-based parallel computing systems.
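For concreteness, one of the defect classes listed above, a deadlock in blocking point-to-point communication, can be illustrated as follows. This hypothetical example is written with mpi4py for brevity and is not taken from the 40-code dataset used in the paper.

# Hypothetical illustration of a deadlock (DL) defect in blocking point-to-point
# communication: both ranks issue a synchronous send before any receive is
# posted, so neither call can complete. Run with: mpiexec -n 2 python deadlock.py
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()
peer = 1 - rank

data = {"from": rank}
comm.ssend(data, dest=peer, tag=0)   # DEFECT: both ranks block here forever
msg = comm.recv(source=peer, tag=0)  # never reached

# A corrected ordering lets one rank receive first (or uses nonblocking isend/irecv):
# if rank == 0:
#     comm.ssend(data, dest=peer, tag=0)
#     msg = comm.recv(source=peer, tag=0)
# else:
#     msg = comm.recv(source=peer, tag=0)
#     comm.ssend(data, dest=peer, tag=0)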
Abstract: Hyperparameter tuning is a key step in developing high-performing machine learning models, but searching large hyperparameter spaces requires extensive computation using standard sequential methods. This work analyzes the performance gains from parallel versus sequential hyperparameter optimization. Using scikit-learn's RandomizedSearchCV, this project tuned a Random Forest classifier for fake news detection via randomized search over a parameter grid. Setting n_jobs to -1 enabled full parallelization across CPU cores. Results show that the parallel implementation achieved over 5× faster CPU times and 3× faster total run times compared to sequential tuning. However, test accuracy dropped slightly, from 99.26% sequentially to 99.15% with parallelism, indicating a trade-off between evaluation efficiency and model performance. Still, the significant computational gains allow more extensive hyperparameter exploration within reasonable timeframes, outweighing the small accuracy decrease. Further analysis could better quantify this trade-off across different models, tuning techniques, tasks, and hardware.
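A minimal sketch of the parallel tuning setup described above; the synthetic data, parameter ranges, and iteration count are placeholders rather than the study's actual fake-news dataset or search space.

# Minimal sketch of parallel randomized hyperparameter search with scikit-learn.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import RandomizedSearchCV, train_test_split

X, y = make_classification(n_samples=2000, n_features=50, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

param_distributions = {
    "n_estimators": [100, 200, 400],
    "max_depth": [None, 10, 20, 40],
    "min_samples_split": [2, 5, 10],
    "max_features": ["sqrt", "log2"],
}

search = RandomizedSearchCV(
    RandomForestClassifier(random_state=0),
    param_distributions,
    n_iter=20,
    cv=5,
    n_jobs=-1,        # -1 spreads the 20 x 5 candidate fits across all CPU cores
    random_state=0,
)
search.fit(X_train, y_train)
print("best params:", search.best_params_)
print("test accuracy:", search.score(X_test, y_test))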
Funding: Project supported by the UK NERC SINATRA Project (Grant No. NE/K008781/1).
Abstract: In the last two decades, computational hydraulics has undergone rapid development following advances in data acquisition and computing technologies. Using a finite-volume Godunov-type hydrodynamic model, this work demonstrates the promise of modern high-performance computing technology for achieving real-time flood modeling at a regional scale. The software is implemented for high-performance heterogeneous computing using the OpenCL programming framework, and it supports simulations across multiple GPUs using a domain decomposition technique and across multiple systems through an efficient implementation of the Message Passing Interface (MPI) standard. The software is applied to a convective-storm-induced flood event in Newcastle upon Tyne, demonstrating high computational performance across a GPU cluster and good agreement with crowd-sourced observations. Issues relating to data availability, complex urban topography, and differences in drainage capacity affect the results for a small number of areas.
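A schematic sketch of the domain decomposition pattern described above: each MPI rank owns a strip of the flood grid and exchanges ghost rows with its neighbours every step. It is written with mpi4py and NumPy for illustration only; the software itself uses OpenCL kernels and its own decomposition and solver, and the grid size and stencil below are placeholders.

# Schematic sketch (not the paper's code) of strip-wise domain decomposition
# with halo exchange. Run with: mpiexec -n 4 python halo_exchange.py
import numpy as np
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank, size = comm.Get_rank(), comm.Get_size()

ny, nx = 64, 256                      # rows per rank, columns (assumed)
h = np.zeros((ny + 2, nx))            # water-depth strip with two ghost rows
h[1:-1, :] = rank + 1.0               # placeholder initial condition

up = rank - 1 if rank > 0 else MPI.PROC_NULL
down = rank + 1 if rank < size - 1 else MPI.PROC_NULL

for step in range(10):
    # Exchange ghost rows with neighbouring subdomains.
    comm.Sendrecv(sendbuf=h[1, :],  dest=up,   recvbuf=h[-1, :], source=down)
    comm.Sendrecv(sendbuf=h[-2, :], dest=down, recvbuf=h[0, :],  source=up)
    # Placeholder stencil update standing in for the Godunov-type solver.
    h[1:-1, :] = 0.25 * (h[:-2, :] + h[2:, :]
                         + np.roll(h[1:-1, :], 1, axis=1)
                         + np.roll(h[1:-1, :], -1, axis=1))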
Funding: Financial support from the Natural Science Foundation of China (41761134089, 41977218), the Six Talent Peaks Project of Jiangsu Province (RJFW-003), and the Fundamental Research Funds for the Central Universities (14380103) is gratefully acknowledged.
Abstract: The discrete element method (DEM) can effectively simulate the discontinuity, inhomogeneity, and large deformation and failure of rock and soil. Built on an innovative matrix formulation of the discrete element method, the high-performance discrete element software MatDEM can handle millions of elements on a single computer and enables discrete element simulation at the engineering scale. It supports heat calculation as well as multi-field and fluid-solid coupling numerical simulations. Furthermore, the software integrates pre-processing, a solver, post-processing, and powerful secondary development, allowing users to compile new discrete element software. The basic principles of the DEM, the implementation and development of the MatDEM software, and its applications are introduced in this paper. The software and sample source code are available online (http://matdem.com).