Abstract: This study examines optimization techniques for GPU-based parallel programming models, which are pivotal for advancing high-performance computing (HPC). Emphasizing the transition of GPUs from graphics-centric processors to versatile computing units, it covers the optimization of memory access, thread management, algorithmic design, and data structures. These optimizations are critical for exploiting the parallel processing capabilities of GPUs, and the study addresses both the theoretical frameworks and practical implementations. By integrating strategies such as memory coalescing, dynamic scheduling, and parallel algorithmic transformations, this research aims to raise computational efficiency and throughput significantly. The findings underscore the potential of optimized GPU programming to improve computational tasks across various domains and point to a pathway toward greater processing power and efficiency in HPC environments. The paper contributes to the academic discourse on GPU optimization and also provides actionable insights for developers.
Funding: National Natural Science Foundation of China (No. 51405403); the Fundamental Research Funds for the Central Universities, China (No. 2682014BR019); the Scientific Research Program of Education Bureau of Sichuan Province, China (No. 12ZB322).
Abstract: Production scheduling has a major impact on the productivity of the manufacturing process. Recently, scheduling problems with deteriorating jobs have attracted increasing attention from researchers. In many practical situations, some jobs fail to be processed prior to their pre-specified thresholds and then consume extra deteriorating time before they can be completed. Their processing times can be characterized by a step-wise function, and such jobs are called step-deteriorating jobs. In this paper, the parallel machine scheduling problem with step-deteriorating jobs (PMSD) is considered. Because the problem is intractable, four different mixed integer programming (MIP) models are formulated to solve it. The study investigates the performance of these models in order to find a promising optimization formulation that can solve the largest possible problem instances. The four proposed models are solved with the commercial software CPLEX. Moreover, near-optimal solutions can be obtained by applying the black-box local-search solver LocalSolver to the fourth model. The computational results show that the efficiency of the different MIP models depends on the distribution intervals of the deteriorating thresholds, and that LocalSolver clearly outperforms CPLEX in terms of both solution quality and computational time.
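The step-wise processing time described in this abstract can be illustrated with a minimal sketch. The abstract does not give the exact formulation, so the convention below is an assumption: a job takes its base time if it starts at or before its threshold, and base time plus a fixed penalty otherwise. The names `actual_processing_time`, `machine_completion_times`, `base`, `threshold`, and `penalty` are hypothetical and chosen only for illustration.

```python
def actual_processing_time(start, base, threshold, penalty):
    """Step-wise processing time (assumed convention, not taken from the paper):
    base time if the job starts at or before its threshold,
    base time plus a fixed penalty if it starts later."""
    return base if start <= threshold else base + penalty


def machine_completion_times(jobs):
    """Completion times for jobs run back to back on one machine from time 0.

    Each job is a tuple (base, threshold, penalty).
    """
    t = 0
    completions = []
    for base, threshold, penalty in jobs:
        t += actual_processing_time(t, base, threshold, penalty)
        completions.append(t)
    return completions


# The same three jobs in two different orders on one machine.
order_a = [(3, 5, 2), (4, 2, 3), (2, 10, 1)]
order_b = [(4, 2, 3), (3, 5, 2), (2, 10, 1)]
print(machine_completion_times(order_a))  # second job starts after its threshold
print(machine_completion_times(order_b))  # every job starts in time
```

Under this assumed convention the job order changes the total time on a machine, which is what makes the sequencing decision non-trivial.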
Funding: This research is sponsored by the National Natural Science Foundation of China (No. 40374024).
Abstract: The workload of the 3D magnetotelluric forward modeling algorithm is so large that the traditional serial algorithm requires an extremely long compute time. However, the 3D forward modeling algorithm processes the data in the frequency domain, which makes it well suited to parallel computation. Using MPI, and based on an analysis of the flow of the serial 3D magnetotelluric forward algorithm, we propose a parallel computation scheme and implement it. Three theoretical models are tested and the execution efficiency is compared in different situations. The results indicate that the parallel 3D forward modeling computation is correct and that the efficiency is greatly improved. The method is suitable for large-scale geophysical computations.
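Frequency-domain work parallelizes naturally when each frequency can be handled on its own, which is an assumption here since the abstract does not spell out the decomposition. The sketch below illustrates only that pattern. It is not the authors' MPI code: a Python thread pool stands in for MPI ranks, and the per-frequency "solve" is replaced by the standard skin-depth formula as a cheap placeholder. The function names `skin_depth_m` and `forward_all_frequencies` are hypothetical.

```python
import math
from concurrent.futures import ThreadPoolExecutor

MU0 = 4e-7 * math.pi  # magnetic permeability of free space, in H/m


def skin_depth_m(resistivity_ohm_m, frequency_hz):
    """Placeholder for one per-frequency forward solve.

    Returns the electromagnetic skin depth sqrt(2 * rho / (omega * mu0)) in metres.
    """
    omega = 2.0 * math.pi * frequency_hz
    return math.sqrt(2.0 * resistivity_ohm_m / (omega * MU0))


def forward_all_frequencies(frequencies, resistivity_ohm_m=100.0, workers=4):
    """Run the per-frequency computation for every frequency independently.

    Threads are used only to show the task decomposition; an MPI version
    would assign subsets of `frequencies` to different ranks instead.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda f: skin_depth_m(resistivity_ohm_m, f), frequencies))


frequencies = [0.01, 0.1, 1.0, 10.0, 100.0]
print(forward_all_frequencies(frequencies))
```

Because no result at one frequency depends on another, the parallel output can be checked directly against a serial loop, which mirrors the correctness comparison mentioned in the abstract.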
Funding: Sponsored by the Basic Research Foundation of Beijing Institute of Technology (BIT-UBF-200508G4212).
Abstract: A method based on the credibility measure is provided for modeling parallel machine scheduling problems with fuzzy parameters and precedence constraints. For n given jobs to be processed on m machines, it is assumed that the processing times and the due dates are nonnegative fuzzy numbers and that all the weights are positive crisp numbers. Based on the credibility measure, three parallel machine scheduling problems and a goal-programming model are formulated. Feasible schedules are evaluated not only by their objective values but also by the credibility degree to which they satisfy the precedence constraints. A genetic algorithm is used to find good solutions in a short period of time. An illustrative numerical example is also given. Simulation results show that the proposed models are effective for parallel machine scheduling problems with fuzzy parameters and precedence constraints.
Funding: Supported by the National Natural Science Foundation of China (60133010, 70071042, 60073043).
Abstract: First, this paper proposes an asynchronous distributed parallel evolutionary modeling algorithm (PEMA) for building models of dynamical systems in the form of systems of ordinary differential equations. Then, a series of parallel experiments is conducted to test systematically the influence of several important parallel control parameters on the performance of the algorithm. The experimental results are analyzed and explained.
Funding: Project supported by the National Key Research and Development Program of China (No. 2021YFB0300101), the National Natural Science Foundation of China (No. 61972408), and the UK Royal Society International Collaboration Grant.
Abstract: As the hardware industry moves toward specialized heterogeneous many-core processors to avoid the effects of the power wall, software developers are finding it hard to deal with the complexity of these systems. In this paper, we share our experience of developing a programming model, together with its supporting compiler and libraries, for Matrix-3000, which is designed for next-generation exascale supercomputers but has a complex memory hierarchy and processor organization. To assist its software development, we have built a software stack from scratch that includes a low-level programming interface and a high-level OpenCL compiler. Our low-level programming model offers native programming support for the bare-metal accelerators of Matrix-3000, while the high-level model allows programmers to use the OpenCL programming standard. We detail our design choices and highlight the lessons learned from developing system software to enable the programming of bare-metal accelerators. Our programming models have been deployed in the production environment of an exascale prototype system.
Abstract: In this paper, we propose a multi-criteria decision making method for machine schedules that can be applied to a production environment involving several unrelated parallel machines. We focus on three objectives: minimizing makespan, total flow time, and total number of tardy jobs. The decision making method consists of three phases. In the first phase, a mathematical model of a single machine scheduling problem is constructed, whose objective is a weighted sum of the three objectives. This model is solved repeatedly by CPLEX within the proposed Multi-Objective Simulated Annealing (MOSA) algorithm. In the second phase, the MOSA, which integrates a job clustering method, a job group scheduling method, and a job-group-to-machine assignment method, is employed to obtain a set of non-dominated group schedules. During this phase, CPLEX and the bipartite weighted matching algorithm are used repeatedly as parts of the MOSA algorithm. In the last phase, data envelopment analysis is applied to determine the most preferable schedule. A practical example is then presented to demonstrate the applicability of the proposed decision making method.
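The weighted-sum objective of the first phase can be sketched for a tiny single-machine instance. This is an illustration under stated assumptions, not the paper's model: all jobs are assumed to be available at time 0 with no idle time, the weights are arbitrary, and brute-force enumeration replaces the CPLEX solve. The names `evaluate`, `weighted_objective`, and `best_sequence` are hypothetical.

```python
from itertools import permutations


def evaluate(sequence, proc, due):
    """Return (makespan, total flow time, number of tardy jobs) for one sequence.

    Assumes every job is released at time 0, so flow time equals completion time.
    """
    t = 0
    total_flow = 0
    tardy = 0
    for j in sequence:
        t += proc[j]
        total_flow += t
        if t > due[j]:
            tardy += 1
    return t, total_flow, tardy


def weighted_objective(sequence, proc, due, weights):
    """Weighted sum of the three criteria for one sequence."""
    w_makespan, w_flow, w_tardy = weights
    makespan, total_flow, tardy = evaluate(sequence, proc, due)
    return w_makespan * makespan + w_flow * total_flow + w_tardy * tardy


def best_sequence(proc, due, weights):
    """Enumerate all job orders; only practical for a handful of jobs."""
    jobs = range(len(proc))
    return min(permutations(jobs), key=lambda s: weighted_objective(s, proc, due, weights))


proc = [3, 2, 4]  # processing times (illustrative data)
due = [4, 2, 9]   # due dates (illustrative data)
seq = best_sequence(proc, due, weights=(1, 1, 1))
print(seq, evaluate(seq, proc, due))
```

With no idle time on a single machine the makespan is the same for every sequence, so in this simplified setting the makespan term only shifts the objective; it becomes decisive once jobs are distributed across several machines, as in the paper's second phase.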