This study embarks on a comprehensive examination of optimization techniques within GPU-based parallel programming models,pivotal for advancing high-performance computing(HPC).Emphasizing the transition of GPUs from g...This study embarks on a comprehensive examination of optimization techniques within GPU-based parallel programming models,pivotal for advancing high-performance computing(HPC).Emphasizing the transition of GPUs from graphic-centric processors to versatile computing units,it delves into the nuanced optimization of memory access,thread management,algorithmic design,and data structures.These optimizations are critical for exploiting the parallel processing capabilities of GPUs,addressingboth the theoretical frameworks and practical implementations.By integrating advanced strategies such as memory coalescing,dynamic scheduling,and parallel algorithmic transformations,this research aims to significantly elevate computational efficiency and throughput.The findings underscore the potential of optimized GPU programming to revolutionize computational tasks across various domains,highlighting a pathway towards achieving unparalleled processing power and efficiency in HPC environments.The paper not only contributes to the academic discourse on GPU optimization but also provides actionable insights for developers,fostering advancements in computational sciences and technology.展开更多
In this paper they deal with the issue of specification and design of parallel communicatingprocesses. A trace-state based model is introduced to describe the behaviour of concurrent programs. They presenta formal sys...In this paper they deal with the issue of specification and design of parallel communicatingprocesses. A trace-state based model is introduced to describe the behaviour of concurrent programs. They presenta formal system based on that model to achieve hierarchical and modular development and verification methods. Anumber of refinement rules are used to decompose the specification into smaller ones and calculate program fromthe展开更多
In recent years, the widespread adoption of parallel computing, especially in multi-core processors and high-performance computing environments, ushered in a new era of efficiency and speed. This trend was particularl...In recent years, the widespread adoption of parallel computing, especially in multi-core processors and high-performance computing environments, ushered in a new era of efficiency and speed. This trend was particularly noteworthy in the field of image processing, which witnessed significant advancements. This parallel computing project explored the field of parallel image processing, with a focus on the grayscale conversion of colorful images. Our approach involved integrating OpenMP into our framework for parallelization to execute a critical image processing task: grayscale conversion. By using OpenMP, we strategically enhanced the overall performance of the conversion process by distributing the workload across multiple threads. The primary objectives of our project revolved around optimizing computation time and improving overall efficiency, particularly in the task of grayscale conversion of colorful images. Utilizing OpenMP for concurrent processing across multiple cores significantly reduced execution times through the effective distribution of tasks among these cores. The speedup values for various image sizes highlighted the efficacy of parallel processing, especially for large images. However, a detailed examination revealed a potential decline in parallelization efficiency with an increasing number of cores. This underscored the importance of a carefully optimized parallelization strategy, considering factors like load balancing and minimizing communication overhead. Despite challenges, the overall scalability and efficiency achieved with parallel image processing underscored OpenMP’s effectiveness in accelerating image manipulation tasks.展开更多
Today, parallel programming is dominated by message passing libraries, such as message passing interface (MPI). This article intends to simplify parallel programming by generating parallel programs from parallelized...Today, parallel programming is dominated by message passing libraries, such as message passing interface (MPI). This article intends to simplify parallel programming by generating parallel programs from parallelized algorithm design strategies. It uses skeletons to abstract parallelized algorithm design strategies, as well as parallel architectures. Starting from problem specification, an abstract parallel abstract programming language+ (Apla+) program is generated from parallelized algorithm design strategies and problem-specific function definitions. By combining with parallel architectures, implicity of parallelism inside the parallelized algorithm design strategies is exploited. With implementation and transformation, C++ and parallel virtual machine (CPPVM) parallel program is finally generated. Parallelized branch and bound (B&B) algorithm design strategy and paraUelized divide and conquer (D & C) algorithm design strategy are studied in this article as examples. And it also illustrates the approach with a case study.展开更多
Based on a thorough study of the relationship between array element accesses and loop indices of the nested loop, a method is presented with which the staggering relation and the compacting relation between the thread...Based on a thorough study of the relationship between array element accesses and loop indices of the nested loop, a method is presented with which the staggering relation and the compacting relation between the threads of the nested loop (either with a single linear function or with multiple linear functions) can be determined at compile-time,and accordingly the nested loop (either perfectly nested one or imperfectly nested one)can be restructured to avoid the thrashing problem. Due to its simplicity, our method can be efficiently implemented in any parallel compiler, and the improvement of the performance is significant as shown by the experimental results.展开更多
As the mean-time-between-failures(MTBF)continues to decline with the increasing number of components on large-scale high performance computing(HPC)systems,program failures might occur during the execution period with ...As the mean-time-between-failures(MTBF)continues to decline with the increasing number of components on large-scale high performance computing(HPC)systems,program failures might occur during the execution period with high probability.Ensuring successful execution of the HPC programs has become an issue that the unprivileged users should be concerned.From the user perspective,if the program failure cannot be detected and handled in time,it would waste resources and delay the progress of program execution.Unfortunately,the unprivileged users are unable to perform program state checking due to execution control by the job management system as well as the limited privilege.Currently,automated tools for supporting user-level failure detection and autorecovery of parallel programs in HPC systems are missing.This paper proposes an innovative method for the unprivileged user to achieve failure detection of job execution and automatic resubmission of failed jobs.The state checker in our method is encapsulated as an independent job to reduce interference with the user jobs.In addition,we propose a dual-checker mechanism to improve the robustness of our approach.We implement the proposed method as a tool named automatic re-launcher(ARL)and evaluate it on the Tianhe-2 system.Experiment results show that ARL can detect the execution failures effectively on Tianhe-2 system.In addition,the communication and performance overhead caused by ARL is negligible.The good scalability of ARL makes it applicable for large-scale HPC systems.展开更多
Web service is a grid computing technology that promises greater ease-of-use and interoperability than previous distributed computing technologies. This paper proposed Group Service Framework, a grid computing platfor...Web service is a grid computing technology that promises greater ease-of-use and interoperability than previous distributed computing technologies. This paper proposed Group Service Framework, a grid computing platform based on Microsoft. NET that use web service to: (1) locate and harness volunteer computing resources for different applications, and (2) support multi-models such as Master/Slave, Divide and Conquer, Phase Parallel and so forth parallel programming paradigms in Grid environment, (3) allocate data and balance load dynamically and transparently for grid computing application. The Grid Service Framework based on Microsoft. NET was used to implement several simple parallel computing applications. The results show that the proposed Group Service Framework is suitable for generic parallel numerical computing.展开更多
Production scheduling has a major impact on the productivity of the manufacturing process. Recently, scheduling problems with deteriorating jobs have attracted increasing attentions from researchers. In many practical...Production scheduling has a major impact on the productivity of the manufacturing process. Recently, scheduling problems with deteriorating jobs have attracted increasing attentions from researchers. In many practical situations,it is found that some jobs fail to be processed prior to the pre-specified thresholds,and they often consume extra deteriorating time for successful accomplishment. Their processing times can be characterized by a step-wise function. Such kinds of jobs are called step-deteriorating jobs. In this paper,parallel machine scheduling problem with stepdeteriorating jobs( PMSD) is considered. Due to its intractability,four different mixed integer programming( MIP) models are formulated for solving the problem under consideration. The study aims to investigate the performance of these models and find promising optimization formulation to solve the largest possible problem instances. The proposed four models are solved by commercial software CPLEX. Moreover,the near-optimal solutions can be obtained by black-box local-search solver LocalS olver with the fourth one. The computational results show that the efficiencies of different MIP models depend on the distribution intervals of deteriorating thresholds, and the performance of LocalS olver is clearly better than that of CPLEX in terms of the quality of the solutions and the computational time.展开更多
In this paper, we present two parallel multiplicative algorithms for convex programming. If the objective function has compact level sets and has a locally Lipschitz continuous gradient, we discuss convergence of the ...In this paper, we present two parallel multiplicative algorithms for convex programming. If the objective function has compact level sets and has a locally Lipschitz continuous gradient, we discuss convergence of the algorithms. The proofs are essentially based on the results of sequential methods shown by Eggermontt[1].展开更多
In recent years, it has been difficult for manufactures and suppliers to forecast demand from a market for a given product precisely. Therefore, it has become important for them to cope with fluctuations in demand. Fr...In recent years, it has been difficult for manufactures and suppliers to forecast demand from a market for a given product precisely. Therefore, it has become important for them to cope with fluctuations in demand. From this viewpoint, the problem of planning or scheduling in production systems can be regarded as a mathematical problem with stochastic elements. However, in many previous studies, such problems are formulated without stochastic factors, treating stochastic elements as deterministic variables or parameters. Stochastic programming incorporates such factors into the mathematical formulation. In the present paper, we consider a multi-product, discrete, lotsizing and scheduling problem on parallel machines with stochastic demands. Under certain assumptions, this problem can be formulated as a stochastic integer programming problem. We attempt to solve this problem by a scenario aggregation method proposed by Rockafellar and Wets. The results from computational experiments suggest that our approach is able to solve large-scale problems, and that, under the condition of uncertainty, incorporating stochastic elements into the model gives better results than formulating the problem as a deterministic model.展开更多
Reliability optimization plays an important role in design, operation and management of the industrial systems. System reliability can be easily enhanced by improving the reliability of unreliable components and/or by...Reliability optimization plays an important role in design, operation and management of the industrial systems. System reliability can be easily enhanced by improving the reliability of unreliable components and/or by using redundant configuration with subsystems/components in parallel. Redundancy Allocation Problem (RAP) was studied in this research. A mixed integer programming model was proposed to solve the problem, which considers simultaneously two objectives under several resource constraints. The model is only for the hierarchical series-parallel systems in which the elements of any subset of subsystems or components are connected in series or parallel and constitute a larger subsystem or total system. At the end of the study, the performance of the proposed approach was evaluated by a numerical example.展开更多
目的构建基于拓展平行过程理论(extended parallel process model,EPPM)的胃癌患者一级亲属胃癌筛查行为干预方案。方法2023年3—11月,在文献回顾和小组讨论的基础上形成基于EPPM理论的胃癌患者一级亲属胃癌筛查行为干预方案初稿,采用...目的构建基于拓展平行过程理论(extended parallel process model,EPPM)的胃癌患者一级亲属胃癌筛查行为干预方案。方法2023年3—11月,在文献回顾和小组讨论的基础上形成基于EPPM理论的胃癌患者一级亲属胃癌筛查行为干预方案初稿,采用德尔菲专家函询法对18名专家进行2轮函询,形成干预方案终稿。结果2轮专家函询共构建2个一级指标、4个二级指标和27个三级指标的筛查干预方案。专家权威系数分别为0.822、0.884;2轮函询的肯德尔协调系数分别为0.176、0.373,差异有统计学意义(P<0.001)。结论构建的基于EPPM理论的胃癌患者一级亲属胃癌筛查行为干预方案具有可靠性和实用性,以期为胃癌患者一级亲属胃癌筛查行为干预提供借鉴依据。展开更多
The workload of the 3D magnetotelluric forward modeling algorithm is so large that the traditional serial algorithm costs an extremely large compute time. However, the 3D forward modeling algorithm can process the dat...The workload of the 3D magnetotelluric forward modeling algorithm is so large that the traditional serial algorithm costs an extremely large compute time. However, the 3D forward modeling algorithm can process the data in the frequency domain, which is very suitable for parallel computation. With the advantage of MPI and based on an analysis of the flow of the 3D magnetotelluric serial forward algorithm, we suggest the idea of parallel computation and apply it. Three theoretical models are tested and the execution efficiency is compared in different situations. The results indicate that the parallel 3D forward modeling computation is correct and the efficiency is greatly improved. This method is suitable for large size geophysical computations.展开更多
文摘This study embarks on a comprehensive examination of optimization techniques within GPU-based parallel programming models,pivotal for advancing high-performance computing(HPC).Emphasizing the transition of GPUs from graphic-centric processors to versatile computing units,it delves into the nuanced optimization of memory access,thread management,algorithmic design,and data structures.These optimizations are critical for exploiting the parallel processing capabilities of GPUs,addressingboth the theoretical frameworks and practical implementations.By integrating advanced strategies such as memory coalescing,dynamic scheduling,and parallel algorithmic transformations,this research aims to significantly elevate computational efficiency and throughput.The findings underscore the potential of optimized GPU programming to revolutionize computational tasks across various domains,highlighting a pathway towards achieving unparalleled processing power and efficiency in HPC environments.The paper not only contributes to the academic discourse on GPU optimization but also provides actionable insights for developers,fostering advancements in computational sciences and technology.
基金ESPRIT Basic Research ProCoS project 3104 and 7071
文摘In this paper they deal with the issue of specification and design of parallel communicatingprocesses. A trace-state based model is introduced to describe the behaviour of concurrent programs. They presenta formal system based on that model to achieve hierarchical and modular development and verification methods. Anumber of refinement rules are used to decompose the specification into smaller ones and calculate program fromthe
文摘In recent years, the widespread adoption of parallel computing, especially in multi-core processors and high-performance computing environments, ushered in a new era of efficiency and speed. This trend was particularly noteworthy in the field of image processing, which witnessed significant advancements. This parallel computing project explored the field of parallel image processing, with a focus on the grayscale conversion of colorful images. Our approach involved integrating OpenMP into our framework for parallelization to execute a critical image processing task: grayscale conversion. By using OpenMP, we strategically enhanced the overall performance of the conversion process by distributing the workload across multiple threads. The primary objectives of our project revolved around optimizing computation time and improving overall efficiency, particularly in the task of grayscale conversion of colorful images. Utilizing OpenMP for concurrent processing across multiple cores significantly reduced execution times through the effective distribution of tasks among these cores. The speedup values for various image sizes highlighted the efficacy of parallel processing, especially for large images. However, a detailed examination revealed a potential decline in parallelization efficiency with an increasing number of cores. This underscored the importance of a carefully optimized parallelization strategy, considering factors like load balancing and minimizing communication overhead. Despite challenges, the overall scalability and efficiency achieved with parallel image processing underscored OpenMP’s effectiveness in accelerating image manipulation tasks.
基金National Natural Science Foundation of China (60773054)National Basic Research Program of China (2003CCA02800)
文摘Today, parallel programming is dominated by message passing libraries, such as message passing interface (MPI). This article intends to simplify parallel programming by generating parallel programs from parallelized algorithm design strategies. It uses skeletons to abstract parallelized algorithm design strategies, as well as parallel architectures. Starting from problem specification, an abstract parallel abstract programming language+ (Apla+) program is generated from parallelized algorithm design strategies and problem-specific function definitions. By combining with parallel architectures, implicity of parallelism inside the parallelized algorithm design strategies is exploited. With implementation and transformation, C++ and parallel virtual machine (CPPVM) parallel program is finally generated. Parallelized branch and bound (B&B) algorithm design strategy and paraUelized divide and conquer (D & C) algorithm design strategy are studied in this article as examples. And it also illustrates the approach with a case study.
文摘Based on a thorough study of the relationship between array element accesses and loop indices of the nested loop, a method is presented with which the staggering relation and the compacting relation between the threads of the nested loop (either with a single linear function or with multiple linear functions) can be determined at compile-time,and accordingly the nested loop (either perfectly nested one or imperfectly nested one)can be restructured to avoid the thrashing problem. Due to its simplicity, our method can be efficiently implemented in any parallel compiler, and the improvement of the performance is significant as shown by the experimental results.
基金This work was supported by National Key R&D Program of China(2020YFB150001)the National Natural Science Foundation of China(Grant No.62072018).
文摘As the mean-time-between-failures(MTBF)continues to decline with the increasing number of components on large-scale high performance computing(HPC)systems,program failures might occur during the execution period with high probability.Ensuring successful execution of the HPC programs has become an issue that the unprivileged users should be concerned.From the user perspective,if the program failure cannot be detected and handled in time,it would waste resources and delay the progress of program execution.Unfortunately,the unprivileged users are unable to perform program state checking due to execution control by the job management system as well as the limited privilege.Currently,automated tools for supporting user-level failure detection and autorecovery of parallel programs in HPC systems are missing.This paper proposes an innovative method for the unprivileged user to achieve failure detection of job execution and automatic resubmission of failed jobs.The state checker in our method is encapsulated as an independent job to reduce interference with the user jobs.In addition,we propose a dual-checker mechanism to improve the robustness of our approach.We implement the proposed method as a tool named automatic re-launcher(ARL)and evaluate it on the Tianhe-2 system.Experiment results show that ARL can detect the execution failures effectively on Tianhe-2 system.In addition,the communication and performance overhead caused by ARL is negligible.The good scalability of ARL makes it applicable for large-scale HPC systems.
基金National Natural F oundation of China(No.60 173 0 13 )
文摘Web service is a grid computing technology that promises greater ease-of-use and interoperability than previous distributed computing technologies. This paper proposed Group Service Framework, a grid computing platform based on Microsoft. NET that use web service to: (1) locate and harness volunteer computing resources for different applications, and (2) support multi-models such as Master/Slave, Divide and Conquer, Phase Parallel and so forth parallel programming paradigms in Grid environment, (3) allocate data and balance load dynamically and transparently for grid computing application. The Grid Service Framework based on Microsoft. NET was used to implement several simple parallel computing applications. The results show that the proposed Group Service Framework is suitable for generic parallel numerical computing.
基金National Natural Science Foundation of China(No.51405403)the Fundamental Research Funds for the Central Universities,China(No.2682014BR019)the Scientific Research Program of Education Bureau of Sichuan Province,China(No.12ZB322)
文摘Production scheduling has a major impact on the productivity of the manufacturing process. Recently, scheduling problems with deteriorating jobs have attracted increasing attentions from researchers. In many practical situations,it is found that some jobs fail to be processed prior to the pre-specified thresholds,and they often consume extra deteriorating time for successful accomplishment. Their processing times can be characterized by a step-wise function. Such kinds of jobs are called step-deteriorating jobs. In this paper,parallel machine scheduling problem with stepdeteriorating jobs( PMSD) is considered. Due to its intractability,four different mixed integer programming( MIP) models are formulated for solving the problem under consideration. The study aims to investigate the performance of these models and find promising optimization formulation to solve the largest possible problem instances. The proposed four models are solved by commercial software CPLEX. Moreover,the near-optimal solutions can be obtained by black-box local-search solver LocalS olver with the fourth one. The computational results show that the efficiencies of different MIP models depend on the distribution intervals of deteriorating thresholds, and the performance of LocalS olver is clearly better than that of CPLEX in terms of the quality of the solutions and the computational time.
文摘In this paper, we present two parallel multiplicative algorithms for convex programming. If the objective function has compact level sets and has a locally Lipschitz continuous gradient, we discuss convergence of the algorithms. The proofs are essentially based on the results of sequential methods shown by Eggermontt[1].
文摘In recent years, it has been difficult for manufactures and suppliers to forecast demand from a market for a given product precisely. Therefore, it has become important for them to cope with fluctuations in demand. From this viewpoint, the problem of planning or scheduling in production systems can be regarded as a mathematical problem with stochastic elements. However, in many previous studies, such problems are formulated without stochastic factors, treating stochastic elements as deterministic variables or parameters. Stochastic programming incorporates such factors into the mathematical formulation. In the present paper, we consider a multi-product, discrete, lotsizing and scheduling problem on parallel machines with stochastic demands. Under certain assumptions, this problem can be formulated as a stochastic integer programming problem. We attempt to solve this problem by a scenario aggregation method proposed by Rockafellar and Wets. The results from computational experiments suggest that our approach is able to solve large-scale problems, and that, under the condition of uncertainty, incorporating stochastic elements into the model gives better results than formulating the problem as a deterministic model.
文摘Reliability optimization plays an important role in design, operation and management of the industrial systems. System reliability can be easily enhanced by improving the reliability of unreliable components and/or by using redundant configuration with subsystems/components in parallel. Redundancy Allocation Problem (RAP) was studied in this research. A mixed integer programming model was proposed to solve the problem, which considers simultaneously two objectives under several resource constraints. The model is only for the hierarchical series-parallel systems in which the elements of any subset of subsystems or components are connected in series or parallel and constitute a larger subsystem or total system. At the end of the study, the performance of the proposed approach was evaluated by a numerical example.
文摘目的构建基于拓展平行过程理论(extended parallel process model,EPPM)的胃癌患者一级亲属胃癌筛查行为干预方案。方法2023年3—11月,在文献回顾和小组讨论的基础上形成基于EPPM理论的胃癌患者一级亲属胃癌筛查行为干预方案初稿,采用德尔菲专家函询法对18名专家进行2轮函询,形成干预方案终稿。结果2轮专家函询共构建2个一级指标、4个二级指标和27个三级指标的筛查干预方案。专家权威系数分别为0.822、0.884;2轮函询的肯德尔协调系数分别为0.176、0.373,差异有统计学意义(P<0.001)。结论构建的基于EPPM理论的胃癌患者一级亲属胃癌筛查行为干预方案具有可靠性和实用性,以期为胃癌患者一级亲属胃癌筛查行为干预提供借鉴依据。
基金This research is sponsored by the National Natural Science Foundation of China (No. 40374024).
文摘The workload of the 3D magnetotelluric forward modeling algorithm is so large that the traditional serial algorithm costs an extremely large compute time. However, the 3D forward modeling algorithm can process the data in the frequency domain, which is very suitable for parallel computation. With the advantage of MPI and based on an analysis of the flow of the 3D magnetotelluric serial forward algorithm, we suggest the idea of parallel computation and apply it. Three theoretical models are tested and the execution efficiency is compared in different situations. The results indicate that the parallel 3D forward modeling computation is correct and the efficiency is greatly improved. This method is suitable for large size geophysical computations.