1,133 articles found
Optimization Techniques for GPU-Based Parallel Programming Models in High-Performance Computing
1
Authors: Shuntao Tang, Wei Chen. 《信息工程期刊(中英文版)》, 2024, No. 1, pp. 7-11 (5 pages)
This study embarks on a comprehensive examination of optimization techniques within GPU-based parallel programming models, pivotal for advancing high-performance computing (HPC). Emphasizing the transition of GPUs from graphic-centric processors to versatile computing units, it delves into the nuanced optimization of memory access, thread management, algorithmic design, and data structures. These optimizations are critical for exploiting the parallel processing capabilities of GPUs, addressing both the theoretical frameworks and practical implementations. By integrating advanced strategies such as memory coalescing, dynamic scheduling, and parallel algorithmic transformations, this research aims to significantly elevate computational efficiency and throughput. The findings underscore the potential of optimized GPU programming to revolutionize computational tasks across various domains, highlighting a pathway towards achieving unparalleled processing power and efficiency in HPC environments. The paper not only contributes to the academic discourse on GPU optimization but also provides actionable insights for developers, fostering advancements in computational sciences and technology.
Keywords: Optimization Techniques; GPU-Based Parallel Programming Models; High-Performance Computing
A Trace-state Based Approach to Specification and Design of Parallel Programs
2
Authors: He Jifeng (Oxford University Computing Laboratory, Programming Research Group, Parks Road, Oxford OX1 3QD, England). 《计算机工程》 (Computer Engineering) (CAS, CSCD, PKU Core), 1996, No. S1, pp. 91-105 (15 pages)
In this paper they deal with the issue of specification and design of parallel communicating processes. A trace-state based model is introduced to describe the behaviour of concurrent programs. They present a formal system based on that model to achieve hierarchical and modular development and verification methods. A number of refinement rules are used to decompose the specification into smaller ones and calculate program from the ...
Keywords: COMM; A Trace-state Based Approach to Specification and Design of Parallel Programs
Approach of generating parallel programs from parallelized algorithm design strategies (Cited by: 4)
3
Authors: WAN Jian-yi, LI Xiao-ying. 《The Journal of China Universities of Posts and Telecommunications》 (EI, CSCD), 2008, No. 3, pp. 128-132 (5 pages)
Today, parallel programming is dominated by message passing libraries, such as message passing interface (MPI). This article intends to simplify parallel programming by generating parallel programs from parallelized algorithm design strategies. It uses skeletons to abstract parallelized algorithm design strategies, as well as parallel architectures. Starting from problem specification, an abstract parallel abstract programming language+ (Apla+) program is generated from parallelized algorithm design strategies and problem-specific function definitions. By combining with parallel architectures, the implicit parallelism inside the parallelized algorithm design strategies is exploited. With implementation and transformation, a C++ and parallel virtual machine (CPPVM) parallel program is finally generated. The parallelized branch and bound (B&B) algorithm design strategy and the parallelized divide and conquer (D&C) algorithm design strategy are studied in this article as examples. The article also illustrates the approach with a case study.
Keywords: parallel programming; skeletons; algorithm design strategy; parallel architecture
On the Problem of Optimizing Parallel Programs for Complex Memory Hierarchies
4
Authors: Jin Guohua (金国华), Chen Fujie (陈福接). 《Journal of Computer Science & Technology》 (SCIE, EI, CSCD), 1994, No. 1, pp. 1-26 (26 pages)
Based on a thorough study of the relationship between array element accesses and loop indices of the nested loop, a method is presented with which the staggering relation and the compacting relation between the threads of the nested loop (either with a single linear function or with multiple linear functions) can be determined at compile time, and accordingly the nested loop (either perfectly or imperfectly nested) can be restructured to avoid the thrashing problem. Due to its simplicity, our method can be efficiently implemented in any parallel compiler, and the improvement of the performance is significant as shown by the experimental results.
Keywords: optimization; parallel program; complex memory hierarchies; SRIS; RSRIS; compacted RSRIS
User-level failure detection and auto-recovery of parallel programs in HPC systems
5
Authors: Guozhen ZHANG, Yi LIU, Hailong YANG, Jun XU, Depei QIAN. 《Frontiers of Computer Science》 (SCIE, EI, CSCD), 2021, No. 6, pp. 31-42 (12 pages)
As the mean time between failures (MTBF) continues to decline with the increasing number of components on large-scale high performance computing (HPC) systems, program failures might occur during the execution period with high probability. Ensuring successful execution of HPC programs has become an issue that unprivileged users should be concerned with. From the user perspective, if a program failure cannot be detected and handled in time, it wastes resources and delays the progress of program execution. Unfortunately, unprivileged users are unable to perform program state checking due to execution control by the job management system as well as their limited privilege. Currently, automated tools for supporting user-level failure detection and auto-recovery of parallel programs in HPC systems are missing. This paper proposes an innovative method for the unprivileged user to achieve failure detection of job execution and automatic resubmission of failed jobs. The state checker in our method is encapsulated as an independent job to reduce interference with the user jobs. In addition, we propose a dual-checker mechanism to improve the robustness of our approach. We implement the proposed method as a tool named automatic re-launcher (ARL) and evaluate it on the Tianhe-2 system. Experiment results show that ARL can detect execution failures effectively on the Tianhe-2 system. In addition, the communication and performance overhead caused by ARL is negligible. The good scalability of ARL makes it applicable for large-scale HPC systems.
Keywords: high performance computing; parallel program; failure detection; failure auto-recovery
Grid Service Framework: Supporting Multi-Models Parallel Grid Programming
6
Authors: Deng Qianni (邓倩妮), Lu Xinda (陆鑫达). 《Journal of Shanghai Jiaotong University (Science)》 (EI), 2004, No. 1, pp. 56-59 (4 pages)
Web service is a grid computing technology that promises greater ease-of-use and interoperability than previous distributed computing technologies. This paper proposed Grid Service Framework, a grid computing platform based on Microsoft .NET that uses web services to: (1) locate and harness volunteer computing resources for different applications, (2) support multiple parallel programming paradigms in a Grid environment, such as Master/Slave, Divide and Conquer, and Phase Parallel, and (3) allocate data and balance load dynamically and transparently for grid computing applications. The Grid Service Framework based on Microsoft .NET was used to implement several simple parallel computing applications. The results show that the proposed Grid Service Framework is suitable for generic parallel numerical computing.
Keywords: web service; volunteer computing; grid computing; parallel programming paradigm
Parallel Image Processing: Taking Grayscale Conversion Using OpenMP as an Example
7
Authors: Bayan AlHumaidan, Shahad Alghofaily, Maitha Al Qhahtani, Sara Oudah, Naya Nagy. 《Journal of Computer and Communications》, 2024, No. 2, pp. 1-10 (10 pages)
In recent years, the widespread adoption of parallel computing, especially in multi-core processors and high-performance computing environments, ushered in a new era of efficiency and speed. This trend was particularly noteworthy in the field of image processing, which witnessed significant advancements. This parallel computing project explored the field of parallel image processing, with a focus on the grayscale conversion of colorful images. Our approach involved integrating OpenMP into our framework for parallelization to execute a critical image processing task: grayscale conversion. By using OpenMP, we strategically enhanced the overall performance of the conversion process by distributing the workload across multiple threads. The primary objectives of our project revolved around optimizing computation time and improving overall efficiency, particularly in the task of grayscale conversion of colorful images. Utilizing OpenMP for concurrent processing across multiple cores significantly reduced execution times through the effective distribution of tasks among these cores. The speedup values for various image sizes highlighted the efficacy of parallel processing, especially for large images. However, a detailed examination revealed a potential decline in parallelization efficiency with an increasing number of cores. This underscored the importance of a carefully optimized parallelization strategy, considering factors like load balancing and minimizing communication overhead. Despite challenges, the overall scalability and efficiency achieved with parallel image processing underscored OpenMP's effectiveness in accelerating image manipulation tasks.
Keywords: Parallel Computing; Image Processing; OpenMP; Parallel Programming; High Performance Computing; GPU (Graphic Processing Unit)
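The row-wise work distribution described in this abstract can be sketched in Python as a rough analogue of an OpenMP parallel loop. This is an illustration under stated assumptions, not the authors' code: the abstract does not give the luminance weights (the ITU-R BT.601 weights are assumed here), the function names `gray_rows`, `to_grayscale`, `speedup`, and `efficiency` are hypothetical, and in standard CPython the thread pool only shows how the rows are partitioned, since the global interpreter lock prevents pure-Python arithmetic from running faster on threads.

```python
from concurrent.futures import ThreadPoolExecutor

# Assumed ITU-R BT.601 luma weights; the abstract does not say which were used.
R_W, G_W, B_W = 0.299, 0.587, 0.114


def gray_rows(rows):
    """Convert rows of (r, g, b) pixels to rows of grayscale intensities."""
    return [[round(R_W * r + G_W * g + B_W * b) for (r, g, b) in row]
            for row in rows]


def to_grayscale(image, workers=4):
    """Split the image into contiguous row blocks and convert each block in its own task."""
    if not image:
        return []
    workers = max(1, min(workers, len(image)))
    block = -(-len(image) // workers)  # ceiling division: rows per block
    blocks = [image[i:i + block] for i in range(0, len(image), block)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        parts = list(pool.map(gray_rows, blocks))  # map preserves block order
    return [row for part in parts for row in part]


def speedup(t_serial, t_parallel):
    """Speedup: serial time divided by parallel time."""
    return t_serial / t_parallel


def efficiency(t_serial, t_parallel, cores):
    """Parallel efficiency: speedup divided by the number of cores."""
    return speedup(t_serial, t_parallel) / cores
```

With these definitions, a run that takes 8 s serially and 2 s on 8 cores has a speedup of 4 but an efficiency of only 0.5, which is the kind of decline with increasing core count that the abstract reports.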
PDP: Parallel Dynamic Programming (Cited by: 15)
8
Authors: Fei-Yue Wang, Jie Zhang, Qinglai Wei, Xinhu Zheng, Li Li. 《IEEE/CAA Journal of Automatica Sinica》 (SCIE, EI, CSCD), 2017, No. 1, pp. 1-5 (5 pages)
Deep reinforcement learning is a focus research area in artificial intelligence. The principle of optimality in dynamic programming is a key to the success of reinforcement learning methods. The principle of adaptive dynamic programming (ADP) is first presented instead of direct dynamic programming (DP), and the inherent relationship between ADP and deep reinforcement learning is developed. Next, analytics intelligence, as the necessary requirement for real reinforcement learning, is discussed. Finally, the principle of parallel dynamic programming, which integrates dynamic programming and analytics intelligence, is presented as the future computational intelligence. © 2014 Chinese Association of Automation.
Keywords: Artificial intelligence; Neural networks; Reinforcement learning
PARALLEL MULTIPLICATIVE ITERATIVE METHODS FOR CONVEX PROGRAMMING
9
Authors: Chen Zhong (陈忠), Fei Pusheng (费浦生). 《Acta Mathematica Scientia》 (SCIE, CSCD), 1997, No. 2, pp. 205-210 (6 pages)
In this paper, we present two parallel multiplicative algorithms for convex programming. If the objective function has compact level sets and has a locally Lipschitz continuous gradient, we discuss convergence of the algorithms. The proofs are essentially based on the results of sequential methods shown by Eggermont [1].
Keywords: parallel algorithm; convex programming
Scheduling Step-Deteriorating Jobs on Parallel Machines by Mixed Integer Programming (Cited by: 4)
10
Authors: Guo Peng (郭鹏), Cheng Wenming (程文明), Zeng Ming (曾鸣), Liang Jian (梁剑). 《Journal of Donghua University (English Edition)》 (EI, CAS), 2015, No. 5, pp. 709-714, 719 (7 pages)
Production scheduling has a major impact on the productivity of the manufacturing process. Recently, scheduling problems with deteriorating jobs have attracted increasing attention from researchers. In many practical situations, it is found that some jobs fail to be processed prior to the pre-specified thresholds, and they often consume extra deteriorating time for successful accomplishment. Their processing times can be characterized by a step-wise function. Such kinds of jobs are called step-deteriorating jobs. In this paper, the parallel machine scheduling problem with step-deteriorating jobs (PMSD) is considered. Due to its intractability, four different mixed integer programming (MIP) models are formulated for solving the problem under consideration. The study aims to investigate the performance of these models and find a promising optimization formulation to solve the largest possible problem instances. The proposed four models are solved by the commercial software CPLEX. Moreover, near-optimal solutions can be obtained by the black-box local-search solver LocalSolver with the fourth one. The computational results show that the efficiencies of different MIP models depend on the distribution intervals of deteriorating thresholds, and the performance of LocalSolver is clearly better than that of CPLEX in terms of the quality of the solutions and the computational time.
Keywords: parallel machine; step-deterioration; mixed integer programming (MIP); scheduling models; total completion time
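The step-wise processing time mentioned in this abstract can be made concrete with a small sketch. It assumes the usual convention for step-deteriorating jobs: a job started no later than its threshold takes its base time, and a job started later takes the base time plus a fixed penalty. The parameter names `base`, `penalty`, and `threshold` and both function names are hypothetical; the abstract does not give the paper's notation or its MIP models, and this only evaluates a given schedule rather than optimizing one.

```python
def processing_time(start, base, penalty, threshold):
    """Step-wise processing time: base if started by the threshold, else base + penalty."""
    return base if start <= threshold else base + penalty


def total_completion_time(machines):
    """Sum of job completion times over all machines.

    `machines` is a list of job sequences, one per parallel machine;
    each job is a (base, penalty, threshold) tuple processed in list order.
    """
    total = 0
    for sequence in machines:
        clock = 0  # each machine starts at time 0 and runs its jobs back to back
        for base, penalty, threshold in sequence:
            clock += processing_time(clock, base, penalty, threshold)
            total += clock
    return total
```

For instance, on one machine the sequence `[(3, 2, 0), (2, 4, 2)]` finishes its jobs at times 3 and 9, while the reversed order finishes them at 2 and 7, so the order of jobs relative to their thresholds changes the objective.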
Stochastic Programming Model for Discrete Lotsizing and Scheduling Problem on Parallel Machines
11
Authors: Kensuke Ishiwata, Jun Imaizumi, Takayuki Shiina, Susumu Morito. 《American Journal of Operations Research》, 2012, No. 3, pp. 374-381 (8 pages)
In recent years, it has been difficult for manufacturers and suppliers to forecast demand from a market for a given product precisely. Therefore, it has become important for them to cope with fluctuations in demand. From this viewpoint, the problem of planning or scheduling in production systems can be regarded as a mathematical problem with stochastic elements. However, in many previous studies, such problems are formulated without stochastic factors, treating stochastic elements as deterministic variables or parameters. Stochastic programming incorporates such factors into the mathematical formulation. In the present paper, we consider a multi-product, discrete, lotsizing and scheduling problem on parallel machines with stochastic demands. Under certain assumptions, this problem can be formulated as a stochastic integer programming problem. We attempt to solve this problem by a scenario aggregation method proposed by Rockafellar and Wets. The results from computational experiments suggest that our approach is able to solve large-scale problems, and that, under the condition of uncertainty, incorporating stochastic elements into the model gives better results than formulating the problem as a deterministic model.
Keywords: Stochastic Programming; Lotsizing and Scheduling; Parallel Machines; Scenario Aggregation Method
Optimal Redundancy Allocation in Hierarchical Series-Parallel Systems Using Mixed Integer Programming
12
Authors: Mohsen Ziaee. 《Applied Mathematics》, 2013, No. 1, pp. 79-83 (5 pages)
Reliability optimization plays an important role in design, operation and management of industrial systems. System reliability can be easily enhanced by improving the reliability of unreliable components and/or by using redundant configurations with subsystems/components in parallel. The Redundancy Allocation Problem (RAP) was studied in this research. A mixed integer programming model was proposed to solve the problem, which simultaneously considers two objectives under several resource constraints. The model is only for hierarchical series-parallel systems, in which the elements of any subset of subsystems or components are connected in series or parallel and constitute a larger subsystem or the total system. At the end of the study, the performance of the proposed approach was evaluated by a numerical example.
Keywords: Hierarchical Series-Parallel System; Optimal Redundancy Allocation; Mixed Integer Programming Formulation; Reliability Optimization
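To make the hierarchical series-parallel structure in this abstract concrete, the sketch below evaluates the reliability of a nested system with the standard textbook formulas, assuming components fail independently: a series block works only if every element works, and a parallel block fails only if every element fails. The nested-tuple encoding and the function name `reliability` are hypothetical; this is not the paper's mixed integer programming model, only the quantity such a model would reason about.

```python
from math import prod


def reliability(node):
    """Reliability of a hierarchical series-parallel system.

    `node` is either a number in [0, 1] (a single component) or a tuple
    (kind, children) with kind "series" or "parallel" and children a list
    of nodes. Independent failures are assumed throughout.
    """
    if isinstance(node, (int, float)):
        return float(node)
    kind, children = node
    values = [reliability(child) for child in children]
    if kind == "series":
        return prod(values)  # every element must work
    if kind == "parallel":
        return 1.0 - prod(1.0 - v for v in values)  # fails only if all elements fail
    raise ValueError(f"unknown block kind: {kind!r}")
```

For example, a 0.9 component in series with two redundant 0.8 components, written `("series", [0.9, ("parallel", [0.8, 0.8])])`, evaluates to 0.9 × (1 − 0.2²) = 0.864, which is higher than the 0.72 obtained without the redundant unit.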
The parallel 3D magnetotelluric forward modeling algorithm (Cited by: 28)
13
Authors: Tan Handong, Tong Tuo, Lin Changhong. 《Applied Geophysics》 (SCIE, CSCD), 2006, No. 4, pp. 197-202 (6 pages)
The workload of the 3D magnetotelluric forward modeling algorithm is so large that the traditional serial algorithm requires an extremely long compute time. However, the 3D forward modeling algorithm can process the data in the frequency domain, which is very suitable for parallel computation. With the advantage of MPI and based on an analysis of the flow of the 3D magnetotelluric serial forward algorithm, we suggest the idea of parallel computation and apply it. Three theoretical models are tested and the execution efficiency is compared in different situations. The results indicate that the parallel 3D forward modeling computation is correct and the efficiency is greatly improved. This method is suitable for large-scale geophysical computations.
Keywords: Magnetotelluric 3D forward modeling; MPI; parallel programming design; 3D staggered-grid finite difference method; parallel algorithm
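Design and Implementation of the Parallel C Language for Domestic Heterogeneous Many-Core Systems (Cited by: 10)
14
Authors: He Wangquan (何王全), Liu Yong (刘勇), Fang Yanfei (方燕飞), Wei Di (魏迪), Qi Fengbin (漆锋滨). 《软件学报》 (Journal of Software) (EI, CSCD, PKU Core), 2017, No. 4, pp. 764-785 (22 pages)
Heterogeneous many-core architectures offer a very high performance-to-power ratio and have become an important direction in supercomputer architecture. However, the more complex parallelism and memory hierarchies of many-core systems pose great challenges for programming and optimization. Research on parallel programming techniques for many-core systems is therefore important both for reducing the difficulty of programming parallel applications on domestic many-core systems and for improving the performance of parallel programs. This paper proposes a multi-mode parallel programming model with a unified architecture, comprising a heterogeneous-fusion accelerated computing model and an autonomous computing model programmed in a homogeneous manner. Based on this programming model, the Parallel C language is designed, which can effectively describe the heterogeneous parallelism of domestic many-core systems. Compared with the MPI+X usage pattern on other many-core systems, both programming and system optimization take a global view, with distinctive features in multi-level locality description, one-sided messaging, and compatibility with existing multi-core applications. A Parallel C compilation system is built on Open64 that fully supports both the accelerated and the autonomous computing models, and optimizations such as data layout with automatic DMA, compiler-directed thread proxies, and topology-aware collective communication are proposed and implemented. Test results from micro-benchmarks and real applications on the Sunway TaihuLight computer system show that the Parallel C language and compilation system have good performance and scalability and can effectively support large-scale applications.
Keywords: heterogeneous many-core; programming model; parallel language; Parallel C; compiler; message passing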
An Approach to Parallelization of SIFT Algorithm on GPUs for Real-Time Applications (Cited by: 4)
15
Authors: Raghu Raj Prasanna Kumar, Suresh Muknahallipatna, John McInroy. 《Journal of Computer and Communications》, 2016, No. 17, pp. 18-50 (33 pages)
The Scale Invariant Feature Transform (SIFT) algorithm is a widely used computer vision algorithm that detects and extracts local feature descriptors from images. SIFT is computationally intensive, making it infeasible for a single-threaded implementation to extract local feature descriptors for high-resolution images in real time. In this paper, an approach to parallelization of the SIFT algorithm is demonstrated using NVIDIA's Graphics Processing Unit (GPU). The parallelization design for SIFT on GPUs is divided into two stages: a) algorithm design, with generic design strategies that focus on data, and b) implementation design, with architecture-specific design strategies that focus on optimally using GPU resources for maximum occupancy. Increasing memory latency hiding, eliminating branches and data blocking achieve a significant decrease in average computational time. Furthermore, it is observed via Paraver tools that our approach to parallelization while optimizing for maximum occupancy allows the GPU to execute the memory-bound SIFT algorithm at optimal levels.
Keywords: Scale Invariant Feature Transform (SIFT); Parallel Computing; GPU; GPU Occupancy; Portable Parallel Programming; CUDA
Asynchronous Nested Optimization Algorithms and Their Parallel Implementation
16
Authors: Hans W. Moritsch, G.Ch. Pflug, M. Siomak (Department of Statistics and Decision Support Systems, University of Vienna, Universitaetsstrasse 5, A-1090 Vienna, Austria). 《Wuhan University Journal of Natural Sciences》 (CAS), 2001, No. Z1, pp. 560-567 (8 pages)
Large-scale optimization problems can only be solved in an efficient way if their special structure is taken as the basis of algorithm design. In this paper we consider a very broad class of large-scale problems with special structure, namely tree structured problems. We show how the exploitation of the structure leads to efficient decomposition algorithms and how it may be implemented in a parallel environment.
Keywords: financial management; stochastic optimization; tree structured problems; parallel programming; Java
Improvements in the score matrix calculation method using parallel score estimating algorithm
17
Authors: Geraldo F. D. Zafalon, Evandro A. Marucci, Julio C. Momente, Jose R. A. Amazonas, Liria M. Sato, Jose M. Machado. 《Journal of Biophysical Chemistry》, 2013, No. 2, pp. 47-51 (5 pages)
The increasing amount of sequences stored in genomic databases has made sequential analysis infeasible. Parallel computing has therefore brought its power to bioinformatics through parallel algorithms to align and analyze the sequences, providing improvements mainly in the running time of these algorithms. In many situations, the parallel strategy contributes to reducing the computational complexity of large problems. This work shows some results obtained by an implementation of a parallel score estimating technique for the score matrix calculation stage, which is the first stage of a progressive multiple sequence alignment. The performance and quality of the parallel score estimating are compared with the results of a dynamic programming approach also implemented in parallel. This comparison shows a significant reduction of running time. Moreover, the quality of the final alignment, using the new strategy, is analyzed and compared with the quality of the approach with dynamic programming.
Keywords: Algorithms; Scoring Matrix; Parallel Programming; Alignment Quality
Parallel Dispatch: A New Paradigm of Electrical Power System Dispatch (Cited by: 5)
18
Authors: Jun Jason Zhang, Fei-Yue Wang, Qiang Wang, Dazhi Hao, Xiaojing Yang, David Wenzhong Gao, Xiangyang Zhao, Yingchen Zhang. 《IEEE/CAA Journal of Automatica Sinica》 (SCIE, EI, CSCD), 2018, No. 1, pp. 311-319 (9 pages)
Modern power systems are evolving into sociotechnical systems with massive complexity, whose real-time operation and dispatch go beyond human capability. Thus, the need for developing and applying new intelligent power system dispatch tools is of great practical significance. In this paper, we introduce the overall business model of power system dispatch, the top-level design approach of an intelligent dispatch system, and the parallel intelligent technology with its dispatch applications. We expect that a new dispatch paradigm, namely the parallel dispatch, can be established by incorporating various intelligent technologies, especially the parallel intelligent technology, to enable secure operation of complex power grids, extend system operators' capabilities, suggest optimal dispatch strategies, and provide decision-making recommendations according to power system operational goals.
Keywords: ACP; knowledge automation; power dispatch; parallel dynamic programming; parallel intelligence; parallel learning; situational awareness
Rapid Optimization of Tension Distribution for Cable-Driven Parallel Manipulators with Redundant Cables (Cited by: 8)
19
Authors: OUYANG Bo, SHANG Weiwei. 《Chinese Journal of Mechanical Engineering》 (SCIE, EI, CAS, CSCD), 2016, No. 2, pp. 231-238 (8 pages)
The tension distribution has infinitely many solutions for cable-driven parallel manipulators (CDPMs) with redundant cables. A rapid optimization method for determining the optimal tension distribution is presented. The new optimization method is primarily based on the geometric properties of a polyhedron and convex analysis. The computational efficiency of the optimization method is improved by the designed projection algorithm, and a fast algorithm is proposed to determine which two of the lines intersect at the optimal point. Moreover, a method for avoiding the operating point on the lower tension limit is developed. Simulation experiments are implemented on a six degree-of-freedom (6-DOF) CDPM with eight cables, and the results indicate that the new method is one order of magnitude faster than the standard simplex method. The optimal tension distribution is thus rapidly established in real time by the proposed method.
Keywords: cable-driven parallel manipulator; tension distribution; redundant cable; linear programming
Scheduling and Subcontracting under Parallel Machines (Cited by: 2)
20
Authors: Chen Rongjun (陈荣军), Tang Guochun (唐国春). 《Chinese Quarterly Journal of Mathematics》 (CSCD), 2012, No. 4, pp. 590-597 (8 pages)
In this paper, we study a model on joint decisions of scheduling and subcontracting, in which jobs (orders) can be either processed by parallel machines at the manufacturer in-house or subcontracted to a subcontractor. The manufacturer needs to determine which jobs should be produced in-house and which jobs should be subcontracted. Furthermore, it needs to determine a production schedule for jobs to be produced in-house. We discuss five classical scheduling objectives as production costs. For each problem with different objective functions, we give optimality conditions and propose dynamic programming algorithms.
Keywords: scheduling; subcontracting; dynamic programming; parallel machines