1,132 articles found
Optimization Techniques for GPU-Based Parallel Programming Models in High-Performance
1
Authors: Shuntao Tang, Wei Chen. 《信息工程期刊(中英文版)》, 2024, No. 1, pp. 7-11 (5 pages)
This study embarks on a comprehensive examination of optimization techniques within GPU-based parallel programming models, pivotal for advancing high-performance computing (HPC). Emphasizing the transition of GPUs from graphic-centric processors to versatile computing units, it delves into the nuanced optimization of memory access, thread management, algorithmic design, and data structures. These optimizations are critical for exploiting the parallel processing capabilities of GPUs, addressing both the theoretical frameworks and practical implementations. By integrating advanced strategies such as memory coalescing, dynamic scheduling, and parallel algorithmic transformations, this research aims to significantly elevate computational efficiency and throughput. The findings underscore the potential of optimized GPU programming to revolutionize computational tasks across various domains, highlighting a pathway towards achieving unparalleled processing power and efficiency in HPC environments. The paper not only contributes to the academic discourse on GPU optimization but also provides actionable insights for developers, fostering advancements in computational sciences and technology.
Keywords: Optimization Techniques; GPU-Based Parallel Programming Models; High-Performance Computing
A Trace-state Based Approach to Specification and Design of Parallel Programs
2
Author: He Jifeng (Oxford University Computing Laboratory, Programming Research Group, Parks Road, Oxford OX1 3QD, England). 《计算机工程》 (CAS, CSCD, PKU Core), 1996, No. S1, pp. 91-105 (15 pages)
In this paper they deal with the issue of specification and design of parallel communicating processes. A trace-state based model is introduced to describe the behaviour of concurrent programs. They present a formal system based on that model to achieve hierarchical and modular development and verification methods. A number of refinement rules are used to decompose the specification into smaller ones and calculate program from the
Keywords: COMM; A Trace-state Based Approach to Specification and Design of Parallel Programs
Parallel Image Processing: Taking Grayscale Conversion Using OpenMP as an Example
3
Authors: Bayan AlHumaidan, Shahad Alghofaily, Maitha Al Qhahtani, Sara Oudah, Naya Nagy. 《Journal of Computer and Communications》, 2024, No. 2, pp. 1-10 (10 pages)
In recent years, the widespread adoption of parallel computing, especially in multi-core processors and high-performance computing environments, ushered in a new era of efficiency and speed. This trend was particularly noteworthy in the field of image processing, which witnessed significant advancements. This parallel computing project explored the field of parallel image processing, with a focus on the grayscale conversion of colorful images. Our approach involved integrating OpenMP into our framework for parallelization to execute a critical image processing task: grayscale conversion. By using OpenMP, we strategically enhanced the overall performance of the conversion process by distributing the workload across multiple threads. The primary objectives of our project revolved around optimizing computation time and improving overall efficiency, particularly in the task of grayscale conversion of colorful images. Utilizing OpenMP for concurrent processing across multiple cores significantly reduced execution times through the effective distribution of tasks among these cores. The speedup values for various image sizes highlighted the efficacy of parallel processing, especially for large images. However, a detailed examination revealed a potential decline in parallelization efficiency with an increasing number of cores. This underscored the importance of a carefully optimized parallelization strategy, considering factors like load balancing and minimizing communication overhead. Despite challenges, the overall scalability and efficiency achieved with parallel image processing underscored OpenMP’s effectiveness in accelerating image manipulation tasks.
Keywords: Parallel Computing; Image Processing; OpenMP; Parallel Programming; High Performance Computing; GPU (Graphic Processing Unit)
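Illustrative note for entry 3: the abstract describes splitting a grayscale conversion across threads and then reporting speedup and efficiency. The sketch below is only a rough analogue under stated assumptions. It uses a Python thread pool in place of OpenMP, the ITU-R BT.601 luma weights (the abstract does not say which weights the authors used), and hypothetical helper names (`gray_row`, `to_grayscale`, `speedup`, `efficiency`). It shows the row-wise work split and the two metrics, not the paper's implementation or its measured results.

```python
from concurrent.futures import ThreadPoolExecutor


def gray_row(row):
    """Convert one row of (R, G, B) pixels to integer luminance values."""
    # Assumption: BT.601 luma weights; the abstract does not state the weights used.
    return [round(0.299 * r + 0.587 * g + 0.114 * b) for r, g, b in row]


def to_grayscale(image, workers=4):
    """Convert an image (a list of rows) by handing rows to worker threads.

    Each row is independent, which is the property an OpenMP
    parallel loop over rows would also rely on.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(gray_row, image))


def speedup(t_serial, t_parallel):
    """Speedup = serial time / parallel time."""
    return t_serial / t_parallel


def efficiency(t_serial, t_parallel, cores):
    """Parallel efficiency = speedup / number of cores."""
    return speedup(t_serial, t_parallel) / cores


if __name__ == "__main__":
    image = [[(255, 255, 255), (0, 0, 0)], [(255, 0, 0), (0, 0, 255)]]
    print(to_grayscale(image, workers=2))
    # Hypothetical timings: a 4x speedup on 8 cores is 50% efficiency.
    print(efficiency(8.0, 2.0, 8))
```

The efficiency formula makes the abstract's observation concrete: if adding cores raises speedup less than proportionally, efficiency falls even while execution time keeps improving.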
Approach of generating parallel programs from parallelized algorithm design strategies (Cited by: 4)
4
Authors: WAN Jian-yi, LI Xiao-ying. 《The Journal of China Universities of Posts and Telecommunications》 (EI, CSCD), 2008, No. 3, pp. 128-132 (5 pages)
Today, parallel programming is dominated by message passing libraries, such as message passing interface (MPI). This article intends to simplify parallel programming by generating parallel programs from parallelized algorithm design strategies. It uses skeletons to abstract parallelized algorithm design strategies, as well as parallel architectures. Starting from problem specification, an abstract parallel abstract programming language+ (Apla+) program is generated from parallelized algorithm design strategies and problem-specific function definitions. By combining with parallel architectures, the implicit parallelism inside the parallelized algorithm design strategies is exploited. With implementation and transformation, a C++ and parallel virtual machine (CPPVM) parallel program is finally generated. The parallelized branch and bound (B&B) and parallelized divide and conquer (D&C) algorithm design strategies are studied in this article as examples, and the approach is illustrated with a case study.
Keywords: parallel programming; skeletons; algorithm design strategy; parallel architecture
On the Problem of Optimizing Parallel Programs for Complex Memory Hierarchies
5
Authors: 金国华, 陈福接. 《Journal of Computer Science & Technology》 (SCIE, EI, CSCD), 1994, No. 1, pp. 1-26 (26 pages)
Based on a thorough study of the relationship between array element accesses and loop indices of the nested loop, a method is presented with which the staggering relation and the compacting relation between the threads of the nested loop (either with a single linear function or with multiple linear functions) can be determined at compile-time, and accordingly the nested loop (either a perfectly nested one or an imperfectly nested one) can be restructured to avoid the thrashing problem. Due to its simplicity, our method can be efficiently implemented in any parallel compiler, and the improvement of the performance is significant as shown by the experimental results.
Keywords: optimization; parallel program; complex memory hierarchies; SRIS; RSRIS; compacted RSRIS
User-level failure detection and auto-recovery of parallel programs in HPC systems
6
Authors: Guozhen ZHANG, Yi LIU, Hailong YANG, Jun XU, Depei QIAN. 《Frontiers of Computer Science》 (SCIE, EI, CSCD), 2021, No. 6, pp. 31-42 (12 pages)
As the mean-time-between-failures (MTBF) continues to decline with the increasing number of components on large-scale high performance computing (HPC) systems, program failures might occur during the execution period with high probability. Ensuring successful execution of HPC programs has become an issue that unprivileged users should be concerned with. From the user perspective, if a program failure cannot be detected and handled in time, it wastes resources and delays the progress of program execution. Unfortunately, unprivileged users are unable to perform program state checking due to execution control by the job management system as well as their limited privilege. Currently, automated tools for supporting user-level failure detection and auto-recovery of parallel programs in HPC systems are missing. This paper proposes an innovative method for the unprivileged user to achieve failure detection of job execution and automatic resubmission of failed jobs. The state checker in our method is encapsulated as an independent job to reduce interference with the user jobs. In addition, we propose a dual-checker mechanism to improve the robustness of our approach. We implement the proposed method as a tool named automatic re-launcher (ARL) and evaluate it on the Tianhe-2 system. Experiment results show that ARL can detect execution failures effectively on the Tianhe-2 system. In addition, the communication and performance overhead caused by ARL is negligible. The good scalability of ARL makes it applicable to large-scale HPC systems.
Keywords: high performance computing; parallel program; failure detection; failure auto-recovery
PDP: Parallel Dynamic Programming (Cited by: 15)
7
Authors: Fei-Yue Wang, Jie Zhang, Qinglai Wei, Xinhu Zheng, Li Li. 《IEEE/CAA Journal of Automatica Sinica》 (SCIE, EI, CSCD), 2017, No. 1, pp. 1-5 (5 pages)
Deep reinforcement learning is a research focus in artificial intelligence. The principle of optimality in dynamic programming is a key to the success of reinforcement learning methods. The principle of adaptive dynamic programming (ADP) is first presented instead of direct dynamic programming (DP), and the inherent relationship between ADP and deep reinforcement learning is developed. Next, analytics intelligence, as the necessary requirement for real reinforcement learning, is discussed. Finally, the principle of parallel dynamic programming, which integrates dynamic programming and analytics intelligence, is presented as the future computational intelligence.
Keywords: Parallel dynamic programming; Dynamic programming; Adaptive dynamic programming; Reinforcement learning; Deep learning; Neural networks; Artificial intelligence
Programming for scientific computing on peta-scale heterogeneous parallel systems (Cited by: 1)
8
Authors: 杨灿群, 吴强, 唐滔, 王锋, 薛京灵. 《Journal of Central South University》 (SCIE, EI, CAS), 2013, No. 5, pp. 1189-1203 (15 pages)
Peta-scale high-performance computing systems are increasingly built with heterogeneous CPU and GPU nodes to achieve higher power efficiency and computation throughput. While providing unprecedented capabilities to conduct computational experiments of historic significance, these systems are presently difficult to program. The users, who are domain experts rather than computer experts, prefer to use programming models closer to their domains (e.g., physics and biology) rather than MPI and OpenMP. This has led to the development of domain-specific programming that provides domain-specific programming interfaces but abstracts away some performance-critical architecture details. Based on experience in designing large-scale computing systems, a hybrid programming framework for scientific computing on heterogeneous architectures is proposed in this work. Its design philosophy is to provide a collaborative mechanism for domain experts and computer experts so that both domain-specific knowledge and performance-critical architecture details can be adequately exploited. Two real-world scientific applications have been evaluated on TH-1A, a peta-scale CPU-GPU heterogeneous system that is currently the 5th fastest supercomputer in the world. The experimental results show that the proposed framework is well suited for developing large-scale scientific computing applications on peta-scale heterogeneous CPU/GPU systems.
Keywords: computing systems; scientific applications; heterogeneous systems; peta; programming model; parallel systems; supercomputer; domain experts
Grid Service Framework: Supporting Multi-Models Parallel Grid Programming
9
Authors: 邓倩妮, 陆鑫达. 《Journal of Shanghai Jiaotong university(Science)》 (EI), 2004, No. 1, pp. 56-59 (4 pages)
Web service is a grid computing technology that promises greater ease-of-use and interoperability than previous distributed computing technologies. This paper proposed Group Service Framework, a grid computing platform based on Microsoft .NET that uses web services to: (1) locate and harness volunteer computing resources for different applications, (2) support multiple models such as Master/Slave, Divide and Conquer, Phase Parallel, and other parallel programming paradigms in a Grid environment, and (3) allocate data and balance load dynamically and transparently for grid computing applications. The Grid Service Framework based on Microsoft .NET was used to implement several simple parallel computing applications. The results show that the proposed Group Service Framework is suitable for generic parallel numerical computing.
Keywords: web server; computer networks; group server architecture; data partitioning; parallel numerical computing
PARALLEL MULTIPLICATIVE ITERATIVE METHODS FOR CONVEX PROGRAMMING
10
Authors: 陈忠, 费浦生. 《Acta Mathematica Scientia》 (SCIE, CSCD), 1997, No. 2, pp. 205-210 (6 pages)
In this paper, we present two parallel multiplicative algorithms for convex programming. If the objective function has compact level sets and has a locally Lipschitz continuous gradient, we discuss convergence of the algorithms. The proofs are essentially based on the results of sequential methods shown by Eggermont [1].
Keywords: parallel algorithm; convex programming
Scheduling Step-Deteriorating Jobs on Parallel Machines by Mixed Integer Programming (Cited by: 4)
11
Authors: 郭鹏, 程文明, 曾鸣, 梁剑. 《Journal of Donghua University(English Edition)》 (EI, CAS), 2015, No. 5, pp. 709-714, 719 (7 pages)
Production scheduling has a major impact on the productivity of the manufacturing process. Recently, scheduling problems with deteriorating jobs have attracted increasing attention from researchers. In many practical situations, it is found that some jobs fail to be processed prior to the pre-specified thresholds, and they often consume extra deteriorating time for successful accomplishment. Their processing times can be characterized by a step-wise function. Such kinds of jobs are called step-deteriorating jobs. In this paper, the parallel machine scheduling problem with step-deteriorating jobs (PMSD) is considered. Due to its intractability, four different mixed integer programming (MIP) models are formulated for solving the problem under consideration. The study aims to investigate the performance of these models and find a promising optimization formulation to solve the largest possible problem instances. The proposed four models are solved by the commercial software CPLEX. Moreover, near-optimal solutions can be obtained by the black-box local-search solver LocalSolver with the fourth one. The computational results show that the efficiencies of different MIP models depend on the distribution intervals of deteriorating thresholds, and the performance of LocalSolver is clearly better than that of CPLEX in terms of the quality of the solutions and the computational time.
Keywords: parallel machine; step-deterioration; mixed integer programming (MIP); scheduling models; total completion time
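Illustrative note for entry 11: the abstract says a step-deteriorating job takes extra time when it is not processed before its threshold, and the keywords name total completion time as the objective. The sketch below evaluates that objective for a fixed schedule. It is a minimal sketch with hypothetical names (`processing_time`, `total_completion_time`), and it assumes the penalty applies when a job starts strictly after its threshold. The abstract does not spell out that boundary convention, and the sketch does not reproduce any of the paper's four MIP models.

```python
def processing_time(start, base, threshold, penalty):
    """Step-wise processing time of one job.

    Assumption: the job takes `base` time if it starts at or before
    `threshold`, and `base + penalty` if it starts later.
    """
    return base if start <= threshold else base + penalty


def total_completion_time(schedule):
    """Sum of job completion times over all machines.

    `schedule` holds one list per machine; each list contains
    (base, threshold, penalty) tuples in processing order.
    """
    total = 0
    for machine_jobs in schedule:
        clock = 0  # each machine starts at time 0
        for base, threshold, penalty in machine_jobs:
            clock += processing_time(clock, base, threshold, penalty)
            total += clock  # completion time of this job
    return total


if __name__ == "__main__":
    job_a = (3, 5, 2)  # base 3, threshold 5, penalty 2
    job_b = (4, 2, 3)  # base 4, threshold 2, penalty 3
    job_c = (2, 0, 1)  # base 2, threshold 0, penalty 1
    print(total_completion_time([[job_a, job_b], [job_c]]))
    print(total_completion_time([[job_b, job_a], [job_c]]))
```

Swapping the order of the first two jobs changes the objective because the tight-threshold job then starts before it deteriorates. Decisions of this kind, both the sequence and the machine assignment, are what a scheduling model for this problem has to choose.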
Stochastic Programming Model for Discrete Lotsizing and Scheduling Problem on Parallel Machines
12
Authors: Kensuke Ishiwata, Jun Imaizumi, Takayuki Shiina, Susumu Morito. 《American Journal of Operations Research》, 2012, No. 3, pp. 374-381 (8 pages)
In recent years, it has been difficult for manufacturers and suppliers to forecast demand from a market for a given product precisely. Therefore, it has become important for them to cope with fluctuations in demand. From this viewpoint, the problem of planning or scheduling in production systems can be regarded as a mathematical problem with stochastic elements. However, in many previous studies, such problems are formulated without stochastic factors, treating stochastic elements as deterministic variables or parameters. Stochastic programming incorporates such factors into the mathematical formulation. In the present paper, we consider a multi-product, discrete, lotsizing and scheduling problem on parallel machines with stochastic demands. Under certain assumptions, this problem can be formulated as a stochastic integer programming problem. We attempt to solve this problem by a scenario aggregation method proposed by Rockafellar and Wets. The results from computational experiments suggest that our approach is able to solve large-scale problems, and that, under the condition of uncertainty, incorporating stochastic elements into the model gives better results than formulating the problem as a deterministic model.
Keywords: Stochastic Programming; Lotsizing and Scheduling; Parallel Machines; Scenario Aggregation Method
Optimal Redundancy Allocation in Hierarchical Series-Parallel Systems Using Mixed Integer Programming
13
Author: Mohsen Ziaee. 《Applied Mathematics》, 2013, No. 1, pp. 79-83 (5 pages)
Reliability optimization plays an important role in design, operation and management of the industrial systems. System reliability can be easily enhanced by improving the reliability of unreliable components and/or by using redundant configuration with subsystems/components in parallel. Redundancy Allocation Problem (RAP) was studied in this research. A mixed integer programming model was proposed to solve the problem, which considers simultaneously two objectives under several resource constraints. The model is only for the hierarchical series-parallel systems in which the elements of any subset of subsystems or components are connected in series or parallel and constitute a larger subsystem or total system. At the end of the study, the performance of the proposed approach was evaluated by a numerical example.
Keywords: Hierarchical Series-Parallel System; Optimal Redundancy Allocation; Mixed Integer Programming Formulation; Reliability Optimization
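Illustrative note for entry 13: the abstract describes hierarchical systems whose elements are connected in series or in parallel to form larger subsystems. The sketch below evaluates the reliability of such a structure using the standard formulas for independent components: a series block multiplies reliabilities, and a parallel block fails only if every branch fails. The nested-tuple representation and the function name `reliability` are hypothetical, and independence of failures is an assumption. The sketch is not the paper's optimization model, only the kind of quantity such a model would reason about.

```python
from math import prod


def reliability(node):
    """Reliability of a hierarchical series-parallel structure.

    A node is either a number (the reliability of a single component)
    or a tuple (kind, children), where kind is "series" or "parallel".
    Assumption: component failures are independent.
    """
    if isinstance(node, (int, float)):
        return float(node)
    kind, children = node
    values = [reliability(child) for child in children]
    if kind == "series":
        # Every element must work.
        return prod(values)
    if kind == "parallel":
        # The block fails only if all redundant elements fail.
        return 1.0 - prod(1.0 - v for v in values)
    raise ValueError(f"unknown block kind: {kind!r}")


if __name__ == "__main__":
    # One component in series with two redundant components in parallel.
    system = ("series", [0.9, ("parallel", [0.8, 0.8])])
    print(reliability(system))
```

Adding a second 0.8 component in parallel raises that block from 0.8 to 0.96. This is the redundancy effect the abstract refers to, and a redundancy allocation model weighs it against resource constraints.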
A Survey of Thread Synchronization for GPU Parallel Programming
14
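Authors: 高岚, 赵雨晨, 张伟功, 王晶, 钱德沛. 《软件学报》 (EI, CSCD, PKU Core), 2024, No. 2, pp. 1028-1047 (20 pages)
Parallel computing has become the mainstream trend. In parallel computing systems, synchronization is one of the key design issues and is essential for fully exploiting hardware performance. In recent years the GPU (graphics processing unit), the most widely used accelerator, has developed rapidly, and many applications place increasingly high demands on GPU thread synchronization. However, existing GPU systems struggle to support efficiently the complex thread synchronization found in real applications. Although researchers have proposed many methods for supporting GPU thread synchronization and have made considerable progress, the GPU's unique architecture and parallel execution model mean that research on GPU thread synchronization still faces many challenges. This survey classifies thread synchronization in GPU parallel programming by the purpose and granularity of synchronization. On this basis, focusing on how GPU thread synchronization is expressed and executed, it first analyzes and summarizes the key problems and challenges: synchronization that is hard to express efficiently, frequent errors, and low execution efficiency. Then, organized by synchronization granularity and covering both expression methods and performance optimization methods, it reviews recent academic and industrial research on competitive synchronization and cooperative synchronization among GPU threads, and analyzes and summarizes the existing approaches. Finally, it points out future research trends and prospects for GPU thread synchronization and suggests possible research directions, providing a reference for researchers in this field.
Keywords: general-purpose graphics processing unit (GPGPU); parallel programming; thread synchronization; performance optimization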
A Large-Deformation Fluid-Structure Interaction Numerical Method for Ship Structural Damage
15
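Authors: 王杰, 王景焘, 黄超, 伍洋, 刘娜, 张磐. 《计算力学学报》 (CAS, CSCD, PKU Core), 2024, No. 2, pp. 335-343 (9 pages)
Ship structural damage caused by underwater explosion is a complex, nonlinear, large-deformation fluid-structure interaction process, and high-accuracy fluid-structure interaction computation is the key to obtaining high-confidence simulation results. Based on the immersed boundary idea, this paper proposes a fluid-structure interaction numerical method for large-deformation shell theory that accurately captures the fluid-structure interface and efficiently solves the interface constraint equations. Building on this method, the paper proposes a complete large-deformation fluid-structure interaction computational scheme for ship structural damage under underwater explosion and, using a large-scale parallel programming framework, develops large-scale parallel fluid-structure interaction software for ship structural damage. Comparisons with the Taylor plate theoretical solution and with experimental data on structural shock response to underwater explosion show that the method can effectively simulate large-deformation fluid-structure interaction engineering problems with high numerical accuracy. On this basis, a large-scale numerical simulation of the damage process of a whole-ship structure under underwater explosion was completed. The method can be effectively applied to ship damage level assessment and has broad application prospects.
Keywords: large-deformation fluid-structure interaction; immersed boundary method; ship structural damage; parallel programming framework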
Design and Implementation of the Parallel C Language for Domestic Heterogeneous Many-Core Systems (Cited by: 10)
16
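Authors: 何王全, 刘勇, 方燕飞, 魏迪, 漆锋滨. 《软件学报》 (EI, CSCD, PKU Core), 2017, No. 4, pp. 764-785 (22 pages)
Heterogeneous many-core architectures offer a very high performance-to-power ratio and have become an important direction in supercomputer architecture. However, the more complex parallelism hierarchy and memory hierarchy of many-core systems pose great challenges for programming and optimization. Research on parallel programming techniques for many-core systems is therefore important for reducing the programming difficulty of parallel applications on domestic many-core systems and for improving the performance of parallel programs. This paper proposes a multi-mode parallel programming model with a unified architecture, comprising a heterogeneous-fusion accelerated computing model and an autonomous computing model programmed in a homogeneous style. Based on this programming model, the Parallel C language is designed, which can effectively describe the heterogeneous parallelism of domestic many-core systems. Compared with the MPI+X usage pattern on other many-core systems, both programming and system optimization take a global view, and the language is distinctive in multi-level locality description, one-sided messaging, and compatibility with existing multi-core applications. A Parallel C compilation system is built on Open64 that fully supports both the accelerated and the autonomous computing models, and optimizations are proposed and implemented, including data layout with automatic DMA, compiler-directed thread agents, and topology-aware collective communication. Test results for micro benchmarks and real applications on the Sunway TaihuLight computer system show that the Parallel C language and compilation system have good performance and scalability and can effectively support large applications.
Keywords: heterogeneous many-core; programming model; parallel language; Parallel C; compiler; message passing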
Combined Optimization of Lane Control and Signal Timing at Parallel Flow Intersections
17
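Authors: 宋浪, 王健, 杨璐, 安实. 《浙江大学学报(工学版)》 (EI, CAS, CSCD, PKU Core), 2024, No. 8, pp. 1647-1658 (12 pages)
To improve the flexibility of parallel flow intersections in practical applications, a combined optimization method for lane control and signal timing is proposed. A total of 16 schemes, combining single-direction, asymmetric two-direction, symmetric two-direction, three-direction, and four-direction configurations with layout directions, are integrated into the optimization model, and phase and phase-sequence schemes are generated automatically by modifying the traffic conflict matrix. A mixed integer linear programming model is constructed to jointly optimize the choice of intersection configuration, lane assignment, and signal timing. The results show that, across the traffic volume scenarios, the symmetric two-direction, three-direction, and four-direction configurations increase capacity by about 20%, 20%, and 50%, respectively, compared with a conventional intersection, while the capacity of the single-direction and asymmetric two-direction configurations is close to that of a conventional intersection, indicating that parallel flow intersections should not use single-direction or asymmetric two-direction configurations. The four-direction configuration gives the largest capacity improvement, up to 70.51%. The capacity improvements of the symmetric two-direction and three-direction configurations differ little, but the three-direction configuration outperforms the symmetric two-direction configuration in asymmetric traffic volume scenarios.
Keywords: traffic engineering; control method; mixed integer linear programming; parallel flow intersection; displaced left turn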
Application of Parallel Computing Technology in Ultra-Large-Scale Data Processing
18
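Author: 杨多海. 《科技创新与应用》, 2024, No. 17, pp. 181-184 (4 pages)
With the arrival of the era of artificial intelligence and big data, ultra-large-scale data processing has become an important research field. This paper discusses the application of parallel computing technology to ultra-large-scale data processing. It first describes in detail the basic theory and concepts of parallel computing and ultra-large-scale data processing, in particular the programming models and tools of parallel computing, and finally illustrates practical applications of parallel computing in ultra-large-scale data processing by analyzing real cases in search engines, weather forecasting, and financial analysis.
Keywords: parallel computing technology; ultra-large-scale data processing; programming models and tools; practical cases; specific applications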
Performance Optimization of Reactor Core Neutronics Software Based on OpenMP
19
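Authors: 刘婷, 安萍, 芦韡, 秦志红. 《中国核电》, 2024, No. 2, pp. 190-196 (7 pages)
CORCA-3D is an advanced nodal-method, three-dimensional, few-group core neutronics code developed independently by the Nuclear Power Institute of China. Increasing its running speed can improve the efficiency of reactor system analysis. CORCA-3D currently runs single-threaded and does not make full use of the computer's multi-core hardware. Performance profiling of CORCA-3D found hotspot functions with long running times and low CPU utilization, so parallel programming techniques can be introduced to accelerate its computation. This paper applies OpenMP programming to CORCA-3D and describes the design and implementation of its parallel optimization. Tests on the full core of Fangjiashan Unit 1 show that parallel programming substantially improves the running efficiency of CORCA-3D, with an average speedup of about 2. The use of this parallel programming technique provides technical support for subsequent applications of core numerical software.
Keywords: core neutronics; parallel programming; OpenMP; running efficiency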
Parallel Implementation of the Permutation Entropy Algorithm under a Task-Parallel Programming Model
20
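Author: 李维权. 《软件工程》, 2024, No. 2, pp. 40-43 (4 pages)
As the embedding dimension of the permutation entropy algorithm increases, the scale of computation grows quadratically, making computation time a prominent problem that urgently needs to be solved. To address this, a thread-level parallel method based on a task-parallel programming model is proposed: the task-parallel runtime system StarPU divides the compute-intensive work into multiple independent tasks, and a scheduler dispatches the tasks to different CPUs for execution, thereby parallelizing the permutation entropy algorithm. Compared with the serial program, the StarPU-based parallel permutation entropy algorithm achieves a speedup of 23.79 times; compared with OpenMP (a parallel computing scheme for shared-memory parallel systems), the speedup is 1.17 times when 28 threads are allocated. The results show that the method can effectively accelerate the permutation entropy algorithm.
Keywords: permutation entropy algorithm; task-parallel programming model; OpenMP; StarPU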
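Illustrative note for entry 20: for readers unfamiliar with the algorithm being parallelized, the sketch below computes permutation entropy in its usual Bandt-Pompe form: slide a window of length m over the series, record the ordering pattern of each window, and take the Shannon entropy of the pattern frequencies. This is a serial, minimal sketch with a hypothetical function name and parameters (`permutation_entropy`, `m`, `delay`), and it may differ in detail from the variant used in the paper. It involves no StarPU or OpenMP code.

```python
from collections import Counter
from math import log


def permutation_entropy(series, m=3, delay=1):
    """Shannon entropy (in nats) of the ordinal patterns of length m.

    Each window series[i], series[i + delay], ... is mapped to the
    permutation that sorts it; ties keep their original order.
    """
    n = len(series) - (m - 1) * delay  # number of windows
    if n <= 0:
        raise ValueError("series is too short for this m and delay")
    patterns = Counter(
        tuple(sorted(range(m), key=lambda k: series[i + k * delay]))
        for i in range(n)
    )
    return -sum((c / n) * log(c / n) for c in patterns.values())


if __name__ == "__main__":
    print(permutation_entropy([1, 2, 3, 4, 5]))  # one pattern only
    print(permutation_entropy([1, 2, 3, 2, 1]))  # three distinct patterns
```

Each window is processed independently of the others, which is the property that lets the work be cut into separate tasks and handed to a scheduler, as the abstract describes.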