1,184 articles found
Optimization Techniques for GPU-Based Parallel Programming Models in High-Performance
1
Authors: Shuntao Tang, Wei Chen. 《信息工程期刊(中英文版)》, 2024, No. 1, pp. 7-11 (5 pages)
This study embarks on a comprehensive examination of optimization techniques within GPU-based parallel programming models, pivotal for advancing high-performance computing (HPC). Emphasizing the transition of GPUs from graphic-centric processors to versatile computing units, it delves into the nuanced optimization of memory access, thread management, algorithmic design, and data structures. These optimizations are critical for exploiting the parallel processing capabilities of GPUs, addressing both the theoretical frameworks and practical implementations. By integrating advanced strategies such as memory coalescing, dynamic scheduling, and parallel algorithmic transformations, this research aims to significantly elevate computational efficiency and throughput. The findings underscore the potential of optimized GPU programming to revolutionize computational tasks across various domains, highlighting a pathway towards achieving unparalleled processing power and efficiency in HPC environments. The paper not only contributes to the academic discourse on GPU optimization but also provides actionable insights for developers, fostering advancements in computational sciences and technology.
Keywords: Optimization Techniques; GPU-Based Parallel Programming Models; High-Performance Computing
An MPI parallel DEM-IMB-LBM framework for simulating fluid-solid interaction problems (Cited by: 1)
2
Authors: Ming Xia, Liuhong Deng, Fengqiang Gong, Tongming Qu, Y.T. Feng, Jin Yu. 《Journal of Rock Mechanics and Geotechnical Engineering》 (SCIE, CSCD), 2024, No. 6, pp. 2219-2231 (13 pages)
The high-resolution DEM-IMB-LBM model can accurately describe pore-scale fluid-solid interactions, but its potential for use in geotechnical engineering analysis has not been fully unleashed due to its prohibitive computational costs. To overcome this limitation, a message passing interface (MPI) parallel DEM-IMB-LBM framework is proposed, aimed at enhancing computation efficiency. This framework utilises a static domain decomposition scheme, with the entire computation domain being decomposed into multiple subdomains according to predefined processors. A detailed parallel strategy is employed for both contact detection and hydrodynamic force calculation. In particular, a particle ID re-numbering scheme is proposed to handle particle transitions across sub-domain interfaces. Two benchmarks are conducted to validate the accuracy and overall performance of the proposed framework. Subsequently, the framework is applied to simulate scenarios involving multi-particle sedimentation and submarine landslides. The numerical examples effectively demonstrate the robustness and applicability of the MPI parallel DEM-IMB-LBM framework.
Keywords: Discrete element method (DEM); Lattice Boltzmann method (LBM); Immersed moving boundary (IMB); Multi-cores parallelization; Message passing interface (MPI); CPU; Submarine landslides
Shared Cache Based on Content Addressable Memory in a Multi-Core Architecture
3
Authors: Allam Abumwais, Mahmoud Obaid. 《Computers, Materials & Continua》 (SCIE, EI), 2023, No. 3, pp. 4951-4963 (13 pages)
Modern shared-memory multi-core processors typically have shared Level 2 (L2) or Level 3 (L3) caches. Cache bottlenecks and replacement strategies are the main problems of such architectures, where multiple cores try to access the shared cache simultaneously. The main problem in improving memory performance is the shared cache architecture and cache replacement. This paper documents the implementation of a Dual-Port Content Addressable Memory (DPCAM) and a modified Near-Far Access Replacement Algorithm (NFRA), which was previously proposed as a shared L2 cache layer in a multi-core processor. Standard Performance Evaluation Corporation (SPEC) Central Processing Unit (CPU) 2006 benchmark workloads are used to evaluate the benefit of the shared L2 cache layer. Results show improved performance of the multicore processor’s DPCAM and NFRA algorithms, corresponding to a higher number of concurrent accesses to shared memory. The new architecture significantly increases system throughput and records performance improvements of up to 8.7% on various types of SPEC 2006 benchmarks. The miss rate is also improved by about 13%, with some exceptions in the sphinx3 and bzip2 benchmarks. These results could open a new window for solving the long-standing problems with shared cache in multi-core processors.
Keywords: multi-core processor; shared cache; content addressable memory; dual port CAM; replacement algorithm; benchmark program
Performance Enhancement of XML Parsing Using Regression and Parallelism
4
Authors: Muhammad Ali, Minhaj Ahmad Khan. 《Computer Systems Science & Engineering》, 2024, No. 2, pp. 287-303 (17 pages)
Extensible Markup Language (XML) files, widely used for storing and exchanging information on the web, require efficient parsing mechanisms to improve the performance of applications. With the existing Document Object Model (DOM) based parsing, performance degrades due to sequential processing and large memory requirements, thereby requiring an efficient XML parser to mitigate these issues. In this paper, we propose a Parallel XML Tree Generator (PXTG) algorithm for accelerating the parsing of XML files and a Regression-based XML Parsing Framework (RXPF) that analyzes and predicts performance through profiling, regression, and code generation for efficient parsing. The PXTG algorithm is based on dividing the XML file into n parts and producing n trees in parallel. The profiling phase of the RXPF framework produces a dataset by measuring the performance of various parsing models, including StAX, SAX, DOM, JDOM, and PXTG, on different cores by using multiple file sizes. The regression phase produces the prediction model, based on which the final code for efficient parsing of XML files is produced through the code generation phase. The RXPF framework has shown a significant improvement in performance, varying from 9.54% to 32.34%, over other existing models used for parsing XML files.
Keywords: Regression; parallel parsing; multi-cores; XML
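The idea of dividing an XML file into n parts and producing n trees in parallel, as stated in the abstract above, might be sketched as follows. This is not the PXTG algorithm itself, whose details the listing does not give. It assumes a flat document whose records are non-nested, non-self-closing sibling elements with a known tag; `split_records`, `parse_in_parallel`, and the `<part>` wrapper element are hypothetical names introduced only for illustration.

```python
import re
import xml.etree.ElementTree as ET
from concurrent.futures import ThreadPoolExecutor


def split_records(xml_text, record_tag, n):
    """Cut a flat document at record boundaries into at most n well-formed parts.

    Assumption: records are non-nested <record_tag>...</record_tag> siblings.
    """
    tag = re.escape(record_tag)
    records = re.findall(rf"<{tag}\b.*?</{tag}>", xml_text, flags=re.S)
    step = max(1, -(-len(records) // n))  # ceiling division, at least 1
    # Wrap each group in a hypothetical <part> root so it parses on its own.
    return ["<part>" + "".join(records[i:i + step]) + "</part>"
            for i in range(0, len(records), step)]


def parse_in_parallel(xml_text, record_tag, n=4):
    """Parse each part independently, yielding one tree per part."""
    parts = split_records(xml_text, record_tag, n)
    with ThreadPoolExecutor(max_workers=n) as pool:
        return list(pool.map(ET.fromstring, parts))  # map preserves part order


doc = "<items><item id='1'>a</item><item id='2'>b</item><item id='3'>c</item></items>"
trees = parse_in_parallel(doc, "item", n=2)
ids = [elem.get("id") for tree in trees for elem in tree]
```

Splitting at record boundaries keeps every part well-formed; cutting at arbitrary byte offsets would not. Threads in CPython mainly illustrate the partitioning here, so a process pool would be the more likely choice when real speedup is the goal.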
Parallel Image Processing: Taking Grayscale Conversion Using OpenMP as an Example
5
Authors: Bayan AlHumaidan, Shahad Alghofaily, Maitha Al Qhahtani, Sara Oudah, Naya Nagy. 《Journal of Computer and Communications》, 2024, No. 2, pp. 1-10 (10 pages)
In recent years, the widespread adoption of parallel computing, especially in multi-core processors and high-performance computing environments, ushered in a new era of efficiency and speed. This trend was particularly noteworthy in the field of image processing, which witnessed significant advancements. This parallel computing project explored the field of parallel image processing, with a focus on the grayscale conversion of colorful images. Our approach involved integrating OpenMP into our framework for parallelization to execute a critical image processing task: grayscale conversion. By using OpenMP, we strategically enhanced the overall performance of the conversion process by distributing the workload across multiple threads. The primary objectives of our project revolved around optimizing computation time and improving overall efficiency, particularly in the task of grayscale conversion of colorful images. Utilizing OpenMP for concurrent processing across multiple cores significantly reduced execution times through the effective distribution of tasks among these cores. The speedup values for various image sizes highlighted the efficacy of parallel processing, especially for large images. However, a detailed examination revealed a potential decline in parallelization efficiency with an increasing number of cores. This underscored the importance of a carefully optimized parallelization strategy, considering factors like load balancing and minimizing communication overhead. Despite challenges, the overall scalability and efficiency achieved with parallel image processing underscored OpenMP’s effectiveness in accelerating image manipulation tasks.
Keywords: Parallel Computing; Image Processing; OpenMP; Parallel Programming; High Performance Computing; GPU (Graphic Processing Unit)
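The workload distribution described in the abstract above can be illustrated with a small sketch. It is written in Python rather than OpenMP, and it is not the authors' code: the BT.601 luma weights, the row-block partitioning, and all function names are assumptions made for illustration. The last helper computes speedup and parallel efficiency, the two quantities the abstract discusses.

```python
from concurrent.futures import ThreadPoolExecutor

# ITU-R BT.601 luma weights: a common grayscale choice, assumed here.
R_W, G_W, B_W = 0.299, 0.587, 0.114


def gray_rows(rows):
    """Convert a block of rows of (r, g, b) pixels to rounded gray levels."""
    return [[round(R_W * r + G_W * g + B_W * b) for (r, g, b) in row]
            for row in rows]


def split_rows(image, workers):
    """Split the image into at most `workers` contiguous row blocks."""
    step = max(1, -(-len(image) // workers))  # ceiling division, at least 1
    return [image[i:i + step] for i in range(0, len(image), step)]


def to_grayscale(image, workers=4):
    """Convert each row block in a separate worker and reassemble in order."""
    blocks = split_rows(image, workers)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = pool.map(gray_rows, blocks)  # map preserves block order
        return [row for block in results for row in block]


def speedup_and_efficiency(t_serial, t_parallel, workers):
    """Speedup S = T1 / Tp and efficiency E = S / p."""
    speedup = t_serial / t_parallel
    return speedup, speedup / workers
```

Because efficiency divides speedup by the worker count, it falls whenever speedup grows more slowly than the number of cores, which matches the decline the abstract reports. In CPython the global interpreter lock keeps these threads from speeding up pure-Python arithmetic, so the sketch shows the partitioning rather than a measured gain.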
Parallel Processing Design for LTE PUSCH Demodulation and Decoding Based on Multi-Core Processor
6
Authors: Zhang Ziran, Li Jun, Li Changxiao (ZTE Corporation, Shenzhen 518057, P.R. China). 《ZTE Communications》, 2009, No. 1, pp. 54-58 (5 pages)
The Long Term Evolution (LTE) system imposes high requirements for dispatching delay. Moreover, the very large air interface rate of LTE requires good processing capability from the devices processing the baseband signals. Consequently, a single-core processor cannot meet the requirements of the LTE system. This paper analyzes how to use multi-core processors to achieve parallel processing of uplink demodulation and decoding in LTE systems and designs an approach to parallel processing. The test results prove that this approach works quite well.
Keywords: CORE LTE Parallel Processing Design for LTE PUSCH Demodulation and Decoding Based on Multi-Core Processor Design
PDP: Parallel Dynamic Programming (Cited by: 15)
7
Authors: Fei-Yue Wang, Jie Zhang, Qinglai Wei, Xinhu Zheng, Li Li. 《IEEE/CAA Journal of Automatica Sinica》 (SCIE, EI, CSCD), 2017, No. 1, pp. 1-5 (5 pages)
Deep reinforcement learning is a focus research area in artificial intelligence. The principle of optimality in dynamic programming is a key to the success of reinforcement learning methods. The principle of adaptive dynamic programming (ADP) is first presented instead of direct dynamic programming (DP), and the inherent relationship between ADP and deep reinforcement learning is developed. Next, analytics intelligence, as the necessary requirement for real reinforcement learning, is discussed. Finally, the principle of parallel dynamic programming, which integrates dynamic programming and analytics intelligence, is presented as the future computational intelligence.
Keywords: Parallel dynamic programming; Dynamic programming; Adaptive dynamic programming; Reinforcement learning; Deep learning; Neural networks; Artificial intelligence
Parallel Control for Optimal Tracking via Adaptive Dynamic Programming (Cited by: 20)
8
Authors: Jingwei Lu, Qinglai Wei, Fei-Yue Wang. 《IEEE/CAA Journal of Automatica Sinica》 (SCIE, EI, CSCD), 2020, No. 6, pp. 1662-1674 (13 pages)
This paper studies the problem of optimal parallel tracking control for continuous-time general nonlinear systems. Unlike existing optimal state feedback control, the control input of the optimal parallel control is introduced into the feedback system. However, due to the introduction of control input into the feedback system, the optimal state feedback control methods cannot be applied directly. To address this problem, an augmented system and an augmented performance index function are first proposed. Thus, the general nonlinear system is transformed into an affine nonlinear system. The difference between the optimal parallel control and the optimal state feedback control is analyzed theoretically. It is proven that the optimal parallel control with the augmented performance index function can be seen as the suboptimal state feedback control with the traditional performance index function. Moreover, an adaptive dynamic programming (ADP) technique is utilized to implement the optimal parallel tracking control, using a critic neural network (NN) to approximate the value function online. The stability analysis of the closed-loop system is performed using Lyapunov theory, and the tracking error and NN weight errors are uniformly ultimately bounded (UUB). Also, the optimal parallel controller guarantees the continuity of the control input when there are finite jump discontinuities in the reference signals. Finally, the effectiveness of the developed optimal parallel control method is verified in two cases.
Keywords: Adaptive dynamic programming (ADP); nonlinear optimal control; parallel controller; parallel control theory; parallel system; tracking control; neural network (NN)
Scheduling Step-Deteriorating Jobs on Parallel Machines by Mixed Integer Programming (Cited by: 4)
9
Authors: 郭鹏, 程文明, 曾鸣, 梁剑. 《Journal of Donghua University(English Edition)》 (EI, CAS), 2015, No. 5, pp. 709-714, 719 (7 pages)
Production scheduling has a major impact on the productivity of the manufacturing process. Recently, scheduling problems with deteriorating jobs have attracted increasing attention from researchers. In many practical situations, it is found that some jobs fail to be processed prior to pre-specified thresholds, and they often consume extra deteriorating time for successful accomplishment. Their processing times can be characterized by a step-wise function. Such jobs are called step-deteriorating jobs. In this paper, the parallel machine scheduling problem with step-deteriorating jobs (PMSD) is considered. Due to its intractability, four different mixed integer programming (MIP) models are formulated for solving the problem under consideration. The study aims to investigate the performance of these models and find a promising optimization formulation to solve the largest possible problem instances. The proposed four models are solved by the commercial software CPLEX. Moreover, near-optimal solutions can be obtained by the black-box local-search solver LocalSolver with the fourth one. The computational results show that the efficiencies of the different MIP models depend on the distribution intervals of deteriorating thresholds, and the performance of LocalSolver is clearly better than that of CPLEX in terms of the quality of the solutions and the computational time.
Keywords: parallel machine; step-deterioration; mixed integer programming (MIP); scheduling models; total completion time
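The step-wise processing time mentioned in the abstract above can be made concrete with a short sketch. It assumes the usual reading of a step-deteriorating job: a job started no later than its threshold takes its base time, and a job started later also pays a fixed penalty. Whether the paper's comparison is strict is not stated in the listing, and the list-scheduling rule below is only an illustrative heuristic, not one of the paper's four MIP models. All names are hypothetical.

```python
def actual_time(start, base, threshold, penalty):
    """Step-wise processing time: base if started by the threshold, else base + penalty."""
    return base if start <= threshold else base + penalty


def total_completion_time(jobs, machines):
    """Assign jobs, in the given order, to the machine that becomes free first.

    jobs: list of (base, threshold, penalty) tuples.
    Returns the sum of the job completion times.
    """
    free_at = [0] * machines
    total = 0
    for base, threshold, penalty in jobs:
        m = min(range(machines), key=free_at.__getitem__)  # earliest free machine
        free_at[m] += actual_time(free_at[m], base, threshold, penalty)
        total += free_at[m]
    return total


jobs = [(3, 2, 4), (2, 1, 5), (4, 2, 1)]
on_two_machines = total_completion_time(jobs, machines=2)
on_one_machine = total_completion_time(jobs, machines=1)
```

With two machines every job in this instance starts by its threshold, while on a single machine the second and third jobs start late and incur their penalties, which shows why the distribution of thresholds matters for the schedule.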
Programming for scientific computing on peta-scale heterogeneous parallel systems (Cited by: 1)
10
Authors: 杨灿群, 吴强, 唐滔, 王锋, 薛京灵. 《Journal of Central South University》 (SCIE, EI, CAS), 2013, No. 5, pp. 1189-1203 (15 pages)
Peta-scale high-performance computing systems are increasingly built with heterogeneous CPU and GPU nodes to achieve higher power efficiency and computation throughput. While providing unprecedented capabilities to conduct computational experiments of historic significance, these systems are presently difficult to program. The users, who are domain experts rather than computer experts, prefer to use programming models closer to their domains (e.g., physics and biology) rather than MPI and OpenMP. This has led to the development of domain-specific programming that provides domain-specific programming interfaces but abstracts away some performance-critical architecture details. Based on experience in designing large-scale computing systems, a hybrid programming framework for scientific computing on heterogeneous architectures is proposed in this work. Its design philosophy is to provide a collaborative mechanism for domain experts and computer experts so that both domain-specific knowledge and performance-critical architecture details can be adequately exploited. Two real-world scientific applications have been evaluated on TH-1A, a peta-scale CPU-GPU heterogeneous system that is currently the 5th fastest supercomputer in the world. The experimental results show that the proposed framework is well suited for developing large-scale scientific computing applications on peta-scale heterogeneous CPU/GPU systems.
Keywords: computing systems; scientific applications; heterogeneous systems; peta; programming models; parallel systems; supercomputers; domain experts
PARALLEL MULTIPLICATIVE ITERATIVE METHODS FOR CONVEX PROGRAMMING
11
Authors: 陈忠, 费浦生. 《Acta Mathematica Scientia》 (SCIE, CSCD), 1997, No. 2, pp. 205-210 (6 pages)
In this paper, we present two parallel multiplicative algorithms for convex programming. If the objective function has compact level sets and has a locally Lipschitz continuous gradient, we discuss convergence of the algorithms. The proofs are essentially based on the results of sequential methods shown by Eggermont [1].
Keywords: parallel algorithm; convex programming
Stochastic Programming Model for Discrete Lotsizing and Scheduling Problem on Parallel Machines
12
Authors: Kensuke Ishiwata, Jun Imaizumi, Takayuki Shiina, Susumu Morito. 《American Journal of Operations Research》, 2012, No. 3, pp. 374-381 (8 pages)
In recent years, it has been difficult for manufacturers and suppliers to forecast demand from a market for a given product precisely. Therefore, it has become important for them to cope with fluctuations in demand. From this viewpoint, the problem of planning or scheduling in production systems can be regarded as a mathematical problem with stochastic elements. However, in many previous studies, such problems are formulated without stochastic factors, treating stochastic elements as deterministic variables or parameters. Stochastic programming incorporates such factors into the mathematical formulation. In the present paper, we consider a multi-product, discrete, lotsizing and scheduling problem on parallel machines with stochastic demands. Under certain assumptions, this problem can be formulated as a stochastic integer programming problem. We attempt to solve this problem by a scenario aggregation method proposed by Rockafellar and Wets. The results from computational experiments suggest that our approach is able to solve large-scale problems, and that, under the condition of uncertainty, incorporating stochastic elements into the model gives better results than formulating the problem as a deterministic model.
Keywords: Stochastic Programming; Lotsizing and Scheduling; Parallel Machines; Scenario Aggregation Method
A Trace-state Based Approach to Specification and Design of Parallel Programs
13
Author: He Jifeng (Oxford University Computing Laboratory, Programming Research Group, Parks Road, Oxford OX1 3QD, England). 《计算机工程》 (CAS, CSCD, PKU Core), 1996, No. S1, pp. 91-105 (15 pages)
In this paper they deal with the issue of specification and design of parallel communicating processes. A trace-state based model is introduced to describe the behaviour of concurrent programs. They present a formal system based on that model to achieve hierarchical and modular development and verification methods. A number of refinement rules are used to decompose the specification into smaller ones and calculate program from the ...
Keywords: COMM A Trace-state Based Approach to Specification and Design of Parallel Programs
Performance Analysis of Code Optimization Based on TMS320C6678 Multi-core DSP
14
《计算机科学与技术汇刊(中英文版)》, 2015, No. 2, pp. 35-39 (5 pages)
In modern DSP development, C/C++ is increasingly used as the development language, and optimization of C/C++ programs has become an important part of DSP software development. This article describes the structural features of the TMS320C6678 processor, illustrates the principles of efficient optimization methods for C/C++, and analyzes the results.
Keywords: TMS320C6678; Program Optimization; Software Pipelining; Parallel Execution
Grid Service Framework: Supporting Multi-Models Parallel Grid Programming
15
Authors: 邓倩妮, 陆鑫达. 《Journal of Shanghai Jiaotong university(Science)》 (EI), 2004, No. 1, pp. 56-59 (4 pages)
Web service is a grid computing technology that promises greater ease-of-use and interoperability than previous distributed computing technologies. This paper proposes Group Service Framework, a grid computing platform based on Microsoft .NET that uses web services to: (1) locate and harness volunteer computing resources for different applications, (2) support multiple parallel programming paradigms in a Grid environment, such as Master/Slave, Divide and Conquer, and Phase Parallel, and (3) allocate data and balance load dynamically and transparently for grid computing applications. The Grid Service Framework based on Microsoft .NET was used to implement several simple parallel computing applications. The results show that the proposed Group Service Framework is suitable for generic parallel numerical computing.
Keywords: Web server; computer network; group server architecture; data partitioning; parallel numerical computing
Optimal Redundancy Allocation in Hierarchical Series-Parallel Systems Using Mixed Integer Programming
16
Author: Mohsen Ziaee. 《Applied Mathematics》, 2013, No. 1, pp. 79-83 (5 pages)
Reliability optimization plays an important role in the design, operation and management of industrial systems. System reliability can be easily enhanced by improving the reliability of unreliable components and/or by using redundant configurations with subsystems/components in parallel. The Redundancy Allocation Problem (RAP) was studied in this research. A mixed integer programming model was proposed to solve the problem, which considers two objectives simultaneously under several resource constraints. The model is only for hierarchical series-parallel systems, in which the elements of any subset of subsystems or components are connected in series or parallel and constitute a larger subsystem or the total system. At the end of the study, the performance of the proposed approach was evaluated by a numerical example.
Keywords: Hierarchical Series-Parallel System; Optimal Redundancy Allocation; Mixed Integer Programming Formulation; Reliability Optimization
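The effect of parallel redundancy on reliability, mentioned in the abstract above, can be shown with the standard formula for a single-level series system of parallel subsystems, R = prod_i (1 - (1 - r_i)^(n_i)), where r_i is the component reliability and n_i the number of identical parallel copies in subsystem i. The sketch below is far simpler than the paper's model: it has one level instead of a hierarchy, one objective instead of two, one cost constraint, and exhaustive search in place of mixed integer programming. All names and the `max_copies` limit are hypothetical.

```python
from itertools import product
from math import prod


def system_reliability(r, n):
    """Reliability of subsystems in series, each holding n[i] parallel copies."""
    return prod(1 - (1 - ri) ** ni for ri, ni in zip(r, n))


def best_allocation(r, cost, budget, max_copies=3):
    """Enumerate every allocation within budget and keep the most reliable one.

    Returns (reliability, allocation), or None if nothing fits the budget.
    """
    best = None
    for n in product(range(1, max_copies + 1), repeat=len(r)):
        if sum(c * k for c, k in zip(cost, n)) <= budget:
            rel = system_reliability(r, n)
            if best is None or rel > best[0]:
                best = (rel, n)
    return best


# Two subsystems with component reliabilities 0.9 and 0.8, unit costs 2 and 3.
result = best_allocation(r=[0.9, 0.8], cost=[2, 3], budget=10)
```

Exhaustive search grows exponentially with the number of subsystems, which is one reason formulations such as mixed integer programming are used for realistic instances.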
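Construction of an EPPM-Based Intervention Program for Gastric Cancer Screening Behavior in First-Degree Relatives of Gastric Cancer Patients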
17
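Authors: 温秀梅, 苏丹, 李霞, 王旭, 刘李, 唐媛. 《中国卫生标准管理》, 2024, No. 15, pp. 106-109 (4 pages)
Objective: To construct an intervention program, based on the extended parallel process model (EPPM), for gastric cancer screening behavior in first-degree relatives of gastric cancer patients. Methods: From March to November 2023, a first draft of the EPPM-based intervention program was developed from a literature review and group discussion. Two rounds of Delphi expert consultation were then conducted with 18 experts to produce the final program. Results: The two rounds of expert consultation yielded a screening intervention program comprising 2 first-level indicators, 4 second-level indicators, and 27 third-level indicators. The expert authority coefficients were 0.822 and 0.884. Kendall's coefficients of concordance for the two rounds were 0.176 and 0.373, and the differences were statistically significant (P<0.001). Conclusion: The EPPM-based intervention program for gastric cancer screening behavior in first-degree relatives of gastric cancer patients is reliable and practical, and is intended to serve as a reference for interventions targeting screening behavior in this population.
Keywords: gastric cancer; first-degree relatives; extended parallel process; screening behavior; intervention program; Delphi method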
Research on the Application of Parallel Computing Technology in Ultra-Large-Scale Data Processing
18
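Author: 杨多海. 《科技创新与应用》, 2024, No. 17, pp. 181-184 (4 pages)
With the arrival of the era of artificial intelligence and big data, ultra-large-scale data processing has become an important research area. This paper examines the application of parallel computing technology in ultra-large-scale data processing. It first sets out the basic theory and concepts of parallel computing and ultra-large-scale data processing, in particular the programming models and tools of parallel computing. Finally, by analyzing practical cases of parallel computing in search engines, weather forecasting, and financial analysis, it describes how parallel computing technology is applied in practice to ultra-large-scale data processing.
Keywords: parallel computing technology; ultra-large-scale data processing; programming models and tools; practical cases; specific applications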
Accelerated Electromagnetic Transient Simulation Technology for New-Type Power Systems
19
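Authors: 聂春芳, 郝正航, 陈卓, 何朴想. 《电子科技》, 2024, No. 3, pp. 18-25 (8 pages)
Electromagnetic transient simulation of new-type power systems suffers from low efficiency and high difficulty because of complex system topology, the large number of power electronic switching devices, and the limited single-core computing capability of the simulator. To address these problems, this paper uses an ideal transformer model partitioning algorithm to split a large-scale new-type power system model into several subsystems. This decouples the large system and reduces its order, effectively reducing the computation incurred when the whole system is treated as a single state-space system matrix. To further lighten the load on any single processor, CPU (Central Processing Unit) multi-core parallel technology is used to design UREP300, an accelerated simulation platform that performs efficient parallel computation in a bare-metal environment. The partitioned model is loaded into UREP300 for accelerated simulation experiments and compared with offline simulation of the original model in MATLAB/Simulink. The experimental results show that the accelerated simulation technique combining ideal transformer model partitioning with multi-core parallel execution raises simulation speed to 586 times the original while preserving simulation accuracy. It significantly improves simulation efficiency and is suitable for simulating large-scale new-type power systems.
Keywords: new-type power system; electromagnetic transient; accelerated simulation; model partitioning; ideal transformer model method; bare-metal programming; multi-core parallelism; multi-core scheduling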
Design and Implementation of the Parallel C Language for Domestic Heterogeneous Many-Core Systems (Cited by: 10)
20
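Authors: 何王全, 刘勇, 方燕飞, 魏迪, 漆锋滨. 《软件学报》 (EI, CSCD, PKU Core), 2017, No. 4, pp. 764-785 (22 pages)
Heterogeneous many-core architectures offer a very high performance-to-power ratio and have become an important direction in supercomputer architecture. However, the more complex parallelism and memory hierarchies of many-core systems pose great challenges for programming and optimization. Research on parallel programming techniques for many-core systems is therefore of great significance for lowering the difficulty of programming parallel applications on domestic many-core systems and for improving the performance of parallel programs. This paper proposes a unified-architecture, multi-mode parallel programming model comprising a heterogeneous-fusion accelerated computing model and an autonomous computing model programmed in a homogeneous style. Based on this programming model, the Parallel C language is designed, which can effectively describe the heterogeneous parallelism of domestic many-core systems. Compared with the MPI+X usage pattern on other many-core systems, both programming and system optimization take a global view, with distinctive features in multi-level locality description, one-sided messaging, and compatibility with existing multi-core applications. A Parallel C compilation system is built on Open64 that fully supports both the accelerated and autonomous computing models, and optimizations including data layout with automatic DMA, compiler-directed thread proxying, and topology-aware collective communication are proposed and implemented. Test results of micro benchmarks and real applications on the Sunway TaihuLight computer system show that the Parallel C language and compilation system deliver good performance and scalability and can effectively support large-scale applications.
Keywords: heterogeneous many-core; programming model; parallel language; Parallel C; compiler; message passing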