Journal articles
369 articles found
A New Hybrid Hierarchical Parallel Algorithm to Enhance the Performance of Large-Scale Structural Analysis Based on Heterogeneous Multicore Clusters
1
Authors: Gaoyuan Yu, Yunfeng Lou, Hang Dong, Junjie Li, Xianlong Jin. Computer Modeling in Engineering & Sciences (SCIE, EI), 2023, Issue 7, pp. 135-155 (21 pages).
Heterogeneous multicore clusters are becoming more popular for high-performance computing due to their great computing power and cost-effectiveness. Nevertheless, parallel efficiency degradation is still a problem in large-scale structural analysis based on heterogeneous multicore clusters. To solve it, a hybrid hierarchical parallel algorithm (HHPA) is proposed on the basis of the conventional domain decomposition algorithm (CDDA) and the parallel sparse solver. In this new algorithm, a three-layer parallelization of the computational procedure is introduced to separate the communication among inter-nodes, heterogeneous core groups (HCGs), and inside heterogeneous core groups by mapping computing tasks to various hardware layers. This approach not only achieves load balancing at different layers efficiently but also improves the communication rate significantly through hierarchical communication. Additionally, the proposed hybrid parallel approach reduces the interface equation size and thereby the solution time, which makes up for the growing communication overhead as the interface equation size increases when employing CDDA. Moreover, distributed sparse storage of large amounts of data is introduced to improve memory access. Results of solving benchmark instances on the Shenwei-Taihuzhiguang supercomputer show that the proposed method obtains higher speedup and parallel efficiency than CDDA and better scalability of parallel partitioning than the two-level parallel computing algorithm (TPCA).
Keywords: heterogeneous multicore; hybrid parallel; finite element analysis; domain decomposition
System Support for Parallel Computing on Heterogeneous Networks of Workstations (cited by 2)
2
Author: Xiaodong Zhang (High Performance Computing and Software Laboratory, University of Texas at San Antonio, San Antonio, Texas 78249, U.S.A.). Wuhan University Journal of Natural Sciences (CAS), 1996, Issue Z1, pp. 362-370 (9 pages).
In this paper, we introduce several ongoing research projects to support parallel and distributed computing on heterogeneous networks of workstations (NOW) in the High Performance Computing and Software Laboratory at the University of Texas at San Antonio. The projects aim to address three technical issues. First, heterogeneity and time-sharing effects make traditional performance models and metrics for homogeneous computing unsuitable for measuring and evaluating heterogeneous computing performance. We develop practical models and metrics that quantify the heterogeneity of networks and characterize its performance effects. Second, effective parallel computation requires special system support. We are developing system schemes for heterogeneity management, process scheduling, and efficient communication. Finally, to provide insight into system performance, we are developing two types of supporting tools: a graphical instrumentation monitor that helps users investigate performance problems and determine the most effective way to exploit NOW systems, and a trace-driven simulator to test and compare different system management and scheduling schemes.
Keywords: parallel; support; system; heterogeneous; computing
Programming for scientific computing on peta-scale heterogeneous parallel systems (cited by 1)
3
Authors: 杨灿群, 吴强, 唐滔, 王锋, 薛京灵. Journal of Central South University (SCIE, EI, CAS), 2013, Issue 5, pp. 1189-1203 (15 pages).
Peta-scale high-performance computing systems are increasingly built with heterogeneous CPU and GPU nodes to achieve higher power efficiency and computation throughput. While providing unprecedented capabilities to conduct computational experiments of historic significance, these systems are presently difficult to program. Users, who are domain experts rather than computer experts, prefer programming models closer to their domains (e.g., physics and biology) over MPI and OpenMP. This has led to the development of domain-specific programming, which provides domain-specific programming interfaces but abstracts away some performance-critical architecture details. Based on experience in designing large-scale computing systems, this work proposes a hybrid programming framework for scientific computing on heterogeneous architectures. Its design philosophy is to provide a collaborative mechanism for domain experts and computer experts so that both domain-specific knowledge and performance-critical architecture details can be adequately exploited. Two real-world scientific applications have been evaluated on TH-1A, a peta-scale CPU-GPU heterogeneous system that was the 5th fastest supercomputer in the world at the time of writing. The experimental results show that the proposed framework is well suited for developing large-scale scientific computing applications on peta-scale heterogeneous CPU/GPU systems.
Keywords: computing systems; scientific applications; heterogeneous systems; peta; programming models; parallel systems; supercomputers; domain experts
Resource Scheduling Strategy for Performance Optimization Based on Heterogeneous CPU-GPU Platform
4
Authors: Juan Fang, Kuan Zhou, Mengyuan Zhang, Wei Xiang. Computers, Materials & Continua (SCIE, EI), 2022, Issue 10, pp. 1621-1635 (15 pages).
In recent years, with the development of processor architecture, heterogeneous processors that include central processing units (CPUs) and graphics processing units (GPUs) have become mainstream. However, because of the differences between heterogeneous cores, heterogeneous systems face many problems that remain to be solved. To address them, this paper focuses on the utilization and efficiency of heterogeneous cores and designs resource scheduling strategies. To improve system performance, it proposes a combination strategy for a single task and a multi-task scheduling strategy for multiple tasks. The combination strategy consists of two sub-strategies: the first improves the execution efficiency of tasks on the GPU by changing the thread organization structure; the second focuses on the working state of the efficient cores and develops more reasonable workload balancing schemes to improve the resource utilization of heterogeneous systems. The multi-task scheduling strategy obtains the execution efficiency of heterogeneous cores and global task information by processing task samples. Based on this information, an improved ant colony algorithm quickly obtains a reasonable task allocation scheme that fully exploits the characteristics of heterogeneous cores. The experimental results show that the combination strategy reduces task execution time by 29.13% on average. When processing multiple tasks, the multi-task scheduling strategy reduces execution time by up to 23.38% on top of the combination strategy. Both strategies make better use of the resources of heterogeneous systems and significantly reduce task execution time.
Keywords: heterogeneous computing; CPU-GPU; performance; workload balance
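The abstract does not spell out its workload balancing schemes. As a generic illustration only, the sketch below shows static workload partitioning between heterogeneous cores in proportion to their measured throughput, which is one common baseline for CPU-GPU balancing; the function name `split_workload` and the throughput figures are hypothetical, not taken from the paper.

```python
def split_workload(total_items: int, throughputs: dict[str, float]) -> dict[str, int]:
    """Split total_items across devices in proportion to their measured
    throughput (items per second), so all devices finish at about the same time."""
    total_rate = sum(throughputs.values())
    # Ideal fractional share for each device.
    ideal = {dev: total_items * rate / total_rate for dev, rate in throughputs.items()}
    # Round down, then give leftover items to the largest fractional remainders.
    shares = {dev: int(x) for dev, x in ideal.items()}
    leftover = total_items - sum(shares.values())
    by_remainder = sorted(ideal, key=lambda dev: ideal[dev] - shares[dev], reverse=True)
    for dev in by_remainder[:leftover]:
        shares[dev] += 1
    return shares


if __name__ == "__main__":
    # Hypothetical throughputs: the GPU is three times faster than the CPU here.
    print(split_workload(1000, {"cpu": 1.0, "gpu": 3.0}))  # {'cpu': 250, 'gpu': 750}
```

Largest-remainder rounding guarantees the shares always sum to `total_items`; a dynamic scheme would instead re-measure throughput at run time and rebalance.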
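Research progress on urban flood models and CPU-GPU heterogeneous parallel computing techniques (cited by 3)
5
Authors: 黄国如, 陈志威, 曾博威. Journal of Hydraulic Engineering (《水利学报》; EI, CSCD, PKU Core), 2023, Issue 6, pp. 654-665 (12 pages).
Against the background of global warming and urbanization, urban flooding is becoming increasingly severe. To minimize losses from urban flood disasters and improve cities' emergency response to sudden heavy rainfall events, research on numerical simulation of urban flooding is of great importance. From the perspective of refined and efficient urban flood simulation, this paper reviews research progress on urban flood models and CPU-GPU heterogeneous parallel computing, and systematically summarizes the construction methods of runoff generation and concentration models, one-dimensional river-channel and pipe-network models, two-dimensional surface models, coupled models, and rapid urban flood models, as well as the key techniques of CPU-GPU heterogeneous parallel computing. To address the shortcomings of current urban flood model research, full-physics simulation of urban flood processes is needed, together with in-depth analysis of the applicability, simulation accuracy, and computational efficiency of fully hydrodynamic urban flood models. In addition, heterogeneous parallel computing should be used to achieve rapid simulation of one-dimensional river-channel and pipe-network flow and two-dimensional surface inundation, laying the foundation for refined and efficient simulation of urban rainstorm flooding.
Keywords: urban flooding; runoff generation and concentration model; river-channel and pipe-network model; surface model; coupled model; heterogeneous parallel computing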
Influence of heterogeneity on rock strength and stiffness using discrete element method and parallel bond model (cited by 7)
6
Authors: Spyridon Liakas, Catherine O’Sullivan, Charalampos Saroglou. Journal of Rock Mechanics and Geotechnical Engineering (SCIE, CSCD), 2017, Issue 4, pp. 575-584 (10 pages).
The particulate discrete element method (DEM) can be employed to capture the response of rock, provided that appropriate bonding models are used to cement the particles to each other. Simulations of laboratory tests are important to establish the extent to which those models can capture realistic rock behaviors. Hitherto, the focus in such comparison studies has been either on homogeneous specimens or on two-dimensional (2D) models. In situ rock formations are often heterogeneous, so exploring the ability of this type of model to capture heterogeneous material behavior is important to facilitate its use in design analysis. In situ stress states are fundamentally three-dimensional (3D), so it is important to develop 3D models for this purpose. This paper revisits an earlier experimental study on heterogeneous specimens in which the relative proportions of a weaker material (siltstone) and a stronger, harder material (sandstone) were varied in a controlled manner. Using a 3D DEM model with the parallel bond model, virtual heterogeneous specimens were created. The overall responses, in terms of variations in strength and stiffness with different percentages of the weaker material (siltstone), were shown to agree with the experimental observations. There was also good qualitative agreement between the failure patterns observed in the experiments and in the simulations, suggesting that the DEM data enabled analysis of the initiation of localizations and microfractures in the specimens.
Keywords: discrete element method (DEM); heterogeneous rocks; strength and stiffness; parallel bond model
Scheduling algorithm based on critical tasks in heterogeneous environments (cited by 4)
7
Authors: Lan Zhou, Sun Shixin. Journal of Systems Engineering and Electronics (SCIE, EI, CSCD), 2008, Issue 2, pp. 398-404, F0003 (8 pages).
Heterogeneous computing is an effective approach to high-performance computing with many advantages. Task scheduling is a critical issue in heterogeneous environments as well as in homogeneous ones. Many task scheduling algorithms have been proposed for homogeneous environments, whereas only a few for heterogeneous environments can be found in the literature. A novel task scheduling algorithm for heterogeneous environments, the heterogeneous critical task (HCT) scheduling algorithm, is presented. Using the directed acyclic graph and the Gantt chart, the HCT algorithm defines the critical task and the idle time slot. After determining the critical tasks of a given task, the HCT algorithm tentatively duplicates them, within an idle time slot, onto the processor assigned to the given task in order to reduce the given task's start time. To compare the performance of the HCT algorithm with several recently proposed algorithms, a large set of randomly generated applications and the Gaussian elimination application are used. The experimental results show that the HCT algorithm outperforms the other algorithms.
Keywords: list scheduling; task duplication; task graphs; heterogeneous environment; parallel processing
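The abstract does not give HCT's full procedure. For orientation only, the sketch below is a simplified HEFT-style list scheduler for heterogeneous processors: tasks are ordered by upward rank and each is placed on the processor that gives the earliest finish time. It deliberately omits what distinguishes HCT, namely tentatively duplicating critical tasks into idle time slots, and it also skips HEFT's insertion policy. Task names and costs are hypothetical.

```python
from functools import lru_cache


def list_schedule(cost, comm):
    """Simplified HEFT-style list scheduling (no insertion, no duplication).

    cost[t][p]: execution time of task t on processor p.
    comm[(u, v)]: transfer time on DAG edge u -> v, paid only when u and v
    run on different processors.
    Returns {task: (processor, start, finish)}.
    """
    n_procs = len(next(iter(cost.values())))
    succs = {t: [] for t in cost}
    preds = {t: [] for t in cost}
    for u, v in comm:
        succs[u].append(v)
        preds[v].append(u)

    @lru_cache(maxsize=None)
    def rank(t):
        # Upward rank: mean execution time plus the longest path to an exit task.
        mean = sum(cost[t]) / n_procs
        return mean + max((comm[(t, s)] + rank(s) for s in succs[t]), default=0.0)

    avail = [0.0] * n_procs  # time at which each processor becomes free
    sched = {}
    # With positive costs a predecessor always ranks higher than its successors,
    # so this order is a valid topological order.
    for t in sorted(cost, key=rank, reverse=True):
        best = None
        for p in range(n_procs):
            ready = max((sched[u][2] + (0 if sched[u][0] == p else comm[(u, t)])
                         for u in preds[t]), default=0.0)
            start = max(ready, avail[p])
            finish = start + cost[t][p]
            if best is None or finish < best[2]:
                best = (p, start, finish)
        sched[t] = best
        avail[best[0]] = best[2]
    return sched


if __name__ == "__main__":
    cost = {"A": [2, 4], "B": [3, 1], "C": [4, 2], "D": [2, 2]}
    comm = {("A", "B"): 1, ("A", "C"): 1, ("B", "D"): 1, ("C", "D"): 1}
    schedule = list_schedule(cost, comm)
    print(schedule)
    print("makespan:", max(f for _, _, f in schedule.values()))  # makespan: 8.0
```

In this toy DAG, task D waits for the transfer from C; a duplication-based scheduler such as HCT would try to remove that kind of wait by re-executing a critical predecessor locally in an idle slot.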
HOPE: a heterogeneity-oriented parallel execution engine for inference on mobiles
8
Authors: 夏春伟, ZHAO Jiacheng, CUI Huimin, FENG Xiaobing. High Technology Letters (EI, CAS), 2022, Issue 4, pp. 363-372 (10 pages).
It is important to efficiently support artificial intelligence (AI) applications on heterogeneous mobile platforms, especially by coordinately executing a deep neural network (DNN) model on multiple computing devices of one mobile platform. This paper proposes HOPE, an end-to-end heterogeneous inference framework running on mobile platforms that distributes the operators of a DNN model to different computing devices. The problem is formalized as an integer linear programming (ILP) problem, and a heuristic algorithm is proposed to determine a near-optimal heterogeneous execution plan. The experimental results demonstrate that HOPE reduces inference latency by up to 36.2% (22.0% on average) compared with MOSAIC, by up to 22.0% (10.2% on average) compared with StarPU, and by up to 41.8% (18.4% on average) compared with μLayer.
Keywords: deep neural network (DNN); mobile; heterogeneous; scheduler; parallel computing
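HOPE formulates operator placement over a general DNN graph as an ILP and solves it heuristically; the abstract gives no further detail. As a much narrower illustration, assuming the model is a simple linear chain of operators, an exact placement can be found by dynamic programming over (operator, device) pairs, charging a transfer cost whenever consecutive operators run on different devices. The function name and all costs below are hypothetical.

```python
def place_chain(op_cost, transfer):
    """Exact device placement for a linear chain of operators.

    op_cost[i][d]: latency of operator i on device d.
    transfer[a][b]: cost of moving an activation from device a to device b
    (0 on the diagonal).
    Returns (total_latency, placement) with placement[i] = device of operator i.
    """
    n_dev = len(op_cost[0])
    best = list(op_cost[0])  # best[d]: min latency so far with the current op on d
    back = []                # back[i][d]: device of the previous op, given this op on d
    for i in range(1, len(op_cost)):
        new_best, choice = [], []
        for d in range(n_dev):
            prev = min(range(n_dev), key=lambda a: best[a] + transfer[a][d])
            new_best.append(best[prev] + transfer[prev][d] + op_cost[i][d])
            choice.append(prev)
        best = new_best
        back.append(choice)
    # Walk the back-pointers from the best device for the last operator.
    d = min(range(n_dev), key=lambda x: best[x])
    total = best[d]
    placement = [d]
    for choice in reversed(back):
        d = choice[d]
        placement.append(d)
    return total, placement[::-1]


if __name__ == "__main__":
    # Hypothetical devices: 0 = CPU, 1 = GPU; the last operator is GPU-unfriendly.
    op_cost = [[4, 1], [4, 1], [1, 5]]
    transfer = [[0, 2], [2, 0]]
    print(place_chain(op_cost, transfer))  # (5, [1, 1, 0])
```

The DP is exact only for chains; branching graphs are what make the general problem an ILP and motivate a heuristic like HOPE's.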
An Improved Model for Computing-Intensive Tasks on Heterogeneous Workstations
9
Authors: 邬延辉, 陆鑫达. Journal of Shanghai Jiaotong University (Science) (EI), 2004, Issue 2, pp. 6-9, 15 (5 pages).
An improved algorithm is proposed that solves cooperative concurrent computing tasks using the idle cycles of multiple high-performance heterogeneous workstations interconnected by a high-speed network. To obtain better parallel performance, the paper gives a model and an algorithm for task scheduling among heterogeneous workstations that account for the costs of loading data, computing, communication, and collecting results. Using this algorithm, an optimal subset of heterogeneous workstations with the shortest parallel execution time for the tasks can be selected.
Keywords: heterogeneous workstations; parallel computing; task scheduling; cooperative parallel computing; computer networks
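The abstract does not give the paper's cost model. Assuming, purely for illustration, that work is divided in proportion to workstation speed and that each selected workstation adds a fixed overhead for loading, communication, and result collection, the estimated time for the k fastest machines is T(k) = k * overhead + work / (sum of their speeds), and the best subset can be found by scanning k. The function name and numbers are hypothetical.

```python
def best_subset(work, speeds, overhead):
    """Choose how many of the fastest workstations to use.

    Estimated parallel time with the k fastest machines:
        T(k) = k * overhead + work / (sum of the k selected speeds)
    Returns (k, T(k)) for the k that minimises T.
    """
    ranked = sorted(speeds, reverse=True)
    best_k, best_t, rate = 0, float("inf"), 0.0
    for k, s in enumerate(ranked, start=1):
        rate += s  # aggregate speed of the k fastest workstations
        t = k * overhead + work / rate
        if t < best_t:
            best_k, best_t = k, t
    return best_k, best_t


if __name__ == "__main__":
    # Hypothetical: 100 work units, three workstations, overhead of 2 per machine.
    print(best_subset(100.0, [10.0, 5.0, 1.0], 2.0))
```

Under this model, adding a slow machine can increase total time because its overhead outweighs the extra speed, which is why selecting a subset matters.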
A Multilevel Hierarchical Parallel Algorithm for Large-Scale Finite Element Modal Analysis
10
Authors: Gaoyuan Yu, Yunfeng Lou, Hang Dong, Junjie Li, Xianlong Jin. Computers, Materials & Continua (SCIE, EI), 2023, Issue 9, pp. 2795-2816 (22 pages).
The strict, high-standard requirements for the safety and stability of major engineering systems make large-scale finite element modal analysis a tough challenge. At the same time, systematic analysis of the entire large structure of these engineering systems is of great practical significance. This article proposes a multilevel hierarchical parallel algorithm for large-scale finite element modal analysis to reduce the loss of parallel computational efficiency when heterogeneous multicore distributed-storage computers are used for such analyses. Based on two-level partitioning and four-transformation strategies, the proposed algorithm not only improves the memory access rate through sparse distributed storage of large amounts of data but also reduces the solution time by reducing the scale of the generalized characteristic equations (GCEs). Moreover, a multilevel hierarchical parallelization approach is introduced into the computational procedure to separate the communication among inter-nodes, intra-nodes, heterogeneous core groups (HCGs), and inside HCGs by mapping computing tasks to various hardware layers. This method efficiently achieves load balancing at different layers and significantly improves the communication rate through hierarchical communication. It therefore enhances the efficiency of parallel large-scale finite element modal analysis by fully exploiting the architectural characteristics of heterogeneous multicore clusters. Finally, typical numerical experiments were used to validate the correctness and efficiency of the proposed method, and a parallel modal analysis of a cross-river tunnel with over ten million degrees of freedom (DOFs) was performed on ten thousand processor cores to verify the feasibility of the algorithm.
Keywords: heterogeneous multicore; multilevel hierarchical parallel; load balancing; large-scale modal analysis
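Optimization of laboratory experiments on interlayer interference in heterogeneous oil reservoirs
11
Authors: 王杰, 黎鸿屿, 吕栋梁, 钱川川, 周群茂. Xinjiang Petroleum Geology (《新疆石油地质》; CAS, CSCD, PKU Core), 2024, Issue 2, pp. 199-204 (6 pages).
When multilayer heterogeneous oil reservoirs are developed with commingled injection and production, the layers interfere with one another because of factors such as reservoir lithology, petrophysical properties, formation pressure, and fluid properties. The parallel displacement laboratory experiments carried out earlier cannot effectively simulate fluid exchange between layers during commingled multilayer production, and the physical meaning of the interference coefficient they defined is inconsistent with the flow process in waterflood development. Therefore, a series-parallel combined displacement experimental model was established to simulate lithological variation within reservoir layers. By studying the oil production, water cut, and recovery factor of cores with different permeabilities in series-parallel displacement experiments, the interference coefficient was verified and re-examined. The results show that the essence of interlayer interference is that the flow resistance of different reservoir layers changes over time, which alters the distribution of flow among the layers, and that reservoir heterogeneity is the main factor in forming preferential flow channels during commingled multilayer production. The results provide a reference for designing future interlayer-interference experiments and for the rational, efficient development of heterogeneous oil reservoirs.
Keywords: heterogeneous reservoir; interlayer interference; interference coefficient; displacement experiment; parallel connection; series connection; core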
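To make the abstract's mechanism concrete (how flow is shared among layers depends on their flow resistances), here is a steady-state sketch assuming single-phase, incompressible, linear Darcy flow: segments within one layer are in series, so their resistances mu * L / (k * A) add, and layers are in parallel under a common pressure drop, so each carries q = dp / R. It does not model the two-phase, time-varying resistance the paper actually studies; the function name and values are hypothetical and in arbitrary consistent units.

```python
def layer_flow_split(layers, area=1.0, viscosity=1.0, dp=1.0):
    """Steady single-phase flow through layers connected in parallel.

    layers: list of layers; each layer is a list of (length, permeability)
    segments connected in series.
    Returns the flow rate of each layer (consistent but arbitrary units).
    """
    rates = []
    for segments in layers:
        # Series segments: hydraulic resistances add, R = mu * L / (k * A).
        resistance = sum(viscosity * length / (perm * area) for length, perm in segments)
        # Parallel layers share the same pressure drop: q = dp / R.
        rates.append(dp / resistance)
    return rates


if __name__ == "__main__":
    # Hypothetical two-layer core stack: layer 2 contains a low-permeability segment.
    rates = layer_flow_split([[(1.0, 100.0), (1.0, 100.0)], [(1.0, 100.0), (1.0, 10.0)]])
    total = sum(rates)
    print([round(q / total, 3) for q in rates])  # flow fraction per layer
```

A single tight segment in series dominates a layer's resistance, so most of the injected fluid is diverted into the other layer; this is the static analogue of a preferential flow channel.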
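eMD: large-scale molecular dynamics simulation software based on heterogeneous computing
12
Authors: 徐顺, 张宝花, 刘倩, 金钟. Frontiers of Data & Computing (《数据与计算发展前沿》; CSCD), 2024, Issue 1, pp. 21-34 (14 pages).
[Objective] Heterogeneous computing has become an important part of high-performance computing, and GPU heterogeneous computing can significantly accelerate compute-intensive molecular dynamics simulation applications. This paper presents the system design of eMD, an in-house molecular dynamics simulation package, and its heterogeneous computing applications. [Methods] First, the goals of eMD are introduced, covering both application functionality and computational performance; then the high-level design of the software is presented, including its framework, modules, and interfaces. The focus is on the software architecture designed for heterogeneous computing and on porting and optimization techniques. [Results] Based on GPU heterogeneous computing, eMD can simulate large-scale systems, and it also provides distinctive molecular dynamics simulation algorithms and models. [Conclusions] eMD will make full use of GPU heterogeneous computing power to improve the efficiency of molecular dynamics simulation applications, supporting innovative applications of molecular modeling theory and methods and research on problems in molecular science.
Keywords: molecular dynamics; GPU heterogeneous computing; parallel computing; domestic supercomputers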
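Design and implementation of the Parallel C language for domestic heterogeneous many-core systems (cited by 8)
13
Authors: 何王全, 刘勇, 方燕飞, 魏迪, 漆锋滨. Journal of Software (《软件学报》; EI, CSCD, PKU Core), 2017, Issue 4, pp. 764-785 (22 pages).
Heterogeneous many-core architectures offer a very high performance-to-power ratio and have become an important direction in supercomputer architecture. However, the more complex parallelism and memory hierarchies of many-core systems pose great challenges to programming and optimization. Research on parallel programming techniques for many-core systems is therefore significant for reducing the difficulty of programming parallel applications on domestic many-core systems and for improving the performance of parallel programs. This paper proposes a unified-architecture, multi-mode parallel programming model, comprising an accelerated computing model that fuses heterogeneous resources and an autonomous computing model programmed in a homogeneous style. Based on this programming model, the Parallel C language is designed, which can effectively describe the heterogeneous parallelism of domestic many-core systems. Compared with the MPI+X usage pattern on other many-core systems, it gives both programming and system optimization a global view, and it is distinctive in multi-level locality description, one-sided messaging, and compatibility with existing multi-core applications. A Parallel C compilation system was built on Open64; it fully supports both the accelerated and the autonomous computing models, and it proposes and implements optimizations including data layout with automatic DMA, compiler-directed thread proxies, and topology-aware collective communication. Test results from micro-benchmarks and real applications on the Sunway TaihuLight system show that the Parallel C language and its compilation system deliver good performance and scalability and can effectively support large applications.
Keywords: heterogeneous many-core; programming model; parallel language; Parallel C; compiler; message passing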
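Automatic design of scheduling rules for heterogeneous parallel machines with setup times
14
Authors: 钟宏扬, 刘建军, 曾创锋, 陈庆新, 毛宁. Industrial Engineering Journal (《工业工程》), 2024, Issue 2, pp. 87-97 (11 pages).
Set against large-scale customized production in the home appliance industry, the production sequencing decision for home appliance final assembly lines is abstracted as a class of dynamic scheduling problems for heterogeneous parallel machines with setup times. Because manually designed scheduling rules solve dynamic scheduling problems simply and efficiently but adapt poorly to different scenarios, a rule auto-design framework based on genetic programming (GP) is introduced. First, by analyzing the production characteristics and optimization requirements of home appliance final assembly lines, a heterogeneous parallel machine scheduling model is established with minimizing mean tardiness as the objective. Then, based on the problem characteristics, an improved GP algorithm with co-evolution of line-assignment and work-order-sequencing rule pairs is constructed, and feature attributes of lines and work orders are extracted as inputs to the GP framework to design scheduling rules automatically. Finally, a large set of test instances is designed from real case data of a home appliance company. By comparing the GP algorithm with manually designed rules under different operating scenarios, the effectiveness of the GP algorithm is verified, and the influence of different production-environment parameters on the rules constructed by GP is further analyzed.
Keywords: heterogeneous parallel machines; dynamic scheduling; heuristic rules; genetic programming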
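Profiling-based global performance optimization method for CPU-GPU systems (cited by 10)
15
Authors: 张保, 董小社, 白秀秀, 曹海军, 刘超, 梅一多. Journal of Xi'an Jiaotong University (《西安交通大学学报》; EI, CAS, CSCD, PKU Core), 2012, Issue 2, pp. 17-23 (7 pages).
To address the problem that optimization strategies for porting applications to CPU-GPU heterogeneous parallel systems are scattered and lack global guidance, a profiling-based global performance optimization method is proposed. The method consists of an optimization strategy library, a profiling tool library, and a strategy configuration module. The optimization strategy library divides the performance optimization of porting an application to a heterogeneous parallel system into three levels: memory-access-level, kernel-acceleration-level, and data-partitioning-level optimization. For these three levels, the profiling tool library provides a three-level profiling mechanism that collects profiling information at run time; the strategy configuration module then uses this information to guide users in selecting appropriate optimization strategies at each level. Experiments show that the method can explicitly guide the global optimization process of porting applications to CPU-GPU heterogeneous parallel systems. With this method, the performance of example applications such as matrix multiplication and the Fourier transform improves noticeably, with final performance up to about 30% higher than with memory-access-level optimization alone.
Keywords: CPU-GPU heterogeneous parallel system; global optimization; three-level optimization; three-level profiling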
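General-purpose energy pipe network simulation model for domestic supercomputing platforms
16
Authors: 韩璞, 商建东, 薛飞, 谢景明, 王洪生, 王海. Application Research of Computers (《计算机应用研究》; CSCD, PKU Core), 2024, Issue 3, pp. 866-872 (7 pages).
To make urban energy pipe network simulation software independently controllable, a general-purpose urban energy pipe network simulation model is proposed based on the "Songshan" supercomputing platform, a domestic heterogeneous high-performance computer. Optimizing the models of "non-pipe" components in the network improves the model's adaptability to domestic heterogeneous parallel computer systems, and encapsulating the computation of different network components weakens the dependencies among components during simulation, improving the parallelizability of the model in engineering practice. Parallel simulation experiments in three scenarios (water supply, gas, and heating) demonstrate that the model is broadly applicable to simulating urban energy supply networks. Comparison between measured network data and simulated data shows that the error in simulated network pressure is below 4% and the error in temperature is below 2%, indicating that the proposed model also offers good computational generality on domestic supercomputing platforms.
Keywords: heterogeneous computing; energy pipe network; simulation model; fluid network; parallel computing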
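Fixed-priority schedulability analysis of resource-constrained parallel tasks
17
Authors: 韩美灵, 孙施宁, 金曦, 邓庆绪, 郑彬双, 夏长清, 宋波. Journal of Chinese Computer Systems (《小型微型计算机系统》; CSCD, PKU Core), 2024, Issue 6, pp. 1496-1503 (8 pages).
With the development of heterogeneous multi-core platforms, parallel tasks need to execute on multi-core platforms with diverse resources. Although a given program segment of a parallel task can only execute on specified resources, this makes it possible to exploit the characteristics of different resource types and process tasks faster and more energy-efficiently. The schedulability of resource-constrained tasks has already been studied in real-time embedded systems, but the task models used are relatively simple and the analysis methods are not sufficiently precise. This paper therefore studies the schedulability of resource-constrained parallel tasks under global fixed-priority scheduling and, building on analysis methods for a single parallel task, proposes an analysis method for global fixed-priority scheduling. First, an analysis of interference from higher-priority tasks is proposed based on a decomposition strategy. Then, combining this interference analysis with the path abstraction technique proposed for a single parallel task, an algorithm for the worst-case response time of parallel tasks is derived. Finally, simulation experiments evaluate the schedulability and precision of the proposed algorithm. The results show that the acceptance ratios under all parameter settings meet expectations and that the analysis time is relatively reduced, while the average analysis time remains acceptable for offline analysis. The proposed algorithm can offer guidance for designing parallel software for real-time systems.
Keywords: heterogeneous multi-core; embedded real-time systems; schedulability analysis; parallel tasks; worst-case response time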
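For background, the sketch below is the classic fixed-priority response-time analysis for independent sequential tasks on a single processor, iterating R_i = C_i + sum over higher-priority j of ceil(R_i / T_j) * C_j until it reaches a fixed point or exceeds the deadline. It is not the paper's method, which extends worst-case response-time analysis to resource-constrained parallel tasks under global scheduling through decomposition and path abstraction; implicit deadlines (deadline equal to period) and the task set are assumptions.

```python
import math


def response_time(tasks):
    """Uniprocessor fixed-priority response-time analysis.

    tasks: list of (C, T) = (worst-case execution time, period), ordered from
    highest to lowest priority; deadlines are assumed equal to periods.
    Returns the worst-case response time of each task, or None if it can miss
    its deadline.
    """
    results = []
    for i, (c_i, t_i) in enumerate(tasks):
        r = c_i
        while True:
            # Own execution plus interference from all higher-priority tasks
            # released within a window of length r.
            r_next = c_i + sum(math.ceil(r / t_j) * c_j for c_j, t_j in tasks[:i])
            if r_next > t_i:
                results.append(None)  # deadline exceeded: unschedulable
                break
            if r_next == r:
                results.append(r)     # fixed point reached
                break
            r = r_next
    return results


if __name__ == "__main__":
    print(response_time([(1, 4), (2, 6), (3, 12)]))  # [1, 3, 10]
```

The iteration is monotone non-decreasing, so it either converges or crosses the deadline; parallel-task analyses replace the per-job interference term with bounds derived from the task DAG.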
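Large-scale parallel method of moments on the domestic heterogeneous DCU platform
18
Authors: 贾瑞鹏, 林中朝, 左胜, 张玉, 杨美红. Journal of Xidian University (《西安电子科技大学学报》; EI, CAS, CSCD, PKU Core), 2024, Issue 2, pp. 76-83 (8 pages).
In line with the trend toward supercomputers built on domestic heterogeneous many-core processors, a large-scale parallel higher-order method of moments (MoM) is implemented on a domestic CPU+DCU heterogeneous parallel system. Building on the load-balancing strategy of homogeneous parallel MoM, an efficient "MPI+OpenMP+DCU" heterogeneous parallel programming framework is proposed, which resolves the mismatch between computational tasks and computing capability and achieves load balancing in the heterogeneous parallel MoM computation. Using a fine-grained task partitioning strategy and asynchronous communication, the computation on the deep computing units is pipelined so that computation overlaps with communication, improving the efficiency of heterogeneous collaborative MoM computation. Comparison with finite element method simulation results verifies the accuracy of the CPU+DCU heterogeneous parallel MoM. Scalability analysis on the domestic DCU heterogeneous platform shows that, compared with CPU-only computation, the CPU+DCU heterogeneous collaborative method achieves speedups of 5.5 to 7.0 times. It can also run on the full system at the National Supercomputing Center in Xi'an, scaling from 360 to 3,600 nodes (1,036,800 processor cores in total) with a parallel efficiency of about 73.5%.
Keywords: higher-order method of moments; domestic heterogeneous parallel system; deep computing processor; heterogeneous collaborative parallel computing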
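Thread-parallel acceleration of OpenFOAM on domestic accelerator cards
19
Authors: 尚小敏, 李强, 高凌云, 陶顺安, 周全, 袁武, 陆忠华. Frontiers of Data & Computing (《数据与计算发展前沿》; CSCD), 2024, Issue 2, pp. 134-144 (11 pages).
[Background] As fluid dynamics simulations become more refined, the CFD software OpenFOAM demands ever more computing power. The new Dongfang supercomputing system is a new, domestically developed heterogeneous supercomputer. [Objective] To port OpenFOAM to the new Dongfang supercomputing system, adapting and accelerating OpenFOAM on domestic supercomputers. [Methods] First, by analyzing the functional architecture of the Dongfang supercomputing system and of OpenFOAM, a solver suited to domestic accelerator cards was developed; the CUSP library, ported in this work, is used to call the accelerator cards' low-level code to implement matrix-vector multiplication in sparse storage formats and diagonal preconditioning. Building on this, parallel SpMV across multiple domestic accelerator cards within a single node was then implemented. [Results] The algorithm was validated with OpenFOAM's built-in pitzDaily case, and test performance was analyzed with several acceleration-performance comparison methods, achieving a 19.7x speedup. [Limitations] This study only implements parallel optimization of OpenFOAM within a single node. [Conclusions] The results are significant for leveraging OpenFOAM's strengths in fluid dynamics and for broadening the applicability of supercomputing software.
Keywords: OpenFOAM; parallel computing; heterogeneous computing; domestic accelerator cards; porting and optimization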
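The abstract mentions sparse matrix-vector multiplication (SpMV) and diagonal preconditioning. Below is a minimal CPU-only sketch of how those two kernels combine in a diagonally (Jacobi) preconditioned conjugate gradient solve, assuming a symmetric positive-definite matrix stored in CSR format with a non-zero diagonal. It does not use CUSP, OpenFOAM, or the accelerator cards, and all names are hypothetical.

```python
def csr_matvec(indptr, indices, data, x):
    """y = A @ x for a matrix stored in CSR format (the SpMV kernel)."""
    y = [0.0] * (len(indptr) - 1)
    for row in range(len(y)):
        for k in range(indptr[row], indptr[row + 1]):
            y[row] += data[k] * x[indices[k]]
    return y


def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))


def jacobi_pcg(indptr, indices, data, b, tol=1e-10, max_iter=100):
    """Conjugate gradient with a diagonal (Jacobi) preconditioner M = diag(A).
    Assumes A is symmetric positive definite with a non-zero diagonal."""
    n = len(b)
    diag = [0.0] * n
    for row in range(n):
        for k in range(indptr[row], indptr[row + 1]):
            if indices[k] == row:
                diag[row] = data[k]
    x = [0.0] * n
    r = list(b)                                # residual b - A @ x for x = 0
    if dot(r, r) ** 0.5 < tol:
        return x
    z = [ri / di for ri, di in zip(r, diag)]   # apply M^{-1}: divide by the diagonal
    p = list(z)
    rz = dot(r, z)
    for _ in range(max_iter):
        ap = csr_matvec(indptr, indices, data, p)  # one SpMV per iteration
        alpha = rz / dot(p, ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, ap)]
        if dot(r, r) ** 0.5 < tol:
            break
        z = [ri / di for ri, di in zip(r, diag)]
        rz_new = dot(r, z)
        p = [zi + (rz_new / rz) * pi for zi, pi in zip(z, p)]
        rz = rz_new
    return x


if __name__ == "__main__":
    # A = [[4, 1, 0], [1, 3, 0], [0, 0, 2]] in CSR form; solve A x = [1, 2, 3].
    print(jacobi_pcg([0, 2, 4, 5], [0, 1, 0, 1, 2], [4.0, 1.0, 1.0, 3.0, 2.0], [1.0, 2.0, 3.0]))
```

Each iteration is dominated by one SpMV plus vector updates, and the diagonal preconditioner is an element-wise division, which is why these are the kernels worth offloading to an accelerator.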
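Multi-level parallel sparse tensor-vector multiplication algorithm based on heterogeneous systems
20
Authors: 陈玥丹, 肖国庆, 阳王东, 金纪勇, 龙军, 李肯立. Chinese Journal of Computers (《计算机学报》; EI, CSCD, PKU Core), 2024, Issue 2, pp. 441-455 (15 pages).
Tensors are used in many real-world applications to represent large-scale, multi-source, high-dimensional, multimodal data. As an effective way to mine hidden information in data, sparse tensor decomposition has been widely applied in machine learning, text analysis, biomedicine, and other fields. Sparse tensor-vector multiplication (SpTV) is one of the most fundamental and time-consuming operations in tensor decomposition. To accelerate big data and AI applications, this paper proposes a multi-level parallel SpTV acceleration algorithm based on a CPU-GPU heterogeneous architecture. First, to map SpTV onto a hybrid, multi-level parallel, distributed CPU-GPU heterogeneous multi-/many-core architecture, a multi-dimensional parallel SpTV partitioning method is designed, using (N-1)-dimensional tensor partitioning for node-level parallelism and matrix partitioning for GPU thread-level parallelism, so that multi-level parallel computing capability across and within compute nodes is fully exploited. Second, a compressed storage format based on sparse tensor fibers is designed to reduce the memory footprint of sparse tensors and to optimize the computation and memory-access patterns of SpTV. Finally, an efficient heterogeneous SpTV algorithm based on multi-stream parallelism is proposed, with a fine-grained partitioning method for sparse tensors, a multi-stream parallel execution mechanism, and a multi-stream optimization based on tensor-block sorting, so that communication and computation overheads in SpTV overlap and hide each other. Experimental results show that, compared with the related work aeSpTV, the proposed SpTV algorithm achieves speedups of up to 3.28x across all test datasets.
Keywords: CPU-GPU; heterogeneous parallel computing; multi-level parallelism; sparse tensor; tensor operations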
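The paper's fiber-based compressed format and multi-stream GPU scheme are not specified in the abstract. The sketch below shows only the basic idea behind fiber-oriented SpTV, in pure Python with hypothetical names: non-zeros stored as COO are grouped into fibers along the multiplied mode, and each output entry is the dot product of one fiber with the vector. Because fibers are independent, they are a natural unit for node-level or thread-level parallelism.

```python
from collections import defaultdict


def to_fibers(coo, mode):
    """Group COO non-zeros {(i, j, k, ...): value} into fibers along `mode`.

    Key: the tuple of all indices except the one along `mode`.
    Value: list of (index along `mode`, value) pairs in that fiber.
    """
    fibers = defaultdict(list)
    for idx, val in coo.items():
        key = idx[:mode] + idx[mode + 1:]
        fibers[key].append((idx[mode], val))
    return dict(fibers)


def sptv(fibers, vec):
    """Sparse tensor-times-vector along the fiber mode.

    Each output entry is the dot product of one fiber with `vec`; fibers are
    independent, so they could be processed in parallel.
    """
    return {key: sum(val * vec[k] for k, val in entries) for key, entries in fibers.items()}


if __name__ == "__main__":
    # Hypothetical 3rd-order tensor with three non-zeros, multiplied along mode 2.
    coo = {(0, 0, 0): 1.0, (0, 0, 2): 2.0, (1, 2, 1): 3.0}
    print(sptv(to_fibers(coo, 2), [1.0, 10.0, 100.0]))  # {(0, 0): 201.0, (1, 2): 30.0}
```

The result is a sparse tensor of order N-1 keyed by the remaining indices; a compressed fiber format would store the keys once per fiber instead of once per non-zero.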