5 articles found
Performance Portability Analysis of OpenACC on the Intel Knights Corner and NVIDIA Kepler Architectures (cited by: 1)
1
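Authors: Wang Yichao (王一超), Qin Qiang (秦强), Shi Zhongwei (施忠伟), Lin Xinhua (林新华). 《计算机科学》 (Computer Science), CSCD, Peking University Core Journal, 2015, No. 1, pp. 75-78 (4 pages)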
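OpenACC is a directive-based parallel programming standard. By adding standard-compliant directives to their code and compiling it with an OpenACC compiler, programmers can port serial code to accelerators or coprocessors in parallel form and obtain the speedup that heterogeneous accelerators offer. Unlike heterogeneous parallel programming technologies such as CUDA and OpenCL, OpenACC aims to free programmers from having to consider the underlying hardware architecture of the accelerator or coprocessor during porting, which lowers programming difficulty. It is also highly cross-platform: a single code base can run on different hardware platforms. OpenACC is therefore a parallel programming standard worth studying. Today's heterogeneous acceleration hardware is increasingly diverse. Tianhe-2, ranked first on the November 2013 Top500 list, uses 48,000 coprocessors built on the Intel Knights Corner architecture. Meanwhile, NVIDIA's recently released Kepler-architecture GPU products have quickly gained a sizable user base thanks to the company's years of presence in the GPU market. For those who port applications without pursuing peak performance, balancing application performance against ease of porting is an important issue. OpenACC, which needs only one code base to run on both hardware platforms, meets the demand for ease of porting. Once ease of porting is addressed, how the same application performs on different hardware platforms becomes the question users most want answered. Through experiments and a performance model, the paper shows the performance portability of applications ported with OpenACC on Intel Knights Corner and NVIDIA Kepler hardware.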
Keywords: OpenACC; performance portability; performance computing
Performance-Portable GPU Parallel Optimization of the Bellman-Ford Algorithm (cited by: 7)
2
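Authors: Liu Lei (刘磊), Wang Yanyan (王燕燕), Shen Chun (申春), Li Yuxiang (李玉祥), Liu Lei (刘雷). 《吉林大学学报(工学版)》 (Journal of Jilin University, Engineering and Technology Edition), EI, CAS, CSCD, Peking University Core Journal, 2015, No. 5, pp. 1559-1564 (6 pages)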
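The paper proposes a performance-portable, GPU-oriented parallel reduction algorithm for finding extrema and a global memory access optimization algorithm, and uses them to parallelize the Bellman-Ford algorithm. The goal is to address the insufficient parallel granularity and non-contiguous global memory access found on different types of GPU devices. Experimental results show that the optimizations work well on multiple NVIDIA and AMD GPU devices: the optimized program runs 3 to 6 times faster than the original GPU parallel version.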
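Keywords: computer software; Bellman-Ford algorithm; GPU parallel programming and optimization techniques; parallel reduction algorithm; performance portability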
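The abstract above combines Bellman-Ford with a parallel reduction that finds a minimum. The following is only a minimal sketch of that general idea, not the paper's algorithm: it emulates a pairwise (tree) min-reduction sequentially in Python and uses it in a pull-style Bellman-Ford, where each vertex takes the minimum over its incoming edges. The function names `tree_min` and `bellman_ford` and the edge-list format `(u, v, w)` are assumptions made for illustration.

```python
import math


def tree_min(values):
    """Minimum via pairwise (tree) reduction.

    Each pass halves the active range; on a GPU the inner loop's
    iterations would run as independent threads. Here they run
    sequentially, purely to illustrate the access pattern.
    """
    vals = list(values)
    if not vals:
        return math.inf
    n = len(vals)
    while n > 1:
        half = (n + 1) // 2
        # "Thread" i combines element i with element i + half.
        for i in range(n - half):
            vals[i] = min(vals[i], vals[i + half])
        n = half
    return vals[0]


def bellman_ford(num_vertices, edges, source):
    """Pull-style Bellman-Ford over edges given as (u, v, weight).

    Every round recomputes each vertex's distance as the minimum of
    its current value and all relaxations along incoming edges.
    Raises ValueError if a reachable negative cycle exists.
    """
    incoming = [[] for _ in range(num_vertices)]
    for u, v, w in edges:
        incoming[v].append((u, w))

    dist = [math.inf] * num_vertices
    dist[source] = 0
    for _ in range(num_vertices - 1):
        new_dist = [
            tree_min([dist[v]] + [dist[u] + w for u, w in incoming[v]])
            for v in range(num_vertices)
        ]
        if new_dist == dist:  # converged early
            break
        dist = new_dist

    # One extra pass: any further improvement implies a negative cycle.
    for u, v, w in edges:
        if dist[u] + w < dist[v]:
            raise ValueError("graph contains a reachable negative cycle")
    return dist
```

For example, `bellman_ford(4, [(0, 1, 4), (0, 2, 1), (2, 1, 2), (1, 3, 1)], 0)` computes the shortest distances from vertex 0. Computing each round from the previous round's distances (rather than updating in place) keeps the per-vertex reductions independent of one another, which is what makes them candidates for parallel execution.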
Parallel Optimization of the Single-Source Shortest Path Algorithm under CUDA (cited by: 3)
3
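Authors: Zhang Han (张晗), Qian Yurong (钱育蓉), Wang Yuefei (王跃飞), Chen Renhe (陈人和), Tian Chenwei (田宸玮). 《计算机工程与设计》 (Computer Engineering and Design), Peking University Core Journal, 2019, No. 8, pp. 2181-2189 (9 pages)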
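To design a parallel optimization scheme on the CUDA platform for the fixed-order Bellman-Ford algorithm, the paper takes into account the algorithm's compute-intensive and data-intensive characteristics. At the kernel computation level, it proposes a memory access optimization method and a fixed-order-based optimization of thread divergence. At the CPU-GPU transfer level, it proposes a method that uses CUDA streams to reduce data transfer overhead. In tests on different graphics cards, thread blocks are partitioned according to shared memory capacity, the vector dimension is reduced after iteration, and CUDA streams are used to shorten the latency of the first computation. Compared with the traditional algorithm, the improved parallel algorithm achieves a speedup of about 200 times. The scheme verifies that fixed order is feasible and portable on the CUDA platform, and it can serve as a reference for multi-platform research.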
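Keywords: improved fixed-order algorithm; Bellman-Ford algorithm; parallel computing; performance portability; graphics processing unit (GPU); Compute Unified Device Architecture (CUDA)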
Feasibility and safety of autologous bone marrow mononuclear cell transplantation in patients with advanced chronic liver disease (cited by: 22)
4
Authors: Andre Castro Lyra, Milena Botelho Pereira Soares, Luiz Flavio Maia da Silva, Marcos Fraga Fortes, André Goyanna Pinheiro Silva, Augusto César de Andrade Mota, Sheilla A Oliveira, Eduardo Lorens Braga, Wilson Andrade de Carvalho, Bernd Genser, Ricardo Ribeiro dos Santos, Luiz Guilherme Costa Lyra. 《World Journal of Gastroenterology》, SCIE, CAS, CSCD, 2007, No. 7, pp. 1067-1073 (7 pages)
AIM: To evaluate the safety and feasibility of bone marrow cell (BMC) transplantation in patients with chronic liver disease on the waiting list for liver transplantation. METHODS: Ten patients (eight males) with chronic liver disease were enrolled to receive infusion of autologous bone marrow-derived cells. Seven patients were classified as Child-Pugh B and three as Child-Pugh C. Baseline assessment included complete clinical and laboratory evaluation and abdominal MRI. Approximately 50 mL of bone marrow aspirate was prepared by centrifugation in a ficoll-hypaque gradient. At least 100 million mononuclear-enriched BMCs were infused into the hepatic artery using the routine technique for arterial chemoembolization for liver tumors. Patients were followed up for adverse events up to 4 mo. RESULTS: The median age of the patients was 52 years (range 24-70 years). All patients were discharged 48 h after BMC infusion. Two patients complained of mild pain at the bone marrow needle puncture site. No other complications or specific side effects related to the procedure were observed. Bilirubin levels were lower at 1 (2.19 ± 0.9) and 4 mo (2.10 ± 1.0) after cell transplantation than baseline levels (238 ± 1.2). Albumin levels 4 mo after BMC infusion (3.73 ± 0.5) were higher than baseline levels (3.47 ± 0.5). International normalized ratio (INR) decreased from 1.48 (SD = 0.23) to 1.43 (SD = 0.23) one month after cell transplantation. CONCLUSION: BMC infusion into the hepatic artery of patients with advanced chronic liver disease is safe and feasible. In addition, a decrease in mean serum bilirubin and INR levels and an increase in albumin levels are observed. Our data warrant further studies in order to evaluate the effect of BMC transplantation in patients with advanced chronic liver disease.
Keywords: Bone marrow; Cell transplantation; Liver failure; Stem cell; Cirrhosis
Improving performance portability for GPU-specific OpenCL kernels on multi-core/many-core CPUs by analysis-based transformations
5
Authors: Mei WEN, Da-fei HUANG, Chang-qing XUN, Dong CHEN. 《Frontiers of Information Technology & Electronic Engineering》, SCIE, EI, CSCD, 2015, No. 11, pp. 899-916 (18 pages)
OpenCL is an open heterogeneous programming framework. Although OpenCL programs are functionally portable, they do not provide performance portability, so code transformation often plays an irreplaceable role. When adapting GPU-specific OpenCL kernels to run on multi-core/many-core CPUs, coarsening the thread granularity is necessary and thus has been extensively used. However, locality concerns exposed in GPU-specific OpenCL code are usually inherited without analysis, which may have side effects on CPU performance. Typically, the use of OpenCL's local memory on multi-core/many-core CPUs may lead to an opposite performance effect, because local-memory arrays no longer match well with the hardware and the associated synchronizations are costly. To solve this dilemma, we actively analyze the memory access patterns using array-access descriptors derived from GPU-specific kernels, which can thus be adapted for CPUs by (1) removing all the unwanted local-memory arrays together with the obsolete barrier statements and (2) optimizing the coalesced kernel code with vectorization and locality re-exploitation. Moreover, we have developed an automated tool chain that transforms GPU-specific OpenCL kernels into a CPU-friendly form, accompanied by a scheduler that forms a new OpenCL runtime. Experiments show that the automated transformation can improve OpenCL kernel performance on a multi-core CPU by an average factor of 3.24. Satisfactory performance improvements are also achieved on Intel's many-integrated-core coprocessor. The resultant performance on both architectures is better than or comparable with the corresponding OpenMP performance.
Keywords: OpenCL; Performance portability; Multi-core/many-core CPU; Analysis-based transformation