4 articles found
High performance heterogeneous embedded computing: a review (cited by: 5)
1
Authors: HE Yongfu, WANG Shaojun, PENG Yu. Instrumentation, 2014, No. 2, pp. 1-12 (12 pages)
As the gap between computing demand and performance widens in the embedded computing domain, heterogeneous computing architectures, which deliver better performance and lower power within a limited size, are gaining more and more attention. First, the heterogeneous computing model is presented, and the different tightly coupled single-chip heterogeneous architectures and their application domains are introduced. Then, task partitioning methods are described, and several programming model technologies are analyzed and discussed. Finally, the main challenges and future perspectives of High Performance Embedded Computing (HPEC) are summarized.
Keywords: HPEC; heterogeneous SoCs; hardware/software partition; heterogeneous programming
RenderKernel: High-level programming for real-time rendering systems
2
Authors: Jinyuan Yang, Soumyabrata Dev, Abraham G. Campbell. Visual Informatics (EI), 2024, No. 3, pp. 82-95 (14 pages)
Real-time rendering applications leverage heterogeneous computing to optimize performance. However, software development across multiple devices presents challenges, including data layout inconsistencies, synchronization issues, resource management complexities, and architectural disparities. Additionally, the creation of such systems requires verbose and unsafe programming models. Recent developments in domain-specific and unified shading languages aim to mitigate these issues. Yet current programming models primarily address data layout consistency, neglecting other persistent challenges. In this paper, we introduce RenderKernel, a programming model designed to simplify the development of real-time rendering systems. Recognizing the need for a high-level approach, RenderKernel addresses the specific challenges of real-time rendering, enabling development on heterogeneous systems as if they were homogeneous. This model allows for early detection and prevention of errors due to system heterogeneity at compile time. Furthermore, RenderKernel enables the use of common programming patterns from homogeneous environments, freeing developers from the complexities of underlying heterogeneous systems. Developers can focus on coding unique application features, thereby enhancing productivity and reducing the cognitive load associated with real-time rendering system development.
Keywords: heterogeneous programming; high-level programming; real-time rendering; rendering systems
MilkyWay-2 supercomputer: system and application (cited by: 34)
3
Authors: Xiangke LIAO, Liquan XIAO, +1 more author, Canqun YANG, Yutong LU. Frontiers of Computer Science (SCIE, EI, CSCD), 2014, No. 3, pp. 345-356 (12 pages)
On June 17, 2013, the MilkyWay-2 (Tianhe-2) supercomputer was crowned the fastest supercomputer in the world on the 41st TOP500 list. This paper provides an overview of the MilkyWay-2 project and describes the design of its hardware and software systems. The key architectural features of MilkyWay-2 are highlighted, including neo-heterogeneous compute nodes integrating commodity off-the-shelf processors and accelerators that share a similar instruction set architecture, powerful networks that employ proprietary interconnection chips to support massively parallel message-passing communications, a proprietary 16-core processor designed for scientific computing, and efficient software stacks that provide a high-performance file system, an emerging programming model for heterogeneous systems, and intelligent system administration. We perform extensive evaluation with wide-ranging applications, from the LINPACK and Graph500 benchmarks to massively parallel software deployed on the system.
Keywords: MilkyWay-2 supercomputer; petaflops computing; neo-heterogeneous architecture; interconnect network; heterogeneous programming model; system management; benchmark optimization; performance evaluation
Efficient fine-grained shared buffer management for multiple OpenCL devices
4
Authors: Chang-qing XUN, Dong CHEN, +1 more author, Qiang LAN, Chun-yuan ZHANG. Journal of Zhejiang University-Science C (Computers and Electronics) (SCIE, EI), 2013, No. 11, pp. 859-872 (14 pages)
OpenCL programming provides full code portability between different hardware platforms, and can serve as a good programming candidate for heterogeneous systems, which typically consist of a host processor and several accelerators. However, to make full use of the computing capacity of such a system, programmers are required to manage diverse OpenCL-enabled devices explicitly, including distributing the workload between different devices and managing data transfer between multiple devices. All these tedious jobs pose a huge challenge for programmers. In this paper, a distributed shared OpenCL memory (DSOM) is presented, which relieves users of having to manage data transfer explicitly by supporting shared buffers across devices. DSOM allocates shared buffers in the system memory and treats the on-device memory as a software-managed virtual cache buffer. To support fine-grained shared buffer management, we designed a kernel parser in DSOM for buffer access range analysis. A basic modified, shared, invalid (MSI) cache coherency protocol is implemented for DSOM to maintain coherency for cache buffers. In addition, we propose a novel strategy to minimize communication cost between devices by launching each necessary data transfer as early as possible. This strategy enables overlap of data transfer with kernel execution. Our experimental results show that the applicability of our method for buffer access range analysis is good, and the efficiency of DSOM is high.
Keywords: shared buffer; OpenCL; heterogeneous programming; fine grained
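The modified/shared/invalid coherency scheme mentioned in the abstract above can be illustrated with a toy sketch. This is not the DSOM implementation: the class and method names (`SharedBuffer`, `read`, `write`) and the device labels are hypothetical, the sketch tracks whole buffers rather than fine-grained access ranges, and it only logs transfers instead of calling any OpenCL API.

```python
from enum import Enum


class State(Enum):
    MODIFIED = "M"
    SHARED = "S"
    INVALID = "I"


class SharedBuffer:
    """Toy MSI tracker for one buffer cached on several devices (hypothetical)."""

    def __init__(self, devices):
        # Every device copy starts invalid; the host copy is authoritative.
        self.state = {d: State.INVALID for d in devices}
        self.transfers = []  # log of (source, destination) copies

    def _write_back(self):
        # Flush a modified device copy to host memory, if one exists.
        for dev, st in self.state.items():
            if st is State.MODIFIED:
                self.transfers.append((dev, "host"))
                self.state[dev] = State.SHARED

    def read(self, dev):
        # A read only needs a transfer when the local copy is invalid.
        if self.state[dev] is State.INVALID:
            self._write_back()
            self.transfers.append(("host", dev))
            self.state[dev] = State.SHARED

    def write(self, dev):
        self.read(dev)  # make sure the local copy is valid first
        for other in self.state:
            if other != dev:
                self.state[other] = State.INVALID
        self.state[dev] = State.MODIFIED


buf = SharedBuffer(["gpu0", "gpu1"])
buf.write("gpu0")  # gpu0 becomes MODIFIED, gpu1 stays INVALID
buf.read("gpu1")   # forces a write-back from gpu0, then a fetch to gpu1
```

In this sketch a device copy is transferred only when it is invalid at the time of access, which is the property that lets a runtime hide explicit data movement from the programmer.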