Journal Articles
6 articles found
1. Cooperative Computing Techniques for a Deeply Fused and Heterogeneous Many-Core Processor Architecture (cited by: 13)
Authors: 郑方, 李宏亮, 吕晖, 过锋, 许晓红, 谢向辉. Journal of Computer Science & Technology (SCIE, EI, CSCD), 2015, No. 1, pp. 145-162, 18 pages.
Due to advances in semiconductor techniques, many-core processors have been widely used in high performance computing. However, many applications still cannot run efficiently because of the memory wall, which has become a bottleneck in many-core processors. In this paper, we present a novel heterogeneous many-core processor architecture named deeply fused many-core (DFMC) for high performance computing systems. DFMC integrates management processing elements (MPEs) and computing processing elements (CPEs), heterogeneous processor cores suited to different application features, with a unified ISA (instruction set architecture), a unified execution model, and shared memory that supports cache coherence. The DFMC processor alleviates the memory wall problem by combining a series of cooperative computing techniques for CPEs, such as multi-pattern data stream transfer, an efficient register-level communication mechanism, and a fast hardware synchronization technique. These techniques improve on-chip data reuse and optimize memory access performance. This paper describes the implementation of an FPGA-based full-system prototype with four MPEs and 256 CPEs. Experimental results show that the cooperative computing techniques of CPEs have a significant effect: DGEMM (double-precision matrix multiplication) achieves an efficiency of 94%, FFT (fast Fourier transform) reaches 207 GFLOPS, and FDTD (finite-difference time-domain) reaches 27 GFLOPS.
Keywords: heterogeneous many-core processor; data stream transfer; register-level communication mechanism; hardware synchronization technique; processor prototype
2. Fault Tolerance Mechanism in Chip Many-Core Processors (cited by: 1)
Authors: 张磊, 韩银和, 李华伟, 李晓维. Tsinghua Science and Technology (SCIE, EI, CAS), 2007, Suppl. 1, pp. 169-174, 6 pages.
As semiconductor technology advances, a single chip will hold billions of transistors. Chip many-core processors are emerging to take advantage of these greater transistor densities to deliver greater performance. Effective fault tolerance techniques are essential to improve the yield of such complex chips. In this paper, a core-level redundancy scheme called N+M is proposed to improve the yield of N-core processors by providing M spare cores. In such an architecture, topology is an important factor because it greatly affects processor performance. The concept of a logical topology and a topology reconfiguration problem are introduced; reconfiguration transparently provides the target topology with the lowest performance degradation in the presence of faulty on-chip cores. A row rippling and column stealing (RRCS) algorithm is also proposed. Results show that RRCS finds solutions with an average performance degradation of 13.8% at negligible computing time.
Keywords: chip many-core processors; yield; fault tolerance; reconfiguration; network-on-chip
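The N+M idea of hiding faulty cores behind a logical topology can be illustrated with a minimal Python sketch. It assumes, hypothetically, that the M spare cores form extra columns of a 2D mesh and repairs each row by shifting logical cores past faulty ones; this simple shift is a stand-in for illustration, not the paper's RRCS algorithm, and `extra_hops` (added Manhattan distance between logically adjacent cores) is an assumed proxy for performance degradation.

```python
from typing import Dict, Optional, Set, Tuple

Coord = Tuple[int, int]  # (row, column) position in a 2D mesh


def map_logical_mesh(rows: int, phys_cols: int, logical_cols: int,
                     faulty: Set[Coord]) -> Optional[Dict[Coord, Coord]]:
    """Map a rows x logical_cols logical mesh onto a rows x phys_cols physical
    mesh, where the phys_cols - logical_cols extra columns are spares.

    Simplified row-wise shift (not RRCS): logical core (r, c) is placed on the
    c-th fault-free physical core of row r. Returns None if some row has too
    few fault-free cores to be repaired this way.
    """
    mapping: Dict[Coord, Coord] = {}
    for r in range(rows):
        healthy = [c for c in range(phys_cols) if (r, c) not in faulty]
        if len(healthy) < logical_cols:
            return None
        for c in range(logical_cols):
            mapping[(r, c)] = (r, healthy[c])
    return mapping


def extra_hops(mapping: Dict[Coord, Coord], rows: int, cols: int) -> int:
    """Assumed degradation proxy: total hops added to logical mesh links,
    since each link ideally spans exactly one hop between physical neighbors."""
    total = 0
    for r in range(rows):
        for c in range(cols):
            ar, ac = mapping[(r, c)]
            for nr, nc in ((r + 1, c), (r, c + 1)):  # down and right neighbors
                if nr < rows and nc < cols:
                    br, bc = mapping[(nr, nc)]
                    total += abs(ar - br) + abs(ac - bc) - 1
    return total
```

For example, on a 4×5 physical mesh (16 logical cores plus 4 spares) with a faulty core at (1, 2), the remaining logical cores of row 1 shift one column to the right, which stretches five logical links by one hop each.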
3. Parallelization and sustainability of distributed genetic algorithms on many-core processors
Authors: Yuji Sato, Mikiko Sato. International Journal of Intelligent Computing and Cybernetics (EI), 2014, No. 1, pp. 2-23, 22 pages.
Purpose: The purpose of this paper is to propose a fault-tolerant technology for increasing the durability of application programs when evolutionary computation is performed by fast parallel processing on many-core processors such as graphics processing units (GPUs) and multi-core processors (MCPs). Design/methodology/approach: For distributed genetic algorithm (GA) models, the paper proposes a method in which an island's ID number is added to the header of the data transferred by that island, for use in fault detection. Findings: The paper shows that the processing time of the proposed idea is practically negligible in applications, that an optimal solution can still be obtained with a single stuck-at fault or a transient fault, and that increasing the number of parallel threads makes the system less susceptible to faults. Originality/value: The study described in this paper is a new approach to increasing the sustainability of application programs that use distributed GAs on GPUs and MCPs.
Keywords: evolutionary computation; genetic algorithms; fault identification; many-core processors; parallelization
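The island-ID header check described in this abstract can be sketched in Python as follows. The packet layout, the ring migration topology (island i receives migrants from island i-1), and the discard-on-mismatch policy are assumptions for illustration, not the paper's exact data format.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class MigrationPacket:
    island_id: int             # header: ID of the island that produced the data
    migrants: List[List[int]]  # payload: migrating individuals as bit strings


def send(island_id: int, migrants: List[List[int]]) -> MigrationPacket:
    """Stamp outgoing migrants with the sender's island ID."""
    return MigrationPacket(island_id, [list(m) for m in migrants])


def receive(receiver_id: int, n_islands: int,
            packet: MigrationPacket) -> Tuple[List[List[int]], bool]:
    """Accept migrants only if the header names the expected sender.

    Returns (migrants, fault_detected). On a mismatch the packet is discarded,
    so corrupted or misrouted data never enters the local population.
    """
    expected = (receiver_id - 1) % n_islands  # ring topology (assumption)
    if packet.island_id != expected:
        return [], True
    return packet.migrants, False
```

With four islands, island 1 accepts a packet stamped with ID 0 but flags and drops one stamped with ID 3; the check costs a single integer comparison per transfer, which is consistent with the abstract's claim of negligible overhead.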
4. An OpenCL Compiler System for the Chinese Domestic Heterogeneous Many-Core Processor of Sunway TaihuLight (cited by: 7)
Authors: 伍明川, 黄磊, 刘颖, 何先波, 冯晓兵. Chinese Journal of Computers (计算机学报; EI, CSCD, PKU core journal), 2018, No. 10, pp. 2236-2250, 15 pages.
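In recent years, hardware design has trended toward heterogeneity, and it is widely recognized in the industry that developing parallel programs effectively has become one of the bottlenecks constraining the development of heterogeneous systems. China's independently developed Sunway TaihuLight supercomputer uses the domestic on-chip heterogeneous many-core processor SW26010. To reduce the programming burden and improve the efficiency of porting software, the authors designed and implemented an OpenCL compiler system that supports the SW26010 many-core processor. The compiler system maps the OpenCL platform model, memory model, and execution model onto the SW26010 with corresponding optimizations, and generates executables with good performance. Experiments verify the correctness and effectiveness of the compiler system: for typical OpenCL applications compiled with it, performance at small and medium input sizes is significantly better than on Intel Xeon Phi and comparable to NVIDIA GPUs; at larger input sizes, limited by the capacity of the local scratchpad memory (SPM), performance is slightly lower than on NVIDIA GPUs.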
Keywords: OpenCL; heterogeneous; domestic many-core processor; compiler system
5. A Multi-Core-Group Cooperative OpenCL Compilation Method for Sunway TaihuLight (cited by: 1)
Authors: 伍明川, 刘颖, 李立民, 冯晓兵. Chinese High Technology Letters (高技术通讯; CAS), 2022, No. 9, pp. 927-936, 10 pages.
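In recent years, scientific demand for high performance computing has grown steadily, and effectively exploiting the computing power of new supercomputer architectures has become a research focus. China's independently developed Sunway TaihuLight supercomputing platform uses the domestic heterogeneous many-core processor SW26010, which contains four core groups but provides no synchronization mechanism between them. To improve its programmability, this paper proposes an inter-core-group synchronization method for Sunway TaihuLight and implements it in the SWCL OpenCL compiler. The method uses data dependency analysis across OpenCL host and kernel code to identify where synchronization is necessary, and performs low-overhead inter-core-group communication through the SW26010's cross segment (交叉段) of memory. Without using the Message Passing Interface (MPI) to control synchronization explicitly, programmers can automatically deploy a single OpenCL kernel program across multiple core groups. Experiments on Sunway TaihuLight using the OpenCL benchmarks in SPEC ACCEL 1.2 show that the speedup achieved by this method is clearly better than that of the traditional MPI implementation.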
Keywords: OpenCL; domestic many-core processor; heterogeneous; synchronization; data dependency analysis
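The idea of placing inter-core-group synchronization only where data dependencies require it can be sketched as a conservative, buffer-level analysis over a sequence of kernel launches. This Python sketch is hypothetical: the `KernelLaunch` record and the barrier rule are assumptions rather than SWCL's actual analysis, and the communication through the SW26010 cross segment is not modeled.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class KernelLaunch:
    name: str
    reads: Set[str] = field(default_factory=set)   # buffers the kernel reads
    writes: Set[str] = field(default_factory=set)  # buffers the kernel writes


def sync_points(launches: List[KernelLaunch]) -> List[int]:
    """Return indices i where a cross-core-group barrier must precede launches[i].

    Conservative buffer-level rule: because any core group may touch any part
    of a buffer, a barrier is needed when a launch reads or writes a buffer
    written since the last barrier (RAW/WAW) or writes a buffer read since
    then (WAR).
    """
    points: List[int] = []
    written: Set[str] = set()
    read: Set[str] = set()
    for i, k in enumerate(launches):
        if (k.reads & written) or (k.writes & written) or (k.writes & read):
            points.append(i)
            written, read = set(), set()  # barrier: all earlier accesses complete
        written |= k.writes
        read |= k.reads
    return points
```

For a pipeline `init` → `scale` → `other` → `sum`, where `init` writes `a`, `scale` reads `a` and writes `b`, `other` touches only unrelated buffers, and `sum` reads `b` and `d`, barriers are placed only before `scale` and `sum`; the independent `other` launch needs none.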
6. Principles and Methods for Adding SM9B100MAL Processor Support to the IAR Development Environment (cited by: 2)
Authors: 吴昌昊, 范云, 黄菊, 王文俊, 张自圃, 邵雨新. Ordnance Industry Automation (兵工自动化), 2021, No. 7, pp. 28-38, 11 pages.
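To address the lack of official IAR development environment support for the SM9B100MAL processor, a method for adding processor support to IAR is proposed. By analyzing the mechanisms of the IAR development environment, the C-SPY debugger, the Flash Loader framework, and the device description configuration, the paper gives reference configurations and code with explanations, demonstrating the process of adding processor support. Results show that once support is added, the IAR development environment can quickly create code projects, download programs with one click, and display register contents in a structured form during debug sessions.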
Keywords: IAR; C-SPY; Flash Loader; SM9B100MAL; development environment; domestic processor; debugging