期刊文献+
共找到42篇文章
< 1 2 3 >
每页显示 20 50 100
Shared Cache Based on Content Addressable Memory in a Multi-Core Architecture
1
作者 Allam Abumwais Mahmoud Obaid 《Computers, Materials & Continua》 SCIE EI 2023年第3期4951-4963,共13页
Modern shared-memory multi-core processors typically have shared Level 2(L2)or Level 3(L3)caches.Cache bottlenecks and replacement strategies are the main problems of such architectures,where multiple cores try to acc... Modern shared-memory multi-core processors typically have shared Level 2(L2)or Level 3(L3)caches.Cache bottlenecks and replacement strategies are the main problems of such architectures,where multiple cores try to access the shared cache simultaneously.The main problem in improving memory performance is the shared cache architecture and cache replacement.This paper documents the implementation of a Dual-Port Content Addressable Memory(DPCAM)and a modified Near-Far Access Replacement Algorithm(NFRA),which was previously proposed as a shared L2 cache layer in a multi-core processor.Standard Performance Evaluation Corporation(SPEC)Central Processing Unit(CPU)2006 benchmark workloads are used to evaluate the benefit of the shared L2 cache layer.Results show improved performance of the multicore processor’s DPCAM and NFRA algorithms,corresponding to a higher number of concurrent accesses to shared memory.The new architecture significantly increases system throughput and records performance improvements of up to 8.7%on various types of SPEC 2006 benchmarks.The miss rate is also improved by about 13%,with some exceptions in the sphinx3 and bzip2 benchmarks.These results could open a new window for solving the long-standing problems with shared cache in multi-core processors. 展开更多
关键词 multi-core processor shared cache content addressable memory dual port CAM replacement algorithm benchmark program
下载PDF
Multi-core optimization for conjugate gradient benchmark on heterogeneous processors
2
作者 邓林 窦勇 《Journal of Central South University》 SCIE EI CAS 2011年第2期490-498,共9页
Developing parallel applications on heterogeneous processors is facing the challenges of 'memory wall',due to limited capacity of local storage,limited bandwidth and long latency for memory access. Aiming at t... Developing parallel applications on heterogeneous processors is facing the challenges of 'memory wall',due to limited capacity of local storage,limited bandwidth and long latency for memory access. Aiming at this problem,a parallelization approach was proposed with six memory optimization schemes for CG,four schemes of them aiming at all kinds of sparse matrix-vector multiplication (SPMV) operation. Conducted on IBM QS20,the parallelization approach can reach up to 21 and 133 times speedups with size A and B,respectively,compared with single power processor element. Finally,the conclusion is drawn that the peak bandwidth of memory access on Cell BE can be obtained in SPMV,simple computation is more efficient on heterogeneous processors and loop-unrolling can hide local storage access latency while executing scalar operation on SIMD cores. 展开更多
关键词 multi-core processor NAS parallelization CG memory optimization
下载PDF
Parallel Processing Design for LTE PUSCH Demodulation and Decoding Based on Multi-Core Processor
3
作者 Zhang Ziran,Li Jun,Li Changxiao(ZTE Corporation,Shenzhen 518057,P.R.China) 《ZTE Communications》 2009年第1期54-58,共5页
The Long Term Evolution (LTE) system imposes high requirements for dispatching delay.Moreover,very large air interface rate of LTE requires good processing capability for the devices processing the baseband signals.Co... The Long Term Evolution (LTE) system imposes high requirements for dispatching delay.Moreover,very large air interface rate of LTE requires good processing capability for the devices processing the baseband signals.Consequently,the single-core processor cannot meet the requirements of LTE system.This paper analyzes how to use multi-core processors to achieve parallel processing of uplink demodulation and decoding in LTE systems and designs an approach to parallel processing.The test results prove that this approach works quite well. 展开更多
关键词 CORE LTE Parallel Processing Design for LTE PUSCH Demodulation and Decoding Based on multi-core processor Design
下载PDF
A VLIW Architecture Stream Cryptographic Processor for Information Security 被引量:4
4
作者 Longmei Nan Xuan Yang +4 位作者 Xiaoyang Zeng Wei Li Yiran Du Zibin Dai Lin Chen 《China Communications》 SCIE CSCD 2019年第6期185-199,共15页
As an important branch of information security algorithms,the efficient and flexible implementation of stream ciphers is vital.Existing implementation methods,such as FPGA,GPP and ASIC,provide a good support,but they ... As an important branch of information security algorithms,the efficient and flexible implementation of stream ciphers is vital.Existing implementation methods,such as FPGA,GPP and ASIC,provide a good support,but they could not achieve a better tradeoff between high speed processing and high flexibility.ASIC has fast processing speed,but its flexibility is poor,GPP has high flexibility,but the processing speed is slow,FPGA has high flexibility and processing speed,but the resource utilization is very low.This paper studies a stream cryptographic processor which can efficiently and flexibly implement a variety of stream cipher algorithms.By analyzing the structure model,processing characteristics and storage characteristics of stream ciphers,a reconfigurable stream cryptographic processor with special instructions based on VLIW is presented,which has separate/cluster storage structure and is oriented to stream cipher operations.The proposed instruction structure can effectively support stream cipher processing with multiple data bit widths,parallelism among stream cipher processing with different data bit widths,and parallelism among branch control and stream cipher processing with high instruction level parallelism;the designed separate/clustered special bit registers and general register heaps,key register heaps can satisfy cryptographic requirements.So the proposed processor not only flexibly accomplishes the combination of multiple basic stream cipher operations to finish stream cipher algorithms.It has been implemented with 0.18μm CMOS technology,the test results show that the frequency can reach 200 MHz,and power consumption is 310 mw.Ten kinds of stream ciphers were realized in the processor.The key stream generation throughput of Grain-80,W7,MICKEY,ACHTERBAHN and Shrink algorithm is 100 Mbps,66.67 Mbps,66.67 Mbps,50 Mbps and 800 Mbps,respectively.The test result shows that the processor presented can achieve good tradeoff between high performance and flexibility of stream ciphers. 展开更多
关键词 STREAM cipher VLIW architecture processor RECONFIGURABLE application-specific instruction-set
下载PDF
A distributed cross-domain register filefor reconfigurable cryptographic processor 被引量:1
5
作者 Zhang Baoning Ge Wei Wang Zhen 《Journal of Southeast University(English Edition)》 EI CAS 2017年第3期260-265,共6页
Due to the fact that the register files seriously affect the performance and area of coarse-grained reconfigurable cryptographic processors, an efficient structure of the distributed cross-domain register file is prop... Due to the fact that the register files seriously affect the performance and area of coarse-grained reconfigurable cryptographic processors, an efficient structure of the distributed cross-domain register file is proposed to realize a cryptographic processor with a high performance and a lowarea cost. In order to meet the demands of high performance and high flexibility at a lowarea cost, a union structure with the multi-ports access structure, i, e., a distributed crossdomain register file, is designed by analyzing the algorithm features of different ciphers. Considering different algorithm requirements of the global register files and local register files,the circuit design is realized by adopting different design parameters under TSMC( Taiwan Semiconductor Manufacturing Company) 40 nm CMOS( complementary metal oxide semiconductor) technology and compared with other similar works. The experimental results showthat the proposed distributed cross-domain register structure can effectively improve the performance of the unit area, of which the total performance of block per cycle is improved by17. 79% and performance of block per cycle per area is improved by 117%. 展开更多
关键词 RECONFIGURABLE processor BLOCK cipher parallelimplementation REGISTER FILE
下载PDF
A Reconfigurable Block Cryptographic Processor Based on VLIW Architecture 被引量:11
6
作者 LI Wei ZENG Xiaoyang +2 位作者 NAN Longmei CHEN Tao DAI Zibin 《China Communications》 SCIE CSCD 2016年第1期91-99,共9页
An Efficient and flexible implementation of block ciphers is critical to achieve information security processing.Existing implementation methods such as GPP,FPGA and cryptographic application-specific ASIC provide the... An Efficient and flexible implementation of block ciphers is critical to achieve information security processing.Existing implementation methods such as GPP,FPGA and cryptographic application-specific ASIC provide the broad range of support.However,these methods could not achieve a good tradeoff between high-speed processing and flexibility.In this paper,we present a reconfigurable VLIW processor architecture targeted at block cipher processing,analyze basic operations and storage characteristics,and propose the multi-cluster register-file structure for block ciphers.As for the same operation element of block ciphers,we adopt reconfigurable technology for multiple cryptographic processing units and interconnection scheme.The proposed processor not only flexibly accomplishes the combination of multiple basic cryptographic operations,but also realizes dynamic configuration for cryptographic processing units.It has been implemented with0.18μm CMOS technology,the test results show that the frequency can reach 350 MHz.and power consumption is 420 mw.Ten kinds of block and hash ciphers were realized in the processor.The encryption throughput of AES,DES,IDEA,and SHA-1 algorithm is1554 Mbps,448Mbps,785 Mbps,and 424 Mbps respectively,the test result shows that our processor's encryption performance is significantly higher than other designs. 展开更多
关键词 Block cipher VLIW processor reconfigurable application-specific instruction-set
下载PDF
System Architecture of Godson-3 Multi-Core Processors 被引量:7
7
作者 高翔 陈云霁 +2 位作者 王焕东 唐丹 胡伟武 《Journal of Computer Science & Technology》 SCIE EI CSCD 2010年第2期181-191,共11页
Godson-3 is the latest generation of Godson microprocessor family. It takes a scalable multi-core architecture with hardware support for accelerating applications including X86 emulation and signal processing. This pa... Godson-3 is the latest generation of Godson microprocessor family. It takes a scalable multi-core architecture with hardware support for accelerating applications including X86 emulation and signal processing. This paper introduces the system architecture of Godson-3 from various aspects including system scalability, organization of memory hierarchy, network-on-chip, inter-chip connection and I/O subsystem. 展开更多
关键词 multi-core processor scalable interconnection cache coherent non-uniform memory access/non-uniform cache access (CC-NUMA/NUCA) MESH CROSSBAR cache coherence reliability availability and serviceability (RAS)
原文传递
Parallel computing of discrete element method on multi-core processors 被引量:6
8
作者 Yusuke Shigeto Mikio Sakai 《Particuology》 SCIE EI CAS CSCD 2011年第4期398-405,共8页
This paper describes parallel simulation techniques for the discrete element method (DEM) on multi-core processors. Recently, multi-core CPU and GPU processors have attracted much attention in accelerating computer ... This paper describes parallel simulation techniques for the discrete element method (DEM) on multi-core processors. Recently, multi-core CPU and GPU processors have attracted much attention in accelerating computer simulations in various fields. We propose a new algorithm for multi-thread parallel computation of DEM, which makes effective use of the available memory and accelerates the computation. This study shows that memory usage is drastically reduced by using this algorithm. To show the practical use of DEM in industry, a large-scale powder system is simulated with a complicated drive unit. We compared the performance of the simulation between the latest GPU and CPU processors with optimized programs for each processor. The results show that the difference in performance is not substantial when using either GPUs or CPUs with a multi-thread parallel algorithm. In addition, DEM algorithm is shown to have high scalabilitv in a multi-thread parallel computation on a CPU. 展开更多
关键词 Discrete element method Parallel computing multi-core processor GPGPU
原文传递
Energy Efficiency of a Multi-Core Processor by Tag Reduction
9
作者 郑龙 董冕雄 +3 位作者 Kaoru Ota 金海 Song Guo 马俊 《Journal of Computer Science & Technology》 SCIE EI CSCD 2011年第3期491-503,共13页
We consider the energy saving problem for caches on a multi-core processor. In the previous research on low power processors, there are various methods to reduce power dissipation. Tag reduction is one of them. This p... We consider the energy saving problem for caches on a multi-core processor. In the previous research on low power processors, there are various methods to reduce power dissipation. Tag reduction is one of them. This paper extends the tag reduction technique on a single-core processor to a multi-core processor and investigates the potential of energy saving for multi-core processors. We formulate our approach as an equivalent problem which is to find an assignment of the whole instruction pages in the physical memory to a set of cores such that the tag-reduction conflicts for each core can be mostly avoided or reduced. We then propose three algorithms using different heuristics for this assignment problem. We provide convincing experimental results by collecting experimental data from a real operating system instead of the traditional way using a processor simulator that cannot simulate operating system functions and the full memory hierarchy. Experimental results show that our proposed algorithms can save total energy up to 83.93% on an 8-core processor and 76.16% on a 4-core processor in average compared to the one that the tag-reduction is not used for. They also significantly outperform the tag reduction based algorithm on a single-core processor. 展开更多
关键词 tag reduction multi-core processor energy efficiency
原文传递
Schedule refinement for homogeneous multi-core processors in the presence of manufacturing-caused heterogeneity
10
作者 Zhi-xiang CHEN Zhao-lin LI +2 位作者 Shan CAO Fang WANG Jie ZHOU 《Frontiers of Information Technology & Electronic Engineering》 SCIE EI CSCD 2015年第12期1018-1033,共16页
Multi-core homogeneous processors have been widely used to deal with computation-intensive embedded applications. However, with the continuous down scaling of CMOS technology, within-die variations in the manufacturin... Multi-core homogeneous processors have been widely used to deal with computation-intensive embedded applications. However, with the continuous down scaling of CMOS technology, within-die variations in the manufacturing process lead to a significant spread in the operating speeds of cores within homogeneous multi-core processors. Task scheduling approaches, which do not consider such heterogeneity caused by within-die variations,can lead to an overly pessimistic result in terms of performance. To realize an optimal performance according to the actual maximum clock frequencies at which cores can run, we present a heterogeneity-aware schedule refining(HASR) scheme by fully exploiting the heterogeneities of homogeneous multi-core processors in embedded domains.We analyze and show how the actual maximum frequencies of cores are used to guide the scheduling. In the scheme,representative chip operating points are selected and the corresponding optimal schedules are generated as candidate schedules. During the booting of each chip, according to the actual maximum clock frequencies of cores, one of the candidate schedules is bound to the chip to maximize the performance. A set of applications are designed to evaluate the proposed scheme. Experimental results show that the proposed scheme can improve the performance by an average value of 22.2%, compared with the baseline schedule based on the worst case timing analysis. Compared with the conventional task scheduling approach based on the actual maximum clock frequencies, the proposed scheme also improves the performance by up to 12%. 展开更多
关键词 Schedule refining multi-core processor HETEROGENEITY Representative chip operating point
原文传递
Thread Private Variable Access Optimization Technique for Sunway High-Performance Multi-core Processors
11
作者 Jinying Kong Kai Nie +2 位作者 Qinglei Zhou Jinlong Xu Lin Han 《国际计算机前沿大会会议论文集》 2021年第1期180-189,共10页
The primary way to achieve thread-level parallelism on the Sunwayhigh-performance multicore processor is to use the OpenMP programming technique.To address the problem of low parallelism efficiency caused by slow acce... The primary way to achieve thread-level parallelism on the Sunwayhigh-performance multicore processor is to use the OpenMP programming technique.To address the problem of low parallelism efficiency caused by slow accessto thread private variables in the compilation of Sunway OpenMP programs, thispaper proposes a thread private variable access technique based on privilegedinstructions. The privileged instruction-based thread-private variable access techniquecentralizes the implementation of thread-private variables at the compilerlevel, eliminating the model switching overhead of invoking OS core processingand improving the speed of accessing thread-private variables. On the Sunway1621 server platform, NPB3.3-OMP and SPEC OMP2012 achieved 6.2% and6.8% running efficiency gains, respectively. The results show that the techniquesproposed in this paper can provide technical support for giving full play to theadvantages of Sunway’s high-performance multi-core processors. 展开更多
关键词 Sunway high-performance multi-core processors OpenMP programming technique Privileged instruction-based thread-private variable access technique Sunway 1621 processor
原文传递
Parallel Region Reconstruction Technique for Sunway High-Performance Multi-core Processors
12
作者 Kai Nie Qinglei Zhou +3 位作者 Hong Qian Jianmin Pang Jinlong Xu Yapeng Li 《国际计算机前沿大会会议论文集》 2021年第1期163-179,共17页
The leading way to achieve thread-level parallelism on the Sunwayhigh-performance multicore processors is to use OpenMP programming techniques.In order to address the problem of low parallel efficiency caused by hight... The leading way to achieve thread-level parallelism on the Sunwayhigh-performance multicore processors is to use OpenMP programming techniques.In order to address the problem of low parallel efficiency caused by highthread group control overhead in the compilation of Sunway OpenMP programs,this paper proposes the parallel region reconstruction technique. The parallelregion reconstruction technique expands the parallel scope of parallel regionsin OpenMP programs by parallel region merging and parallel region extending.Moreover, it reduces the number of parallel regions in OpenMP programs,decreases the overhead of frequent creation and convergence of thread groups,and converts standard fork-join model OpenMP programs to higher performanceSPMD modelOpenMP programs. On the Sunway 1621 server computer, NPB3.3-OMP and SPEC OMP2012 achieved 8.9% and 7.9% running efficiency improvementrespectively through parallel region reconstruction technique. As a result,the parallel region reconstruction technique is feasible and effective. It providestechnical support to fully exploit the multi-core parallelism advantage of Sunway’shigh-performance processors. 展开更多
关键词 Sunway high-performance multi-core processors OpenMP programming technique Parallel domain reconstruction technique
原文传递
面向分组密码的可重构异构多核并行处理架构 被引量:7
13
作者 冯晓 李伟 +2 位作者 戴紫彬 马超 李功丽 《电子学报》 EI CAS CSCD 北大核心 2017年第6期1311-1320,共10页
现有的可重构分组密码实现结构中,专用指令处理器吞吐率不高,阵列结构资源利用率低、算法映射过程复杂.为此,设计了分组密码可重构异构多核并行处理架构RAMCA(Reconfigurable Asymmetrical Multi-Core Architecture),分析了典型SP(AES-1... 现有的可重构分组密码实现结构中,专用指令处理器吞吐率不高,阵列结构资源利用率低、算法映射过程复杂.为此,设计了分组密码可重构异构多核并行处理架构RAMCA(Reconfigurable Asymmetrical Multi-Core Architecture),分析了典型SP(AES-128)、Feistel(SMS4)、L-M(IDEA)及MISTY(KASUMI)结构算法在RAMCA上的映射过程.在65nm CMOS工艺下完成了逻辑综合和功能仿真.实验表明,RAMCA工作频率可达到1GHz,面积约为1.13mm2,消除工艺影响后,对各分组密码算法的运算速度均高于现有专用指令处理器以及Celator、RCPA和BCORE等阵列结构密码处理系统. 展开更多
关键词 分组密码 异构多核 可重构 并行处理 密码处理器
下载PDF
基于流体系架构的分组密码处理器设计 被引量:2
14
作者 李功丽 戴紫彬 +3 位作者 徐进辉 王寿成 朱玉飞 冯晓 《计算机研究与发展》 EI CSCD 北大核心 2017年第12期2824-2833,共10页
为提升密码处理器性能,构建了密码处理器性能模型.基于该模型,提出多级资源共享、绑定前/后异或操作、最大化算法并行度等处理器性能提升技术,并根据性能提升技术确定了功能单元的种类和数量.然而功能单元不仅数量较多,而且在操作位宽... 为提升密码处理器性能,构建了密码处理器性能模型.基于该模型,提出多级资源共享、绑定前/后异或操作、最大化算法并行度等处理器性能提升技术,并根据性能提升技术确定了功能单元的种类和数量.然而功能单元不仅数量较多,而且在操作位宽和操作延迟方面均有较大差异,如何有效组织这些功能单元成为了一个关键问题.利用流体系结构可以高效集成大量功能单元的特点,设计并实现了基于流体系结构的可重构分组密码处理器原型,并通过把功能单元划分为基本处理单元,bank间共享单元和簇间共享单元3个层次来解决功能单元处理位宽和操作延迟的差异.在65nm CMOS工艺下对处理器原型进行综合,并在该结构上映射了典型的分组密码算法.实验结果证明:该处理器以较小的面积获得了较高的性能,对典型分组密码算法的处理速度,不仅超越了国际上的密码专用指令处理器,而且高于国内可重构阵列结构密码处理器. 展开更多
关键词 分组密码 流处理器 性能模型 可重构 密码处理器
下载PDF
基于流体系结构的高效能分组密码处理器研究 被引量:3
15
作者 王寿成 严迎建 徐进辉 《电子学报》 EI CAS CSCD 北大核心 2017年第4期937-943,共7页
针对现有密码处理器存在的问题,借鉴流处理器架构,提出了高效能的可重构分组密码流处理器架构.该架构采用层次化设计思想,通过分块式本地寄存器组的数据组织方式和共享拼接使用运算单元机制,实现了软件流水和硬件流水的协同工作,能够挖... 针对现有密码处理器存在的问题,借鉴流处理器架构,提出了高效能的可重构分组密码流处理器架构.该架构采用层次化设计思想,通过分块式本地寄存器组的数据组织方式和共享拼接使用运算单元机制,实现了软件流水和硬件流水的协同工作,能够挖掘分组内和分组间的指令级并行性并提高功能单元的利用率.在65nm CMOS工艺下对架构进行了综合仿真,并经过了大量算法映射.实验结果证明,该架构在CBC和ECB加密模式下均具有良好的加密性能.与其他密码处理器相比,该架构具有小面积、高效能的特点. 展开更多
关键词 分组密码 流处理器 可重构 软件流水 面积能效比
下载PDF
密码指令集扩展研究 被引量:1
16
作者 李美峰 戴冠中 +2 位作者 刘航 苗胜 张德刚 《计算机应用研究》 CSCD 北大核心 2008年第6期1833-1835,共3页
详细分析了常见密码算法的基本操作以及密码指令集扩展的研究现状,针对当前密码系统需要支持多种密码算法的特点指出未来密码指令集扩展的发展方向:指令设计需朝通用性上发展且通用密码处理器是处理器密码指令集扩展的最终目的。
关键词 密码指令集扩展 基本操作 通用性 通用密码处理器
下载PDF
面向密码流体系结构的超长指令字可重构研究 被引量:2
17
作者 严迎建 王寿成 +1 位作者 徐进辉 陈韬 《电子与信息学报》 EI CSCD 北大核心 2017年第1期206-212,共7页
可重构密码流体系结构是一种面向密码运算的新型体系结构,但存在着超长指令字(VLIW)代码稀疏和Kernel体积过大的问题。该文以可重构密码流处理架构S-RCCPA为研究平台,通过大量密码算法在S-RCCPA架构上的适配分析,提出了VLIW可重构技术,... 可重构密码流体系结构是一种面向密码运算的新型体系结构,但存在着超长指令字(VLIW)代码稀疏和Kernel体积过大的问题。该文以可重构密码流处理架构S-RCCPA为研究平台,通过大量密码算法在S-RCCPA架构上的适配分析,提出了VLIW可重构技术,并设计了Kernel级指令集、VLIW可重构算法及指令可重构单元。实验证明,该技术能够有效提高VLIW的指令密度,同时降低了VLIW的指令宽度,使得整个Kernel体积减小了约33.3%,并将微码存储器的容量由96 k B降为64 k B,有效降低芯片整体面积和系统功耗。 展开更多
关键词 密码流处理器 Kernel级指令 超长指令字 可重构 指令密度
下载PDF
ECC处理器时间随机化抗DPA攻击设计 被引量:1
18
作者 陈琳 严迎建 +1 位作者 周超 李默然 《电子技术应用》 北大核心 2015年第10期103-106,共4页
为了提高ECC密码处理器的抗差分能量攻击能力,提出了基于软中断的时间随机化抗差分能量攻击方法。首先分析了时间随机化抗DPA攻击的原理;然后结合ECC处理器的运算特征,设计了基于软中断的时间随机化电路;最后搭建了功耗仿真平台。对设... 为了提高ECC密码处理器的抗差分能量攻击能力,提出了基于软中断的时间随机化抗差分能量攻击方法。首先分析了时间随机化抗DPA攻击的原理;然后结合ECC处理器的运算特征,设计了基于软中断的时间随机化电路;最后搭建了功耗仿真平台。对设计进行仿真分析,结果表明,本设计能够满足抵抗DPA攻击,同时时间随机化部分的功耗特性在能量迹上不易被区分剔除,达到了处理器抗DPA攻击的设计要求。 展开更多
关键词 ECC 密码处理器 时间随机化 差分能量攻击 软中断
下载PDF
面向序列密码的比特级抽取指令研究与设计 被引量:1
19
作者 陈韬 马超 +2 位作者 罗兴国 李伟 常忠祥 《信息工程大学学报》 2015年第1期123-128,共6页
针对通用处理器中比特级操作效率低下的问题,提出了一种面向序列密码算法的比特级抽取指令,并构造了与之相应的硬件单元。将该单元在CMOS 0.13μm工艺下完成综合,同时通过NIOSⅡ扩展指令的方式把设计的专用指令加入到处理器中进行了性... 针对通用处理器中比特级操作效率低下的问题,提出了一种面向序列密码算法的比特级抽取指令,并构造了与之相应的硬件单元。将该单元在CMOS 0.13μm工艺下完成综合,同时通过NIOSⅡ扩展指令的方式把设计的专用指令加入到处理器中进行了性能评估。结果表明:该指令的加入并不影响处理器的处理器频率,与未经扩展指令的嵌入式RSIC处理器相比,完成相同的抽取操作指令条数从250条减少为1条,有效地提升了序列密码算法的处理性能。 展开更多
关键词 抽取操作 序列密码 处理器
下载PDF
基于共享存储器的密码多核处理器核间通信机制研究与设计
20
作者 李伟 李洋 陈帆 《微电子学与计算机》 CSCD 北大核心 2016年第1期19-23,共5页
为有效解决多核密码处理器中核间通信对密码处理性能制约的问题,研究了分组、序列、杂凑密码算法多核处理过程中核间通信的特点.依托分簇式的密码多核处理器,提出了基于共享存储器的硬件结构,能够有效支持256bit以内的任意数据位宽的核... 为有效解决多核密码处理器中核间通信对密码处理性能制约的问题,研究了分组、序列、杂凑密码算法多核处理过程中核间通信的特点.依托分簇式的密码多核处理器,提出了基于共享存储器的硬件结构,能够有效支持256bit以内的任意数据位宽的核间通信传输.提出了基于自选锁邮箱的同步器硬件结构,具备加1、置数、乒乓同步等功能.结合此共享存储器与同步器结构,提出了基于共享存储器的密码多核处理器核间通信机制.基于65nm CMOS工艺库,对共享存储器与同步器进行了实现,实验结果表明,提出的核间通信机制具有硬件开销小、同步效率高的特点,2个时钟周期能够完成4个处理器核对于4组256bit数据的核间通信与同步操作,相比其他核间通信方式,具有明显的优势. 展开更多
关键词 核间通信 密码多核处理器 共享存储 同步器
下载PDF
上一页 1 2 3 下一页 到第
使用帮助 返回顶部