期刊文献+
共找到1,012篇文章
< 1 2 51 >
每页显示 20 50 100
Shared Cache Based on Content Addressable Memory in a Multi-Core Architecture
1
作者 Allam Abumwais Mahmoud Obaid 《Computers, Materials & Continua》 SCIE EI 2023年第3期4951-4963,共13页
Modern shared-memory multi-core processors typically have shared Level 2(L2)or Level 3(L3)caches.Cache bottlenecks and replacement strategies are the main problems of such architectures,where multiple cores try to acc... Modern shared-memory multi-core processors typically have shared Level 2(L2)or Level 3(L3)caches.Cache bottlenecks and replacement strategies are the main problems of such architectures,where multiple cores try to access the shared cache simultaneously.The main problem in improving memory performance is the shared cache architecture and cache replacement.This paper documents the implementation of a Dual-Port Content Addressable Memory(DPCAM)and a modified Near-Far Access Replacement Algorithm(NFRA),which was previously proposed as a shared L2 cache layer in a multi-core processor.Standard Performance Evaluation Corporation(SPEC)Central Processing Unit(CPU)2006 benchmark workloads are used to evaluate the benefit of the shared L2 cache layer.Results show improved performance of the multicore processor’s DPCAM and NFRA algorithms,corresponding to a higher number of concurrent accesses to shared memory.The new architecture significantly increases system throughput and records performance improvements of up to 8.7%on various types of SPEC 2006 benchmarks.The miss rate is also improved by about 13%,with some exceptions in the sphinx3 and bzip2 benchmarks.These results could open a new window for solving the long-standing problems with shared cache in multi-core processors. 展开更多
关键词 multi-core processor shared cache content addressable memory dual port CAM replacement algorithm benchmark program
下载PDF
Multi-core optimization for conjugate gradient benchmark on heterogeneous processors
2
作者 邓林 窦勇 《Journal of Central South University》 SCIE EI CAS 2011年第2期490-498,共9页
Developing parallel applications on heterogeneous processors is facing the challenges of 'memory wall',due to limited capacity of local storage,limited bandwidth and long latency for memory access. Aiming at t... Developing parallel applications on heterogeneous processors is facing the challenges of 'memory wall',due to limited capacity of local storage,limited bandwidth and long latency for memory access. Aiming at this problem,a parallelization approach was proposed with six memory optimization schemes for CG,four schemes of them aiming at all kinds of sparse matrix-vector multiplication (SPMV) operation. Conducted on IBM QS20,the parallelization approach can reach up to 21 and 133 times speedups with size A and B,respectively,compared with single power processor element. Finally,the conclusion is drawn that the peak bandwidth of memory access on Cell BE can be obtained in SPMV,simple computation is more efficient on heterogeneous processors and loop-unrolling can hide local storage access latency while executing scalar operation on SIMD cores. 展开更多
关键词 multi-core processor NAS parallelization CG memory optimization
下载PDF
Parallel Processing Design for LTE PUSCH Demodulation and Decoding Based on Multi-Core Processor
3
作者 Zhang Ziran,Li Jun,Li Changxiao(ZTE Corporation,Shenzhen 518057,P.R.China) 《ZTE Communications》 2009年第1期54-58,共5页
The Long Term Evolution (LTE) system imposes high requirements for dispatching delay.Moreover,very large air interface rate of LTE requires good processing capability for the devices processing the baseband signals.Co... The Long Term Evolution (LTE) system imposes high requirements for dispatching delay.Moreover,very large air interface rate of LTE requires good processing capability for the devices processing the baseband signals.Consequently,the single-core processor cannot meet the requirements of LTE system.This paper analyzes how to use multi-core processors to achieve parallel processing of uplink demodulation and decoding in LTE systems and designs an approach to parallel processing.The test results prove that this approach works quite well. 展开更多
关键词 CORE LTE Parallel Processing Design for LTE PUSCH Demodulation and Decoding Based on multi-core processor Design
下载PDF
Speeding up the MATLAB complex networks package using graphic processors 被引量:1
4
作者 张百达 唐玉华 +1 位作者 吴俊杰 李鑫 《Chinese Physics B》 SCIE EI CAS CSCD 2011年第9期460-467,共8页
The availability of computers and communication networks allows us to gather and analyse data on a far larger scale than previously. At present, it is believed that statistics is a suitable method to analyse networks ... The availability of computers and communication networks allows us to gather and analyse data on a far larger scale than previously. At present, it is believed that statistics is a suitable method to analyse networks with millions, or more, of vertices. The MATLAB language, with its mass of statistical functions, is a good choice to rapidly realize an algorithm prototype of complex networks. The performance of the MATLAB codes can be further improved by using graphic processor units (GPU). This paper presents the strategies and performance of the GPU implementation of a complex networks package, and the Jacket toolbox of MATLAB is used. Compared with some commercially available CPU implementations, GPU can achieve a speedup of, on average, 11.3x. The experimental result proves that the GPU platform combined with the MATLAB language is a good combination for complex network research. 展开更多
关键词 complex networks graphic processors unit MATLAB Jacket Toolbox
下载PDF
SDN-Based Switch Implementation on Network Processors 被引量:1
5
作者 Yunchun Li Guodong Wang 《Communications and Network》 2013年第3期434-437,共4页
Virtualization is the key technology of cloud computing. Network virtualization plays an important role in this field. Its performance is very relevant to network virtualizing. Nowadays its implementations are mainly ... Virtualization is the key technology of cloud computing. Network virtualization plays an important role in this field. Its performance is very relevant to network virtualizing. Nowadays its implementations are mainly based on the idea of Software Define Network (SDN). Open vSwitch is a sort of software virtual switch, which conforms to the OpenFlow protocol standard. It is basically deployed in the Linux kernel hypervisor. This leads to its performance relatively poor because of the limited system resource. In turn, the packet process throughput is very low.In this paper, we present a Cavium-based Open vSwitch implementation. The Cavium platform features with multi cores and couples of hard ac-celerators. It supports zero-copy of packets and handles packet more quickly. We also carry some experiments on the platform. It indicates that we can use it in the enterprise network or campus network as convergence layer and core layer device. 展开更多
关键词 SDN OPEN vSwitch network processorS OpenFlow
下载PDF
High-Level Portable Programming Language for Optimized Memory Use of Network Processors
6
作者 Yasusi Kanada 《Communications and Network》 2015年第1期55-69,共15页
Network processors (NPs) are widely used for programmable and high-performance networks;however, the programs for NPs are less portable, the number of NP program developers is small, and the development cost is high. ... Network processors (NPs) are widely used for programmable and high-performance networks;however, the programs for NPs are less portable, the number of NP program developers is small, and the development cost is high. To solve these problems, this paper proposes an open, high-level, and portable programming language called “Phonepl”, which is independent from vendor-specific proprietary hardware and software but can be translated into an NP program with high performance especially in the memory use. A common NP hardware feature is that a whole packet is stored in DRAM, but the header is cached in SRAM. Phonepl has a hardware-independent abstraction of this feature so that it allows programmers mostly unconscious of this hardware feature. To implement the abstraction, four representations of packet data type that cover all the packet operations (including substring, concatenation, input, and output) are introduced. Phonepl have been implemented on Octeon NPs used in plug-ins for a network-virtualization environment called the VNode Infrastructure, and several packet-handling programs were evaluated. As for the evaluation result, the conversion throughput is close to the wire rate, i.e., 10 Gbps, and no packet loss (by cache miss) occurs when the packet size is 256 bytes or larger. 展开更多
关键词 network processorS PORTABILITY HIGH-LEVEL Language Hardware Independence MEMORY Usage DRAM SRAM network Virtualization
下载PDF
Reconfigurable Communication Processor: A New Approach for Network Processor
7
作者 孙华 陈青山 张文渊 《Journal of Shanghai Jiaotong university(Science)》 EI 2003年第1期43-47,共5页
As the traditional RISC+ASIC/ASSP approach for network processor design can not meet the today’s requirements, this paper described an alternate approach, Reconfigurable Processing Architecture, to boost the performa... As the traditional RISC+ASIC/ASSP approach for network processor design can not meet the today’s requirements, this paper described an alternate approach, Reconfigurable Processing Architecture, to boost the performance to ASIC level while reserve the programmability of the traditional RISC based system. This paper covers both the hardware architecture and the software development environment architecture. 展开更多
关键词 network processor reconfigurable processor run time reconfiguration field programmable gate array (FPGA) raduced instruction set circuit (RISC) application specific integrated circuit(ASIC)
下载PDF
Architecture-level performance/power tradeoff in network processor design
8
作者 陈红松 季振洲 胡铭曾 《Journal of Harbin Institute of Technology(New Series)》 EI CAS 2007年第1期45-48,共4页
Network processors are used in the core node of network to flexibly process packet streams. With the increase of performance, the power of network processor increases fast, and power and cooling become a bottleneck. A... Network processors are used in the core node of network to flexibly process packet streams. With the increase of performance, the power of network processor increases fast, and power and cooling become a bottleneck. Architecture-level power conscious design must go beyond low-level circuit design. Architectural power and performance tradeoff should be considered at the same time. Simulation is an efficient method to design modem network processor before making chip. In order to achieve the tradeoff between performance and power, the processor simulator is used to design the architecture of network processor. Using Netbeneh, Commubench benchmark and processor simulator-SimpleScalar, the performance and power of network processor are quantitatively evaluated. New performance tradeoff evaluation metric is proposed to analyze the architecture of network processor. Based on the high performance lnteI IXP 2800 Network processor eonfignration, optimized instruction fetch width and speed ,instruction issue width, instruction window size are analyzed and selected. Simulation resuits show that the tradeoff design method makes the usage of network processor more effectively. The optimal key parameters of network processor are important in architecture-level design. It is meaningful for the next generation network processor design. 展开更多
关键词 network processor design performance/power simulation tradeoff evaluation optimization
下载PDF
Secure encryption embedded processor design for wireless sensor network application
9
作者 霍文捷 Liu Zhenglin Zou Xuecheng 《High Technology Letters》 EI CAS 2011年第1期75-79,共5页
This paper presents a new encryption embedded processor aimed at the application requirement of wireless sensor network (WSN). The new encryption embedded processor not only offers Rivest Shamir Adlemen (RSA), Adv... This paper presents a new encryption embedded processor aimed at the application requirement of wireless sensor network (WSN). The new encryption embedded processor not only offers Rivest Shamir Adlemen (RSA), Advanced Encryption Standard (AES), 3 Data Encryption Standard (3 DES) and Secure Hash Algorithm 1 (SHA - 1 ) security engines, but also involves a new memory encryption scheme. The new memory encryption scheme is implemented by a memory encryption cache (MEC), which protects the confidentiality of the memory by AES encryption. The experi- ments show that the new secure design only causes 1.9% additional delay on the critical path and cuts 25.7% power consumption when the processor writes data back. The new processor balances the performance overhead, the power consumption and the security and fully meets the wireless sensor environment requirement. After physical design, the new encryption embedded processor has been successfully tape-out. 展开更多
关键词 embedded processor security memory encryption wireless sensor network (WSN) CACHE
下载PDF
An Improved Cache Mechanism for a Cache-Based Network Processor
10
作者 Hayato Yamaki Hiroaki Nishi 《通讯和计算机(中英文版)》 2013年第3期277-286,共10页
关键词 高速缓存机制 网络处理器 网络流量 上下文 网络内容 IP电话 仿真结果 数据包
下载PDF
Optimized Processor for Sensor Networks Applications
11
作者 Ali Elkateeb 《通讯和计算机(中英文版)》 2012年第3期311-316,共6页
关键词 嵌入式处理器 传感器节点 网络应用 优化 节点设计 软核处理器 可重构系统 核心处理器
下载PDF
长向量处理器高效RNN推理方法
12
作者 苏华友 陈抗抗 杨乾明 《国防科技大学学报》 EI CAS CSCD 北大核心 2024年第1期121-130,共10页
模型深度的不断增加和处理序列长度的不一致对循环神经网络在不同处理器上的性能优化提出巨大挑战。针对自主研制的长向量处理器FT-M7032,实现了一个高效的循环神经网络加速引擎。该引擎采用行优先矩阵向量乘算法和数据感知的多核并行方... 模型深度的不断增加和处理序列长度的不一致对循环神经网络在不同处理器上的性能优化提出巨大挑战。针对自主研制的长向量处理器FT-M7032,实现了一个高效的循环神经网络加速引擎。该引擎采用行优先矩阵向量乘算法和数据感知的多核并行方式,提高矩阵向量乘的计算效率;采用两级内核融合优化方法降低临时数据传输的开销;采用手写汇编优化多种算子,进一步挖掘长向量处理器的性能潜力。实验表明,长向量处理器循环神经网络推理引擎可获得较高性能,相较于多核ARM CPU以及Intel Golden CPU,类循环神经网络模型长短记忆网络可获得最高62.68倍和3.12倍的性能加速。 展开更多
关键词 多核DSP 长向量处理器 循环神经网络 并行优化
下载PDF
NM-SpMM:面向国产异构向量处理器的半结构化稀疏矩阵乘算法
13
作者 姜晶菲 何源宏 +2 位作者 许金伟 许诗瑶 钱希福 《计算机工程与科学》 CSCD 北大核心 2024年第7期1141-1150,共10页
深度神经网络在自然语言处理、计算机视觉等领域取得了优异的成果,由于智能应用处理数据规模的增长和大模型的快速发展,对深度神经网络的推理性能要求越来越高,N∶M半结构化稀疏化技术成为平衡算力需求和应用效果的热点技术之一。国产... 深度神经网络在自然语言处理、计算机视觉等领域取得了优异的成果,由于智能应用处理数据规模的增长和大模型的快速发展,对深度神经网络的推理性能要求越来越高,N∶M半结构化稀疏化技术成为平衡算力需求和应用效果的热点技术之一。国产异构向量处理器FT-M7032为智能模型处理中的数据并行和指令并行开发提供了较大空间。针对N∶M半结构化稀疏模型计算稀疏模式多样性,提出了一种面向FT-M7032的可灵活配置的稀疏矩阵乘算法NM-SpMM。NM-SpMM设计了一种高效的压缩偏移地址稀疏编码格式COA,避免了半结构化参数配置对稀疏数据访存计算的影响。基于COA编码,NM-SpMM对不同维度稀疏矩阵计算进行了细粒度优化。在FT-M7032单核上的实验结果表明,相较于稠密矩阵乘,NM-SpMM能获得1.73~21.00倍的加速,相较于采用CuSPARSE稀疏计算库的NVIDIA V100 GPU,能获得0.04~1.04倍的加速。 展开更多
关键词 深度神经网络 图形处理器 向量处理器 稀疏矩阵乘 流水线
下载PDF
分布式高性能自组网节点技术研究 被引量:1
14
作者 于哲 周舜民 +4 位作者 王彬 孙艺铭 陈方 赵子龙 李贝贝 《现代电子技术》 北大核心 2024年第5期1-7,共7页
针对当前主流Mesh自组网技术节点传输带宽不足百兆,级跳数小于10的问题,提出采用多处理器构建实现分布式多跳、高带宽低时延的无线跳频的高性能自组网节点,对节点自动化组网连接、多信道选择避让、漫游切换及低时延高带宽网络多跳实现... 针对当前主流Mesh自组网技术节点传输带宽不足百兆,级跳数小于10的问题,提出采用多处理器构建实现分布式多跳、高带宽低时延的无线跳频的高性能自组网节点,对节点自动化组网连接、多信道选择避让、漫游切换及低时延高带宽网络多跳实现等关键技术进行研究实现。由测试结果分析可知,在20级跳内,文中节点组网带宽损失在30%以内且带宽保持在200 Mb/s以上,时延控制在100 ms内,可以满足现实应急场景下多终端智能硬件实时进行图像、视频等大数据量信息交互对高带宽低时延网络通信的需求。 展开更多
关键词 多处理器 自组网连接 多级跳 高带宽 低时延 信息交互
下载PDF
适用于S-NUCA异构处理器的任务调度与热管理系统
15
作者 周义涛 李阳 +3 位作者 韩超 赵玉来 汪玲 李建华 《计算机工程》 CAS CSCD 北大核心 2024年第2期196-205,共10页
异构多核处理器凭借其高性能、低功耗和广泛的应用场景而成为当前计算机平台的主流方案,且大容量的非均匀缓存架构(S-NUCA)具有较低的平均访问时间。然而,不断上升的晶体管规模给异构多核处理器的资源调度和功耗控制带来挑战,传统的调... 异构多核处理器凭借其高性能、低功耗和广泛的应用场景而成为当前计算机平台的主流方案,且大容量的非均匀缓存架构(S-NUCA)具有较低的平均访问时间。然而,不断上升的晶体管规模给异构多核处理器的资源调度和功耗控制带来挑战,传统的调度算法在面对基于S-NUCA的多核处理器时忽略了核心之间的缓存访问延迟,且传统热管理方案只提供芯片级功率约束,容易使得系统因核心使用率降低而造成性能下降。为此,提出一种适用于S-NUCA异构多核系统、满足热安全约束的动态线程调度机制TSCDM。利用基于动态每周期指令(IPC)值的阶段检测技术,并基于人工神经网络预测线程的IPC值,以获取线程与核心类型的最佳绑定关系,依据S-NUCA缓存特性获得最优映射和基于任务分类的任务迁移策略。在此基础上,TSCDM基于片上热模型为每个核心实时分配功率预算。在HotSniper上运行SPLASH-2性能测试套件进行实验,结果表明,相较于传统调度方案与基于机器学习的调度方案,TSCDM在加速比和资源利用率上均表现出优势,TSCDM中使用的基于瞬态温度的安全功率算法相比传统热安全功率算法能够降低核心热余量,同时处理器的全频段均有更高的能效比。 展开更多
关键词 异构多核处理器 人工神经网络 线程调度 阶段检测 热安全功率
下载PDF
一种基于PYNQ的神经网络加速系统
16
作者 赖嘉伟 魏洪健 +1 位作者 孙科学 王艳 《电子设计工程》 2024年第17期16-21,共6页
针对传统卷积神经网络计算复杂度高,耗时较长,难以应用到嵌入式移动端的问题,提出了一种以ZYNQ芯片作为主控的FPAG联合ARM实现的的神经网络加速系统。该系统的PL部分采用纯RTL开发,对卷积层的输入层和输出层进行了全并行化,对卷积窗口... 针对传统卷积神经网络计算复杂度高,耗时较长,难以应用到嵌入式移动端的问题,提出了一种以ZYNQ芯片作为主控的FPAG联合ARM实现的的神经网络加速系统。该系统的PL部分采用纯RTL开发,对卷积层的输入层和输出层进行了全并行化,对卷积窗口进行完全的展开,在一个时钟周期内可以同时完成81次乘法运算,同时对池化层和全连接层采用流水线的优化方式。相比常用的使用高层次综合工具进行优化的方法,该系统使用RTL语言从零开始设计卷积神经网络各个模块,进行了细粒度的优化,避免了冗余逻辑资源的产生,充分利用了片上资源。针对MINIST手写数字识别的网络模型,该系统的DSP利用率达到了95%,在100 MHz时钟频率下,硬件单帧图像处理时间仅为0.81 ms,功耗仅为1.601 W。 展开更多
关键词 PYNQ ARM处理器 神经网络 现场可编程门阵列 硬件加速器
下载PDF
数控机床工作台DSP定位误差系统设计及分析
17
作者 路晓云 杨光 《机械管理开发》 2024年第3期187-188,191,共3页
为进一步优化数控机床对于测试误差的补偿功能,开发通过DSP硬件系统对误差进行准确预测并设置补偿措施。建立的定位误差模型预测补偿系统包含数控系统进给轴反馈结构、DSP建模预测系统以及数控系统。研究结果表明,采用Matlab软件运行得... 为进一步优化数控机床对于测试误差的补偿功能,开发通过DSP硬件系统对误差进行准确预测并设置补偿措施。建立的定位误差模型预测补偿系统包含数控系统进给轴反馈结构、DSP建模预测系统以及数控系统。研究结果表明,采用Matlab软件运行得到的优化权值与阈值建立的GA-BP网络进行误差预测共需251μs;采用GA-BP网络构建的模型进行预测时达到了更高精度。该研究有助于提高数控机床加工精度,对提高加工参数的优化起到很好的指导意义以及控制效果。 展开更多
关键词 数控机床 定位误差 数字信号处理器 遗传算法 反向传播网络
下载PDF
插损鲁棒性的全复值光学神经网络
18
作者 陈慧彬 汤凯飞 游振宇 《中国光学(中英文)》 EI CAS CSCD 北大核心 2024年第4期834-841,共8页
基于马赫-曾德尔干涉仪(Mach-Zehnder Interferometer,MZI)级联拓扑结构的线性光学处理器被证明是实现光学神经网络(Optical Neural Network,ONN)的重要途径,但还有不少实际问题有待解决。针对芯片制造、测试过程中可能导致的相位误差... 基于马赫-曾德尔干涉仪(Mach-Zehnder Interferometer,MZI)级联拓扑结构的线性光学处理器被证明是实现光学神经网络(Optical Neural Network,ONN)的重要途径,但还有不少实际问题有待解决。针对芯片制造、测试过程中可能导致的相位误差和插入损耗等问题,通过实验和理论仿真分析了几种基于MZI结构的可重构光学处理器。发现可以通过单个N×N的Clements阵列结构来实现任意酉矩阵的权重,构建稀疏连接的全复值光学神经网络,将光学深度大大降低,以实现较高的插入损耗鲁棒性。此外,对于多层光学神经网络来说,由于构建该任意酉矩阵的自由度有限,故在每一层Clements结构前面加一个相移器层,有助于将分类数据映射到更高的数据维度,能使神经网络更快速的收敛。 展开更多
关键词 光学神经网络 MZI阵列 可重构光学处理器
下载PDF
申威平台高速网络数据处理框架的设计与实现
19
作者 曹建军 佘平 聂世强 《计算机技术与发展》 2024年第7期184-191,共8页
随着大数据时代网络流量的激增,传统内核网络协议栈由于内核切换开销占比高等原因导致现有基于内核的网络数据处理系统无法充分利用10 Gb乃至100 Gb的高速网卡收发能力。为了降低内核切换开销,开源DPDK用户态网络开发套件被提出以支持... 随着大数据时代网络流量的激增,传统内核网络协议栈由于内核切换开销占比高等原因导致现有基于内核的网络数据处理系统无法充分利用10 Gb乃至100 Gb的高速网卡收发能力。为了降低内核切换开销,开源DPDK用户态网络开发套件被提出以支持高速网络流量处理,并在x86平台得到大规模应用和部署。为了满足国产化信创和网络安全的要求,面向国产申威处理器平台设计并实现了一套基于DPDK的网络流量组包解析框架,充分利用DPDK的大页内存、无锁队列等机制,设计多线程并行以发挥申威处理器多核性能,支持常见基于TCP/UDP的多种应用层协议解析,并具有轻量化和可扩展特点。基于真实硬件平台实验结果表明,该框架性能比现有主流软件提高10%左右,为基于国产处理器平台的高速网络数据处理做了初步探索。 展开更多
关键词 DPDK 协议分析 高速网络 TCP/IP协议栈 国产处理器
下载PDF
基于DSP的整流管组件故障诊断技术研究
20
作者 陈章恒 王啸 苑振宇 《微电机》 2024年第3期44-46,58,共4页
为了提高航空发电机整流管组件故障诊断系统的效果和可靠性,本文采用DSP芯片和神经网络方法设计了一个故障检测系统,可实现对发电机整流管组件整流管正常工作、单管断路和双管断路故障等三类模式的诊断。通过对实际航空发电机的数据进... 为了提高航空发电机整流管组件故障诊断系统的效果和可靠性,本文采用DSP芯片和神经网络方法设计了一个故障检测系统,可实现对发电机整流管组件整流管正常工作、单管断路和双管断路故障等三类模式的诊断。通过对实际航空发电机的数据进行分析,结果表明,对给定的数据,系统诊断正确率可达100%。 展开更多
关键词 整流管组件 故障诊断 数字信号处理器 反向传播神经网络
下载PDF
上一页 1 2 51 下一页 到第
使用帮助 返回顶部