5 articles found
1. Design and implementation of near-memory computing array architecture based on shared buffer (cited by: 1)
Authors: SHAN Rui, GAO Xu, FENG Yani, HUI Chao, CUI Xinyue, CHAI Miaomiao. High Technology Letters (EI, CAS), 2022, No. 4, pp. 345-353 (9 pages)
Deep learning algorithms have been widely used in computer vision, natural language processing, and other fields. However, as deep learning models keep growing in scale, the requirements for storage and computing performance keep rising, and processors based on the von Neumann architecture have gradually exposed significant shortcomings such as high power consumption and long latency. To alleviate this problem, large-scale processing systems are shifting from a traditional computing-centric model to a data-centric model. This paper proposes a near-memory computing array architecture based on a shared buffer to improve system performance. The architecture supports instructions that integrate storage and calculation, which reduces data movement between the processor and main memory, and data reuse further improves the processing speed of the algorithm. The proposed architecture is verified and tested through a parallel implementation of the convolutional neural network (CNN) algorithm. The experimental results show that, at a frequency of 110 MHz, the calculation speed of a single convolution operation is increased by 66.64% on average compared with a CNN architecture that performs parallel calculations on a field-programmable gate array (FPGA). The processing speed of the whole convolution layer is improved by 8.81% compared with a reconfigurable array processor that does not support near-memory computing.
Keywords: near-memory computing; shared buffer; reconfigurable array processor; convolutional neural network (CNN)
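2. Research Progress and Applications of In-Memory Computing Chips (存内计算芯片研究进展及应用) (cited by: 3)
Authors: GUO Xinjie (郭昕婕), WANG Guangyao (王光燿), WANG Shaodi (王绍迪). Journal of Electronics & Information Technology (《电子与信息学报》; EI, CSCD, Peking University Core), 2023, No. 5, pp. 1888-1898 (11 pages)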
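With the rapid growth of data, the memory wall of the von Neumann architecture has become the key bottleneck to further improving computing performance. New storage-computing integrated architectures, which include in-memory computing (IMC) and near-memory computing (NMC) architectures, are expected to break the von Neumann bottleneck and greatly improve computing power and energy efficiency. This paper reviews the development history and research status of storage-computing integrated chips, along with the basic principles, advantages, and open problems of in-memory computing based on various memory media (such as the conventional memories DRAM, SRAM, and Flash, and the emerging non-volatile memories ReRAM, PCM, MRAM, and FeFET). It then takes the mass-produced WTM2101 chip from Witmem (知存科技) as an example to describe the circuit structure and current applications of storage-computing integrated chips. Finally, it analyzes the future prospects of these chips and the challenges they face.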
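Keywords: storage-computing integration; memory wall; power wall; in-memory computing; near-memory computing; von Neumann architecture bottleneck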
3. Non-volatile programmable homogeneous lateral MoTe2 junction for multi-bit flash memory and high-performance optoelectronics (cited by: 1)
Authors: Enxiu Wu, Yuan Xie, Shijie Wang, Daihua Zhang, Xiaodong Hu, Jing Liu. Nano Research (SCIE, EI, CAS, CSCD), 2020, No. 12, pp. 3445-3451 (7 pages)
Flash memories and semiconductor p-n junctions are two elementary but incompatible building blocks of most electronic and optoelectronic devices. The pressing demand to efficiently transfer massive data between memories and logic circuits, as well as for high data storage capability and device integration density, has fueled the rapid growth of technical and material innovations. Two-dimensional (2D) materials are considered one of the most promising candidates to solve this challenge. However, building functional devices from 2D materials requires effective and accurate control of the carrier polarity, concentration, and spatial distribution in the atomically thin structures. Here, a non-volatile opto-electrical doping approach is demonstrated, which enables reversibly writing spatially resolved doping patterns in the MoTe2 conductance channel through a MoTe2/hexagonal boron nitride (h-BN) heterostructure. Based on the doping effect induced by the combination of electrostatic modulation and ultraviolet light illumination, a 3-bit flash memory and various homojunctions on the same MoTe2/BN heterostructure are successfully developed. The flash memory achieved 8 well-distinguished memory states with a maximum on/off ratio over 10^4. Each state showed negligible decay during the retention time of 2,400 s. The heterostructure also allowed the formation of p-p, n-n, p-n, and n-p homojunctions and free transitions among these states. The MoTe2 p-n homojunction, with a rectification ratio of 10^3, exhibited excellent photodetection and photovoltaic performance. Having the memory device and p-n junction built on the same structure makes it possible to bring memory and computational circuits onto the same chip, one step further toward realizing near-memory computing.
Keywords: 3-bit flash memory; p-n homojunctions; MoTe2; opto-electrical doping; near-memory computing; photovoltaic
4. PIM-Align: A Processing-in-Memory Architecture for FM-Index Search Algorithm
Authors: Xue-Qi Li, Guang-Ming Tan, Ning-Hui Sun. Journal of Computer Science & Technology (SCIE, EI, CSCD), 2021, No. 1, pp. 56-70 (15 pages)
Genomic sequence alignment is the most critical and time-consuming step in genomic analysis. Alignment algorithms generally follow a seed-and-extend model. Acceleration of the extension phase for sequence alignment (e.g., the Smith-Waterman algorithm) has been well explored in computing-centric architectures on field-programmable gate array (FPGA), application-specific integrated circuit (ASIC), and graphics processing unit (GPU). Compared with the extension phase, the seeding phase is more critical and essential. However, the seeding phase is bounded by memory, i.e., by fine-grained random memory access and limited parallelism on conventional systems. In this paper, we argue that the processing-in-memory (PIM) concept could be a viable solution to address these problems. This paper describes "PIM-Align", an application-driven near-data processing architecture for sequence alignment. In order to achieve memory-capacity-proportional performance by taking advantage of 3D-stacked dynamic random access memory (DRAM) technology, we propose a lightweight message mechanism between different memory partitions, and a specialized hardware prefetcher for the memory access patterns of sequence alignment. Our evaluation shows that the proposed architecture can achieve 20x and 1820x speedup when compared with the best available ASIC implementation and the software running on a 32-thread CPU, respectively.
Keywords: accelerator design; genomic sequence alignment; near-memory computing
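5. Design of a Real-Time Detection Array Structure for Handwritten Digit Recognition Based on Near-Memory Computing (基于近存储计算的手写数字识别实时检测阵列结构设计)
Authors: HUO Ziqing (霍紫晴), SHAN Rui (山蕊), FENG Yani (冯雅妮), GAO Xu (高旭), FENG Yu (冯煜). Journal of Optoelectronics·Laser (《光电子.激光》; CAS, CSCD, Peking University Core), 2022, No. 12, pp. 1315-1322 (8 pages)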
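Convolutional neural networks (CNNs), an improvement on traditional neural networks, have been widely applied. However, as CNN performance improves, model scale keeps growing and the demands on storage and computing power keep rising, so processors based on the von Neumann architecture struggle to deliver satisfactory processing performance. To improve system performance, near-memory computing (NMC) has become a promising research direction. This paper uses a reconfigurable array processor that supports NMC to implement handwritten digit recognition, performing the convolution operations in parallel. It also uses a shared buffer array structure to reduce frequent accesses to off-chip memory. Experimental results show that, at an operating frequency of 110 MHz, the computation speed of a single 5×5 convolution is improved by 75.00%, and one handwritten digit can be recognized within 9,960 μs.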
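Keywords: convolutional neural network (CNN); handwritten digit recognition; reconfigurable array processor; near-memory computing (NMC); shared buffer array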