Journal Articles
4 articles found
1. Design and implementation of near-memory computing array architecture based on shared buffer (Cited by: 1)
Authors: SHAN Rui, GAO Xu, FENG Yani, HUI Chao, CUI Xinyue, CHAI Miaomiao. High Technology Letters (EI, CAS), 2022, No. 4, pp. 345-353 (9 pages)
Deep learning algorithms have been widely used in computer vision, natural language processing and other fields. However, due to the ever-increasing scale of deep learning models, the requirements for storage and computing performance are getting higher and higher, and processors based on the von Neumann architecture have gradually exposed significant shortcomings such as high power consumption and long latency. In order to alleviate this problem, large-scale processing systems are shifting from a traditional computing-centric model to a data-centric model. A near-memory computing array architecture based on a shared buffer is proposed in this paper to improve system performance. It supports instructions with the characteristic of store-calculation integration, reducing data movement between the processor and main memory. Through data reuse, the processing speed of the algorithm is further improved. The proposed architecture is verified and tested through the parallel realization of the convolutional neural network (CNN) algorithm. The experimental results show that at a frequency of 110 MHz, the calculation speed of a single convolution operation is increased by 66.64% on average compared with a CNN architecture that performs parallel calculations on a field programmable gate array (FPGA). The processing speed of the whole convolution layer is improved by 8.81% compared with a reconfigurable array processor that does not support near-memory computing.
Keywords: near-memory computing; shared buffer; reconfigurable array processor; convolutional neural network (CNN)
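As a rough illustration of the data reuse the abstract above refers to (not the authors' store-calculation-integrated architecture, whose details are not given here), the sliding-window structure of a convolution lets input rows staged once into a local buffer be reused for every output column. All sizes and names in this sketch are hypothetical.

```c
/* Minimal sketch, not the paper's architecture: a 3x3 convolution in which each
 * band of input rows is staged once into a local buffer (standing in for the
 * shared buffer) and then reused for every output column, instead of being
 * re-fetched from main memory for each window position. */
#include <stdio.h>

#define H 8   /* input height */
#define W 8   /* input width  */
#define K 3   /* kernel size  */

static void conv3x3(const float in[H][W], const float ker[K][K],
                    float out[H - K + 1][W - K + 1])
{
    float row_buf[K][W];  /* staged input rows: the "shared buffer" stand-in */

    for (int r = 0; r <= H - K; ++r) {
        /* Stage the K rows of the current window band once. */
        for (int i = 0; i < K; ++i)
            for (int c = 0; c < W; ++c)
                row_buf[i][c] = in[r + i][c];

        /* Reuse the staged rows for every output column of this band. */
        for (int c = 0; c <= W - K; ++c) {
            float acc = 0.0f;
            for (int i = 0; i < K; ++i)
                for (int j = 0; j < K; ++j)
                    acc += row_buf[i][c + j] * ker[i][j];
            out[r][c] = acc;
        }
    }
}

int main(void)
{
    float in[H][W], ker[K][K], out[H - K + 1][W - K + 1];

    for (int i = 0; i < H; ++i)
        for (int j = 0; j < W; ++j)
            in[i][j] = (float)(i + j);
    for (int i = 0; i < K; ++i)
        for (int j = 0; j < K; ++j)
            ker[i][j] = 1.0f / (K * K);   /* simple averaging kernel */

    conv3x3(in, ker, out);
    printf("out[0][0] = %f\n", out[0][0]);
    return 0;
}
```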
2. Efficient fine-grained shared buffer management for multiple OpenCL devices
Authors: Chang-qing XUN, Dong CHEN, Qiang LAN, Chun-yuan ZHANG. Journal of Zhejiang University-Science C (Computers and Electronics) (SCIE, EI), 2013, No. 11, pp. 859-872 (14 pages)
OpenCL programming provides full code portability between different hardware platforms, and can serve as a good programming candidate for heterogeneous systems, which typically consist of a host processor and several accelerators. However, to make full use of the computing capacity of such a system, programmers are required to manage diverse OpenCL-enabled devices explicitly, including distributing the workload between different devices and managing data transfer between multiple devices. All these tedious jobs pose a huge challenge for programmers. In this paper, a distributed shared OpenCL memory (DSOM) is presented, which relieves users of having to manage data transfer explicitly by supporting shared buffers across devices. DSOM allocates shared buffers in the system memory and treats the on-device memory as a software-managed virtual cache buffer. To support fine-grained shared buffer management, we designed a kernel parser in DSOM for buffer access range analysis. A basic modified/shared/invalid (MSI) cache coherency protocol is implemented for DSOM to maintain coherency for cache buffers. In addition, we propose a novel strategy to minimize communication cost between devices by launching each necessary data transfer as early as possible. This strategy enables overlap of data transfer with kernel execution. Our experimental results show that the applicability of our method for buffer access range analysis is good, and the efficiency of DSOM is high.
Keywords: shared buffer; OpenCL; heterogeneous programming; fine-grained
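The early-launch strategy described in the abstract above (start each necessary transfer as soon as possible so it overlaps with kernel execution) maps onto standard OpenCL host code roughly as follows. This is a generic sketch using the public OpenCL C API, not DSOM's internal implementation; the queues, kernel and buffers are assumed to have been created elsewhere, and the chunking is hypothetical.

```c
/* Generic sketch of overlapping host-to-device transfers with kernel execution
 * on two in-order command queues (assumed already created), using events for
 * ordering. Illustrates the overlap idea only, not DSOM itself. */
#include <CL/cl.h>

void process_two_chunks(cl_command_queue q_copy, cl_command_queue q_exec,
                        cl_kernel kernel, cl_mem buf_a, cl_mem buf_b,
                        const float *host_a, const float *host_b,
                        size_t bytes, size_t global_size)
{
    cl_event copied_a, copied_b;

    /* Start copying chunk A immediately (non-blocking). */
    clEnqueueWriteBuffer(q_copy, buf_a, CL_FALSE, 0, bytes, host_a, 0, NULL, &copied_a);

    /* Start copying chunk B right away; it can overlap with the kernel on chunk A. */
    clEnqueueWriteBuffer(q_copy, buf_b, CL_FALSE, 0, bytes, host_b, 0, NULL, &copied_b);

    /* The kernel working on chunk A waits only for A's transfer to finish. */
    clSetKernelArg(kernel, 0, sizeof(cl_mem), &buf_a);
    clEnqueueNDRangeKernel(q_exec, kernel, 1, NULL, &global_size, NULL, 1, &copied_a, NULL);

    /* The kernel working on chunk B waits only for B's transfer to finish. */
    clSetKernelArg(kernel, 0, sizeof(cl_mem), &buf_b);
    clEnqueueNDRangeKernel(q_exec, kernel, 1, NULL, &global_size, NULL, 1, &copied_b, NULL);

    clFinish(q_exec);
    clReleaseEvent(copied_a);
    clReleaseEvent(copied_b);
}
```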
3. A Shared Buffer Memory ATM Access Switch (Cited by: 1)
Authors: Yu Hao, Zhu Xinning. The Journal of China Universities of Posts and Telecommunications (EI, CSCD), 1998, No. 1, pp. 34-38, 43 (6 pages)
This paper proposes a shared buffer memory ATM access switch. The switch has significant benefits over crossbar or bus-based switches because its output buffer memories are shared by all the switch output ports and are allotted to a particular output port as the occasion demands. The buffer allocation scheme in the switch is partial sharing, a trade-off between complete sharing and dedicated allocation. In addition, the queuing structures used in the shared memory are independent of both the data path through the switch and the cell scheduling mechanism. The method for queue management is simple and effective.
Keywords: ATM switch; shared buffer memory; partial sharing; queue management
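The partial-sharing trade-off mentioned in the abstract above can be stated as a simple admission rule: each output port keeps a small dedicated reserve, and occupancy beyond that competes for a common pool. The sketch below is one such rule with hypothetical thresholds; the paper's exact scheme may differ.

```c
/* Hypothetical partial-sharing admission check: a cell destined for port p is
 * accepted if it fits in the port's dedicated reserve, or if the shared pool
 * still has free cells. Threshold values are illustrative only. */
#include <stdbool.h>

#define NUM_PORTS   8
#define DEDICATED   16    /* cells reserved per output port   */
#define SHARED_POOL 256   /* cells shared by all output ports */

static int queue_len[NUM_PORTS];  /* current occupancy of each port queue */
static int shared_used;           /* cells currently drawn from the pool  */

bool admit_cell(int port)
{
    if (queue_len[port] < DEDICATED) {   /* dedicated space still available */
        queue_len[port]++;
        return true;
    }
    if (shared_used < SHARED_POOL) {     /* borrow from the shared pool */
        queue_len[port]++;
        shared_used++;
        return true;
    }
    return false;                        /* buffer full: drop the cell */
}
```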
4. Performance Analysis of Wavelength Division Multiplexing Asynchronous Internet Router Employing Space Priority Mechanism under Self-Similar Traffic Input—Multi-Server Queueing System with Markovian Input and Erlang-k Services
Authors: Ravi Kumar Gudimalla, Malla Reddy Perati. Applied Mathematics, 2016, No. 15, pp. 1707-1725 (19 pages)
In this paper, we analyze the queueing behaviour of a wavelength division multiplexing (WDM) Internet router employing the partial buffer sharing (PBS) mechanism with self-similar traffic input. In view of WDM technology in networking, each output port of the router is modelled as a multi-server queueing system. To guarantee quality of service (QoS) in the broadband integrated services digital network (B-ISDN), the PBS mechanism is a promising one. As the Markov modulated Poisson process (MMPP) emulates self-similar Internet traffic, we can use MMPP as the input process of the queueing system to investigate the queueing behaviour of the router. In general, as network traffic is asynchronous (unslotted) and of variable packet lengths, service times (packet lengths) are assumed to follow the Erlang-k distribution, since this distribution is relatively general compared with the deterministic and exponential distributions. Hence, a specific output port of the router is modelled as an MMPP/Ek/s/C queueing system. The long-term performance measures, namely high priority and low priority packet loss probabilities, and the short-term performance measures, namely mean lengths of critical and non-critical periods, are computed against the system parameters and traffic parameters by means of matrix-geometric methods and an approximate Markovian model. This kind of analysis is useful in dimensioning the router under self-similar traffic input employing the PBS mechanism to provide differentiated services (DiffServ) and QoS guarantees.
Keywords: WDM Internet router; self-similarity; partial buffer sharing; Erlang-k service; packet loss probability
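For reference, the Erlang-k service assumption mentioned in the abstract above means the service (packet transmission) times have the density below; k = 1 recovers the exponential case and k → ∞ approaches a deterministic service time, which is why the abstract calls the distribution relatively general. This is the standard form in our own notation (mean service time 1/μ); the paper's parameterization may differ.

```latex
% Erlang-k service-time density with mean 1/\mu (standard form)
f(t) \;=\; \frac{(k\mu)^{k}\, t^{k-1}\, e^{-k\mu t}}{(k-1)!}, \qquad t \ge 0,
\qquad \mathrm{E}[T] = \frac{1}{\mu}, \qquad
\mathrm{CV}^{2}[T] = \frac{1}{k}.
```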