Journal Articles
5 articles found
1. WinoNet: Reconfigurable look-up table-based Winograd accelerator for arbitrary precision convolutional neural network inference
Authors: Wang Chengcheng, Li He, +3 more authors, Cao Yanpeng, Song Changjun, Yu Feng, Tang Yongming. Journal of Southeast University (English Edition). EI, CAS. 2022, No. 4, pp. 332-339 (8 pages).
To address the hardware deployment problem caused by the heavy computational demands of convolutional layers and the limited hardware resources available for network inference, a look-up table (LUT)-based convolution architecture is built on a field-programmable gate array (FPGA) using integer multipliers and addition trees. The Winograd algorithm is used to optimize convolution and multiplication, reducing computational complexity. The LUT-based operator is further optimized to construct a processing unit (PE). At the same time, optimized storage streams improve memory access efficiency and relieve bandwidth constraints, and the data toggle rate is reduced to lower power consumption. The experimental results show that building the basic processing units with the Winograd algorithm significantly reduces the number of multipliers and accelerates hardware deployment, while time-division multiplexing of the processing units improves resource utilization. Under these experimental conditions, compared with the traditional convolution method, the architecture optimizes computing resources by a factor of 2.25 and improves peak throughput by a factor of 19.3. The LUT-based Winograd accelerator can effectively solve the deployment problem caused by limited hardware resources.
Keywords: quantized neural networks; look-up table (LUT)-based multiplier; Winograd algorithm; arbitrary precision
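The abstract credits the Winograd algorithm with reducing the number of multipliers. As a minimal sketch of the idea, the standard 1-D transform F(2,3) below produces two outputs of a 3-tap filter with 4 multiplications instead of 6; the nested 2-D form F(2×2, 3×3) needs 16 instead of 36. This is the textbook transform, not necessarily the tile size or data layout WinoNet uses, and integer taps are assumed so the halved filter terms stay exact.

```python
from fractions import Fraction


def direct_f23(d, g):
    """Reference: two outputs of a 3-tap correlation, 6 multiplications."""
    return [d[i] * g[0] + d[i + 1] * g[1] + d[i + 2] * g[2] for i in range(2)]


def winograd_f23(d, g):
    """Winograd minimal filtering F(2,3): the same two outputs, 4 multiplications.

    d: four input samples [d0, d1, d2, d3]
    g: three integer filter taps [g0, g1, g2]
    """
    d0, d1, d2, d3 = d
    g0, g1, g2 = g
    # Filter transform: done once per filter, not once per output tile
    u0 = g0
    u1 = Fraction(g0 + g1 + g2, 2)
    u2 = Fraction(g0 - g1 + g2, 2)
    u3 = g2
    # Input transform is additions only; these are the only 4 multiplications
    m1 = (d0 - d2) * u0
    m2 = (d1 + d2) * u1
    m3 = (d2 - d1) * u2
    m4 = (d1 - d3) * u3
    # Output transform: additions only
    return [m1 + m2 + m3, m2 - m3 - m4]
```

Calling `winograd_f23(d, g)` and `direct_f23(d, g)` on the same inputs gives equal results; the saving comes from moving work into additions, which is what makes the multiplier count drop in hardware.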
2. VLSI Implementation of a ROM Look-Up Table-Based IDCT Circuit for HDTV Decoders (可用于HDTV解码器的基于ROM查找表的IDCT电路大规模集成电路实现)
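Authors: 杨宇红 (Yang Yuhong), 张文军 (Zhang Wenjun), 胡力 (Hu Li). 上海交通大学学报 (Journal of Shanghai Jiao Tong University). EI, CAS, CSCD, Peking University Core Journal. 2006, No. 1, pp. 24-27 (4 pages).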
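This paper presents the algorithmic principle of a fully digital inverse discrete cosine transform (IDCT) circuit based on ROM look-up tables, together with the VLSI implementation of its parallel architecture. First, the 2-D IDCT is decomposed into two 1-D IDCTs, which a butterfly algorithm further converts into matrix multiply-accumulate operations. The bits of the four samples taken from the odd or even columns of a consecutively input block are rearranged, grouping together the bits of the same significance from the four samples, so that a single ROM look-up table can perform the multiply-accumulate operations for the different bit positions. This avoids the hardware cost of multipliers, gives high implementation efficiency and saves hardware area, so the circuit is suitable for real-time HDTV decoders and helps reduce circuit power consumption. The circuit has been used in a previously developed MPEG-2 MP@HL high-definition decoder chip, which was successfully taped out in a 0.18 μm CMOS process.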
Keywords: inverse discrete cosine transform; ROM look-up table; HDTV decoder; large-scale integrated circuit
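Grouping the bits of equal significance from four samples to address one ROM is the distributed arithmetic (DA) technique. A minimal bit-serial sketch follows, assuming hypothetical small integer coefficients in place of the paper's fixed-point IDCT constants (which the abstract does not give) and two's complement inputs; the paper's parallel architecture is not reproduced here.

```python
def build_rom(coeffs):
    """Precompute every partial sum of the coefficients: 2**len(coeffs) entries.

    Entry `addr` holds the sum of coeffs[k] for each bit k set in addr.
    """
    rom = []
    for addr in range(1 << len(coeffs)):
        rom.append(sum(c for k, c in enumerate(coeffs) if (addr >> k) & 1))
    return rom


def da_inner_product(xs, rom, width):
    """Compute sum(c_k * x_k) with table look-ups, shifts and adds only.

    xs:    signed integers, each representable in `width`-bit two's complement
    rom:   table produced by build_rom for the matching coefficients
    width: input word length in bits
    """
    acc = 0
    for b in range(width):
        # Collect bit b of every input: equal-significance bits form one address
        addr = 0
        for k, x in enumerate(xs):
            addr |= ((x >> b) & 1) << k
        partial = rom[addr]
        if b == width - 1:
            acc -= partial << b  # the sign bit carries negative weight
        else:
            acc += partial << b  # a shift, not a multiplication, in hardware
    return acc
```

For example, `da_inner_product([5, -3, 7, -1], build_rom([3, -2, 5, 7]), 4)` evaluates 3·5 − 2·(−3) + 5·7 + 7·(−1) without a multiplier. A 4-input ROM needs only 16 entries, which is why grouping four samples keeps the table small.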
3. Area and Speed Efficient Implementation of Symmetric FIR Digital Filter through Reduced Parallel LUT Decomposed DA Approach (Cited by: 1)
Authors: S. C. Prasanna, S. P. Joy Vasantha Rani. Circuits and Systems. 2016, No. 8, pp. 1379-1391 (13 pages).
This brief proposes an area- and speed-efficient implementation of a symmetric finite impulse response (FIR) digital filter using a reduced parallel look-up table (LUT) distributed arithmetic (DA) based approach. The complexity of realizing an FIR filter is dominated by the multiplier structure, and it grows with filter order, resulting in increased area and power and reduced operating speed. Multiplierless designs based on conventional DA and on decomposed DA improve speed over the multiply-accumulate approach, but both structures require B clock cycles to produce the filter output for an input width of B, which limits the speed of DA structures. This limitation is addressed with parallel LUTs, in a design called high-speed DA FIR, at the expense of additional hardware. With a large number of taps, the number and size of the LUTs also become large. In the proposed method, exploiting the coefficient symmetry property reduces the number of LUTs in the decomposed DA form by a factor of about 2. This approach is applied to the high-speed DA-based FIR design to obtain an area- and speed-efficient structure. When implemented on a Xilinx Virtex-5 FPGA device (XC5VSX95T-1FF1136) for a 16-tap symmetric FIR filter, the proposed design offers around 40% less area and 53.98% less slice-delay product (SDP) than the high-throughput DA-based structure. On the same FPGA device, the proposed design supports input sampling frequencies up to 607 MHz, and offers 60.5% more speed and 67.71% less SDP than the systolic DA-based design.
Keywords: Distributed Arithmetic; Field Programmable Gate Array (FPGA); Finite-Impulse Response (FIR) Filter; High Speed; Reduced Look-Up Table (LUT)
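The symmetry trick can be sketched as follows: because h[k] = h[N−1−k], the samples sharing a coefficient are added first, leaving N/2 coefficients and therefore half as many LUTs in the decomposed form. This is a minimal model under stated assumptions: an even tap count, integer taps, a hypothetical group size of 4 taps per LUT, and one extra bit for the pre-added sums. The loop over bit positions is written serially; the high-speed variant described above would instead instantiate one set of LUTs per bit so all positions are handled in one clock.

```python
def make_luts(half_coeffs, group=4):
    """Decomposed DA: one 2**group-entry LUT per group of (pre-added) taps."""
    luts = []
    for start in range(0, len(half_coeffs), group):
        cs = half_coeffs[start:start + group]
        luts.append([sum(c for k, c in enumerate(cs) if (a >> k) & 1)
                     for a in range(1 << len(cs))])
    return luts


def symmetric_fir_da(window, coeffs, width, group=4):
    """One output sample of an even-length symmetric FIR filter via DA.

    window: the N most recent inputs, window[k] = x[n-k]
    coeffs: N integer taps with coeffs[k] == coeffs[N-1-k]
    width:  input word length B in two's complement
    """
    n_taps = len(coeffs)
    assert n_taps % 2 == 0
    assert all(coeffs[k] == coeffs[n_taps - 1 - k] for k in range(n_taps))
    half = n_taps // 2
    # Pre-add each symmetric pair, so only N/2 coefficients need LUT storage
    pre = [window[k] + window[n_taps - 1 - k] for k in range(half)]
    luts = make_luts(coeffs[:half], group)  # filled once in hardware
    w = width + 1  # a sum of two B-bit samples needs B+1 bits
    acc = 0
    for b in range(w):
        partial = 0
        for g, lut in enumerate(luts):
            addr = 0
            for k, v in enumerate(pre[g * group:(g + 1) * group]):
                addr |= ((v >> b) & 1) << k
            partial += lut[addr]  # outputs of the parallel LUTs are summed
        if b == w - 1:
            acc -= partial << b  # sign bit of the pre-added words
        else:
            acc += partial << b
    return acc
```

For the 16-tap filter in the abstract, this model would keep 8 coefficients, that is two 16-entry LUTs per bit position instead of four.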
4. Performance Improvement for a WCDMA Radio Over Fiber System Using Digital Pre-Distorter (Cited by: 1)
Authors: Ying Xiangyue, Xu Tiefeng, +1 more author, Liu Taijun, Nie Qiuhua. Journal of Electronics (China). 2012, No. 1, pp. 27-32 (6 pages).
In this paper, a Radio Over Fiber (ROF) system with a Digital Pre-Distorter (DPD) for WCDMA signal transmission is investigated. A Look-Up Table (LUT)-based DPD and a Memory Polynomial (MP) DPD are applied in the ROF link to suppress the out-of-band spurious spectrum and improve transmission performance. The experimental results show that the out-of-band emission caused by third-order Inter-Modulation Distortion (IMD3) is clearly suppressed by both DPDs. An Adjacent Channel Power Ratio (ACPR) improvement of 8 dB is obtained for single-carrier WCDMA signal transmission. The two DPDs perform equally well in linearizing the ROF system for three-carrier WCDMA signal transmission. No apparent memory effects exist in the ROF link.
Keywords: Radio Over Fiber (ROF); Digital Pre-Distorter (DPD); Look-Up Table (LUT); Memory Polynomial (MP); WCDMA
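The two pre-distorter structures compared above can be sketched as forward models on complex baseband samples. This is a minimal illustration, not the paper's implementation: the memory polynomial uses the common form y(n) = Σₖ Σₘ a_{k,m} x(n−m)|x(n−m)|^(k−1) with all orders included, the LUT version is a memoryless, amplitude-indexed complex-gain table with hypothetical uniform bins, and coefficient or table identification (for example by indirect learning) is not shown.

```python
def memory_polynomial(x, coeffs):
    """Apply a memory polynomial to complex baseband samples.

    x:      list of complex input samples
    coeffs: coeffs[k][m] multiplies x(n-m)*|x(n-m)|**k, so the nonlinearity
            order is K = len(coeffs) and the memory depth is M = len(coeffs[0]) - 1
    Samples before x[0] are taken as zero.
    """
    y = []
    for n in range(len(x)):
        acc = 0j
        for k, row in enumerate(coeffs):   # power of the envelope |x|
            for m, a in enumerate(row):    # delay in samples
                if n - m >= 0:
                    s = x[n - m]
                    acc += a * s * abs(s) ** k
        y.append(acc)
    return y


def lut_dpd(x, gain_lut, max_amp):
    """Memoryless LUT pre-distortion: multiply each sample by a complex gain
    selected by its amplitude, using uniform bins over [0, max_amp)."""
    n_bins = len(gain_lut)
    out = []
    for s in x:
        idx = min(int(abs(s) / max_amp * n_bins), n_bins - 1)  # clamp overdrive
        out.append(s * gain_lut[idx])
    return out
```

The LUT form has no memory terms, while the memory polynomial can model them through the delay index m; the abstract's finding that both perform equally well is consistent with its observation that the link shows no apparent memory effects.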
5. The Real-Time Mixing Module Design for HDTV Data of SMPTE 274M and PC Video Data
Authors: 魏江力 (Wei Jiangli), 赵保军 (Zhao Baojun), 韩月秋 (Han Yueqiu). Journal of Beijing Institute of Technology. EI, CAS. 2003, No. 4, pp. 416-419 (4 pages).
A real-time mixing module for high definition television (HDTV) data of SMPTE 274M and PC video data is designed, and its hardware implementation, algorithm and simulation are given. To improve data-processing capability, an anti-fuse FPGA chip and a pipelined, modular design are adopted. With 6 parallel LUTs and a fast algorithm, the module can mix 4:2:2 component signals separately in the luminance and chrominance spaces in real time. According to the simulation, the module can mix uncompressed HDTV data with PC video data in real time, which current ASIC chips cannot do. Furthermore, the design approach can be extended to multi-stage mixing. The mixing module can be widely used in HDTV production systems.
Keywords: high definition television (HDTV); production system; look-up table (LUT); video mixing
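The abstract does not say what the six parallel LUTs hold. One plausible reading, used here purely for illustration, is that tables replace the multiplications of a linear mix out = α·A + (1−α)·B. The sketch assumes 8-bit samples, a hypothetical 9-bit mixing factor `alpha` in 0..256, separate tables for luma and chroma, and an interleaved Cb, Y, Cr, Y sample order for the 4:2:2 lines (a common multiplexing, assumed here).

```python
def make_mix_luts(alpha):
    """Tables of alpha*v and (256-alpha)*v for every 8-bit sample v.

    alpha: mixing weight of the HDTV source, 0 (all PC) .. 256 (all HDTV)
    """
    fg = [alpha * v for v in range(256)]
    bg = [(256 - alpha) * v for v in range(256)]
    return fg, bg


def mix_line(hdtv, pc, luma_luts, chroma_luts):
    """Mix two 4:2:2 lines stored as interleaved Cb Y Cr Y ... 8-bit samples.

    Odd positions are luma, even positions are chroma, so each component
    space gets its own pair of tables and can use its own alpha.
    """
    out = []
    for i, (a, b) in enumerate(zip(hdtv, pc)):
        fg, bg = luma_luts if i % 2 == 1 else chroma_luts
        # Two look-ups and one add replace two multiplies; +128 rounds before >> 8
        out.append((fg[a] + bg[b] + 128) >> 8)
    return out
```

Because the weights sum to 256, the result never exceeds 255, so no clipping stage is needed; changing `alpha` only requires refilling the tables, which suits a pipelined FPGA design.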