Design and Implementation of a Floating-Point Multiply-Accumulate Processing Element (PE) in an SMVM System

Abstract: Sparse matrix-vector multiply (SMVM), of the form Ab = x, is an important computational kernel in scientific computing, information retrieval, data mining, and other fields. Because the non-zero elements of a sparse matrix are scattered, implementations on microprocessors suffer from problems such as a high cache miss rate, so performance is far from ideal. To address this, the paper proposes implementing the SMVM system on an FPGA, partitions the system's functions between software and hardware, and presents the design and implementation of the hardware floating-point multiply-accumulate processing element (PE). The target device is a Virtex-4 LX60, and the design reaches an operating frequency of 123.6 MHz.
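As a rough illustration of the Ab = x kernel described in the abstract, the following is a minimal software sketch, not the paper's hardware design. It assumes the matrix is stored in compressed sparse row (CSR) form, which the abstract does not specify; the function name `smvm_csr` and the array names are hypothetical. The inner loop is the multiply-accumulate step that a PE would perform, and the indirect access `b[col_idx[k]]` is the kind of irregular memory reference that tends to cause cache misses on a microprocessor.

```python
def smvm_csr(values, col_idx, row_ptr, b):
    """Compute x = A b for a sparse matrix A held in CSR form (assumed format)."""
    n_rows = len(row_ptr) - 1
    x = [0.0] * n_rows
    for row in range(n_rows):
        acc = 0.0
        # Non-zeros of this row occupy values[row_ptr[row]:row_ptr[row + 1]].
        for k in range(row_ptr[row], row_ptr[row + 1]):
            # Multiply-accumulate; b is read through an index, so access is irregular.
            acc += values[k] * b[col_idx[k]]
        x[row] = acc
    return x


# Example: A = [[2, 0, 1],
#               [0, 3, 0],
#               [4, 0, 5]]
values = [2.0, 1.0, 3.0, 4.0, 5.0]
col_idx = [0, 2, 1, 0, 2]
row_ptr = [0, 2, 3, 5]
b = [1.0, 2.0, 3.0]

x = smvm_csr(values, col_idx, row_ptr, b)
print(x)  # [5.0, 6.0, 19.0]
```

Each output element is one chain of multiply-accumulate operations over a single row's non-zeros.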

Authors: 金席 (Jin Xi), 高小鹏 (Gao Xiaopeng), 龙翔 (Long Xiang)

Affiliation: School of Computer Science, Beihang University (北京航空航天大学计算机学院)

Source: Computer Engineering and Applications (《计算机工程与应用》), CSCD, PKU Core, 2006, Issue 35, pp. 107-109 (3 pages)

Keywords: multiply-accumulate; floating-point; Sparse Matrix-Vector Multiply (SMVM); FPGA

Classification: TP36 [Automation and Computer Technology: Computer System Architecture]
