Optimization of Parallel Principal Eigenvectors Computing for Large-Scale Sparse Matrixes (大规模稀疏矩阵的主特征向量计算优化方法)

Cited by: 3

Abstract: Principal eigenvectors computing (PEC) for matrices is an important problem in scientific and engineering computing. With the rise of general-purpose computing on graphics processing units (GPGPU), using the GPU to accelerate PEC for large-scale sparse matrices has attracted wide attention. This paper analyzes the performance bottlenecks of PEC from two angles: the characteristics of the application and the characteristics of the GPU architecture. It proposes a GPU-oriented sparse matrix storage format, called GPU-ELL, together with an optimized thread mapping strategy for the GPU, and designs a corresponding optimized PEC algorithm. Experiments on an ATI Radeon HD 5850 show that the approach achieves a speedup of up to roughly 200 times over a conventional CPU implementation and about 2 times over existing GPU implementations.

Authors: Wang Wei (王伟), Chen Jianping (陈建平), Zeng Guosun (曾国荪), Yu Lihua (俞莉花), Tan Yiming (谭一鸣)

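Affiliations: Department of Computer Science and Technology, Tongji University; Tongji Branch, National Engineering and Technology Center of High Performance Computers; Key Laboratory of Embedded System and Service Computing (Ministry of Education), Tongji University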

Source: Journal of Frontiers of Computer Science and Technology (《计算机科学与探索》), CSCD, 2012, No. 2, pp. 118-124 (7 pages)

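Funding: National Natural Science Foundation of China, Nos. 61103068 and 61174158; NSFC-Microsoft Research Asia Joint Funding Project, No. 60970155; Doctoral Fund of the Ministry of Education, No. 20090072110035; Specialized Research Fund for the Doctoral Program of Higher Education, No. 20110072120017; Shanghai Excellent Academic Leaders Program, No. 10XD1404400; Open Fund of the State Key Laboratory of High-End Server and Storage Technology, No. 2009HSSA06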

Keywords: general-purpose computing on graphics processing unit (GPGPU); principal eigenvectors computing (PEC); sparse matrix-vector multiplication (SpMV); thread optimization

Classification: TP301 [Automation and Computer Technology: Computer System Architecture]
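
The abstract describes PEC as an iterative computation built on sparse matrix-vector multiplication (SpMV) over an ELL-style storage format. The record does not specify the GPU-ELL layout or the thread mapping, so the following is only a minimal CPU sketch of the general idea, assuming the standard ELLPACK (ELL) format and plain power iteration. The function names `to_ell`, `ell_spmv`, and `power_iteration` are hypothetical and are not taken from the paper.

```python
import math


def to_ell(dense):
    """Convert a dense matrix (list of rows) to standard ELL arrays.

    Every row is padded to the width of the longest row, so each row
    stores the same number of (value, column) slots. This is plain
    ELLPACK, not the paper's GPU-ELL variant.
    """
    width = max(sum(1 for v in row if v != 0) for row in dense)
    values, columns = [], []
    for row in dense:
        nonzeros = [(j, v) for j, v in enumerate(row) if v != 0]
        pad = width - len(nonzeros)
        # Padding slots use value 0.0 and column 0, so they add nothing.
        columns.append([j for j, _ in nonzeros] + [0] * pad)
        values.append([v for _, v in nonzeros] + [0.0] * pad)
    return values, columns


def ell_spmv(values, columns, x):
    """Compute y = A x for a matrix stored in ELL form.

    Each output row is independent of the others, which is why a GPU
    implementation can assign rows to threads.
    """
    return [
        sum(v * x[j] for v, j in zip(row_values, row_columns))
        for row_values, row_columns in zip(values, columns)
    ]


def power_iteration(values, columns, n, tol=1e-12, max_iter=1000):
    """Estimate the dominant eigenvalue and eigenvector by power iteration."""
    x = [1.0 / math.sqrt(n)] * n  # unit-length starting vector
    eigenvalue = 0.0
    for _ in range(max_iter):
        y = ell_spmv(values, columns, x)
        estimate = sum(a * b for a, b in zip(x, y))  # Rayleigh quotient
        norm = math.sqrt(sum(v * v for v in y))
        if norm == 0.0:
            raise ValueError("A x is the zero vector; cannot normalize")
        x = [v / norm for v in y]
        converged = abs(estimate - eigenvalue) < tol
        eigenvalue = estimate
        if converged:
            break
    return eigenvalue, x


if __name__ == "__main__":
    A = [[2, 0, 0],
         [0, 3, 4],
         [0, 4, 9]]
    vals, cols = to_ell(A)
    lam, vec = power_iteration(vals, cols, n=3)
    print(lam, vec)
```

Because every ELL row has the same length, the inner loop has a fixed trip count per row. GPU implementations of ELL commonly exploit this regularity and store the arrays column-major so that neighbouring threads read neighbouring memory. How GPU-ELL differs from this baseline is not stated in the abstract.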
