期刊文献+

大规模稀疏矩阵的主特征向量计算优化方法 被引量:3

Optimization of Parallel Principal Eigenvectors Computing for Large-Scale Sparse Matrixes
下载PDF
导出
摘要 矩阵主特征向量(principal eigenvectors computing,PEC)的求解是科学与工程计算中的一个重要问题。随着图形处理单元通用计算(general-purpose computing on graphics pro cessing unit,GPGPU)的兴起,利用GPU来优化大规模稀疏矩阵的图形处理单元求解得到了广泛关注。分别从应用特征和GPU体系结构特征两方面分析了PEC运算的性能瓶颈,提出了一种面向GPU的稀疏矩阵存储格式——GPU-ELL和一个针对GPU的线程优化映射策略,并设计了相应的PEC优化执行算法。在ATI HD Radeon5850上的实验结果表明,相对于传统CPU,该方案获得了最多200倍左右的加速,相对于已有GPU上的实现,也获得了2倍的加速。 The principal eigenvectors computing (PEC) is a paramount operation in engineering and scientific computing. Since the general-purpose computing on graphics processing unit (GPGPU) emerges for the outstanding acceleration factors, PEC implementations on graphics processing unit (GPU) have appeared on the scene. This paper analyzes PEC performance bottleneck from the characteristic of application and GPU architecture, and thereforeproposes a new implementation of PEC based on a new matrix storage format, called GPU-ELL, and an optimized thread mapping strategy of GPU. It evaluates the proposed approach over ATI HD Radeon 5850 GPU, and the ex- perimental results show its good performance with average 200 times acceleration of other existing algorithm on CPU, and 2 times of that on GPU.
出处 《计算机科学与探索》 CSCD 2012年第2期118-124,共7页 Journal of Frontiers of Computer Science and Technology
基金 国家自然科学基金Nos.61103068 61174158 NSFC-微软亚洲研究院联合资助项目No.60970155 教育部博士点基金No.20090072110035 高等学校博士学科点专项科研基金No.20110072120017 上海市优秀学科带头人计划项目No.10XD1404400 高效能服务器和存储技术国家重点实验室开放基金No.2009HSSA06~~
关键词 图形处理单元通用计算(GPGPU) 主特征向量计算 稀疏矩阵向量乘 线程优化 general-purpose computing on graphics processing unit (GPGPU) principal eigenvectors computing (PEC) sparse matrix vector (SpMV) thread optimization
  • 相关文献

参考文献8

  • 1Owens J, Luebke D, Govindaraju N, et al. A survey of general-purpose computation on graphics hardware[J]. Computer Graphics Forum, 2007, 26(1): 80-113.
  • 2Kurzak J, Alvaro W, Dongarra J. Optimizing matrix mul- tiplication for a short-vector SIMD architecture-CELL processor[J]. Parallel Computing, 2009, 35(3): 138-150.
  • 3Kotakemori H, Hasegawa H, Kajiyama T, et al. Perform- ance evaluation of parallel sparse matrix-vector products on SGI Altix3700[C]//Proceedings of the 1st International Workshop on OpenMP (IWOMP), Eugene, OR, USA, June 2005. Berlin, Heidelberg: Springer-Verlag, 2005:153-163.
  • 4Vazquez F, Garzon E M, Martinez J A, et al. The sparse matrix vector product on GPUs[R]. University of Almeria, 2009.
  • 5Davis A T. The University of Florida sparse matrix col- lection[EB/OL]. (1994)[2011-00]. http://www.cise.ufl.edu/ research/sparse/matrices/.
  • 6Baskaran M, Bordawekar R. Optimizing sparse matrix- vector multiplication on GPUs RC24704[R]. IBM, 2008. Bell N, Garland M. Efficient sparse matrix-vector multi- plication on CUDA NVR-2008-004[R]. NVIDIA, 2008.
  • 7Langville A N, Meyer C D. A survey of eigenvector methods for Web information retrieval[J]. The SIAM Re- view, 2005, 47(1): 135-161.
  • 8Bell N, Garland M. Efficient sparse matrix-vector multi- plication on CUDA NVR-2008-004[R]. NVIDIA, 2008.

同被引文献42

  • 1徐京,陶皖.一种改进的PageRank算法[J].长江大学学报(自科版)(上旬),2013,10(10):51-53. 被引量:1
  • 2Amestoy P R,Davis T A,Duff I S.An approximate nminimmum degree ordering algorithm[J].SIAM Journal on Matrix Analysis and Applications,1996,17 (4):886-905.
  • 3Baskaran M M,Bordawekar R.Optimizing sparse matrix-vector multiplication on GPUs[R].Technical report IBM Research Report RC24704(W0812-047),2008.
  • 4Bell N,Garland M.Effcient sparse matrix-vector multiplication on cuda[R].NVIDIA Technical Report NVR-2008-004.Demcember 2008.
  • 5Shan Y,Wu T,Wang Y,et al.Fpga and gpu implementation of large scale spmv[C]// Proceedings of IEEE 8th Symposium on Application Specific Processors (SASP ' 10).Anaheim,California,USA,June 2010:67-70.
  • 6Monakov A,Lokhmotov A,Avetisyan A.Automatically tuning sparse matrix-vector multiplication for gpu architectures[C]//Proceedings of International Conference on High Performance and Embedded Architectures and Compilers (HiPEAC ' 10).2010:111-125.
  • 7V'azquez F,Ortega G,Fern' andez J,et al.Improving the performance of the sparse matrix vector product with gpus[C]//Proceedings of IEEE International Conference on Computer and Information Technology (CIT ' 10).Bradford,June 2010:1146-1151.
  • 8Baskaran M M,Bordaker R.Optimizing sparse matrix-vector multiplication on gpus[R].IBM Research Report RC24704(W0812 047).April 2009.
  • 9Dehnavi M M,Fern'andez D M,Giannacopoulos D.Finite-element sparse matrix vector multiplication on graphic processing units[J].IEEE Transactions on Magnetics,2010,46 (8):2982-2985.
  • 10Buatois L,Caumon G,L' evy B.Concurrent number cruncher:An efficient sparse linear solver on the gpu[C]// High Performance Computing and Communications(HPCC' 07).Springer-Verlag,2007,4782:358-371.

引证文献3

二级引证文献12

相关作者

内容加载中请稍等...

相关机构

内容加载中请稍等...

相关主题

内容加载中请稍等...

浏览历史

内容加载中请稍等...
;
使用帮助 返回顶部