期刊文献+

高度可伸缩的稀疏矩阵乘法 被引量:3

Highly Scalable Sparse Matrix Multiplication
下载PDF
导出
摘要 矩阵乘法是线性代数和图算法中非常重要的一个基本操作,而大规模数据处理中的矩阵往往是稀疏矩阵。MapReduce编程框架能够有效地支持海量数据的分布式计算。因此,对如何运用MapReduce编程框架实现超大规模稀疏矩阵的乘法进行了研究。传统矩阵乘法并行算法没有针对稀疏矩阵进行专门优化,导致计算过程中出现大量不必要的通信开销。提出了一种新的算法——CRM(column row multiplication)算法,并与传统的矩阵分块算法进行了比较。实验证明,CRM算法运行效率有很大的提高,并且具有高度的可伸缩性,适合在MapReduce平台上运行。 Matrix multiplication is an important fundamental operation in algebra and graph algorithms. And matrixes are usually highly sparse when coming to massive data processing. MapReduce is a programming model which can process large data sets effectively. This paper focuses on how to deal with massive sparse matrix multiplication on top of MapReduce programming model. Block based matrix multiplication algorithms aren' t optimized for sparse matrix and produce large amount of redundant communication. This paper proposes a new algorithm named CRM (column row multiplication), and compares it with traditional block based matrix algorithms. The experimental results demonstrate that CRM has higher efficiency and scalability, is suitable for operating on MapReduce and out- performs traditional ways considerably.
出处 《计算机科学与探索》 CSCD 2013年第11期973-982,共10页 Journal of Frontiers of Computer Science and Technology
基金 国家自然科学基金~~
关键词 稀疏矩阵乘法 分布式计算 HADOOP sparse matrix multiplication distributed computing Hadoop
  • 相关文献

参考文献15

  • 1Shah V B, Adviser-Gilbert J R. An interactive system for combinatorial scientific computing with an emphasis on programmer productivity[M]. Santa Barbara: University of California at Santa Barbara, 2007.
  • 2Rabin M 0, Vazirani V V. Maximum matchings in general graphs through randomization[J]. Journal of Algorithms, 1989, 10(4): 557-567.
  • 3Yuster R, Zwick U. Detecting short directed cycles using rectangular matrix multiplication and dynamic program?ming[C]/lProceedings of the 15th Annual ACM-SIAM Symposium on Discrete Algorithms (SODA '04). Philadel?phia, PA, USA: Society for Industrial and Applied Mathe?matics, 2004: 254-260.
  • 4Zhang Yudong, Wu Lenan, Wei Geng, et al. A novel algo?rithm for all pairs shortest path problem based on matrix multiplication and pulse coupled neural network[J]. Digital Signal Processing, 2011, 21(4): 517-521.
  • 5Qian Zhengping, Chen Xiuwei, Kang Nanxi, et al. Mad?LINQ: large-scale distributed matrix computation for the cloud[C]/lProceedings of the 7th ACM European Confer?ence on Computer Systems (EuroSys '12), Bern, Switzer?land, 2012: 197-210.
  • 6Loos S M, Wise D S. Strassen' s matrix multiplication rela?beled[EB/OL]. (2009)[20 13-03]. http://src.acm.org/loos/loos. html.
  • 7Kang U, Tsourakakis C E, Faloutsos C. PEGASUS: mining peta-scale graphs[J]. Knowledge and Information Systems, 20 II, 27(2): 303-325.
  • 8Seo S, Yoon E J, Kim J, et al. HAMA: an efficient matrix computation with the MapReduce framework[C]//Proceedings of the IEEE 2nd International Conference on Cloud Com?puting Technology and Science (CloudCom ' 10). Wasbing?ton, DC, USA: IEEE Computer Society, 2010: 721-726.
  • 9Saad Y. Iterative methods for sparse linear systems[M). Phila?delphia, PA, USA: Society for Industrial and Applied Mathe?matics, 2003.
  • 10Park S C, Draayer J P, Zbeng S Q. Fast sparse matrix multi?plication[J]. Computer Physics Communications, 1992, 70(3): 557-568.

同被引文献35

  • 1周军锋,汤显,郭景峰.一种优化的协同过滤推荐算法[J].计算机研究与发展,2004,41(10):1842-1847. 被引量:103
  • 2李国杰.大数据研究的科学价值[J].中国计算机学会通讯,2012,8(9):8-15.
  • 3新浪微博用户超5亿[EB/OL].[2013-02-21].http://news.xinhuanet.com/info/2013-02/21/c__132181760.htm.content8736757.htm.
  • 4Jeh G, Widom J. SimRank: a measure of structural-context similarity[C]//Proceedings of the 8th ACM SIGKDD Inter- national Conference on Knowledge Discovery and Data Mining (KDD '02), Strathcona, USA, 2002. New York, NY, USA: ACM, 2002: 538-543.
  • 5Page L, Brin S, Motwani R, el al. The pagerank citation ranking: bringing order to the Web[R/OL]. Stanford Univer- sity Database Group (1998)[2014-03-10]. http://citeseer, nj. nec.com/368196.html.
  • 6Dean J, Ghemawat S. MapReduce: simplified data processing on large clusters[J]. Communications of ACM, 2004, 51(1): 107-113.
  • 7Valiant L G. A bridging model for parallel computation[J]. Communications of the ACM, 1990, 33(8): 103-111.
  • 8Malewicz G, Austern M H, Bik A J, et al. Pregel: a system for large-scale graph processing[C]//Proceedings of the 2010 ACM SIGMOD International Conference on Management of Data (SIGMOD '10), Ashraf Aboulnaga, 2010. New York, NY, USA: ACM, 2010: 135-146.
  • 9Yu Weiren, Lin Xuemin, Zhang Wenjie. Towards efficient SimRank computation on large graphs[C]//Proceedings of the 29th International Conference on Data Engineering (ICDE ' 13), Brisbane, Australia, 2013. Piscataway, N J, USA: IEEE, 2013: 601-612.
  • 10Li Cuiping, Han Jiawei, He Guoming, et al. Fast computa- tion of SimRank for static and dynamic information net- works[C]//Proceedings of the 13th International Confer- ence on Extending Database Technology (EDBT '10), Laus- anne, Switzerland, 2010. New York, NY, USA: ACM, 2010: 465-476.

引证文献3

二级引证文献30

相关作者

内容加载中请稍等...

相关机构

内容加载中请稍等...

相关主题

内容加载中请稍等...

浏览历史

内容加载中请稍等...
;
使用帮助 返回顶部