Abstract
Matrix multiplication is a fundamental operation in linear algebra and graph algorithms, and the matrices that arise in large-scale data processing are usually highly sparse. MapReduce is a programming model that effectively supports distributed computation over massive data sets. This paper studies how to implement the multiplication of very large sparse matrices on top of the MapReduce programming model. Traditional block-based parallel matrix multiplication algorithms are not optimized for sparse matrices and incur a large amount of unnecessary communication overhead. This paper proposes a new algorithm named CRM (column row multiplication) and compares it with traditional block-based matrix multiplication algorithms. The experimental results show that CRM runs considerably more efficiently than the traditional approaches, is highly scalable, and is well suited to running on MapReduce.
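The abstract does not spell out how CRM works, so the following is only an illustrative sketch of the general column-row (outer-product) formulation that the name suggests, C = Σ_k A[:,k] · B[k,:], simulated in plain Python rather than on a real MapReduce cluster. The function names (`map_phase`, `shuffle`, `multiply_phase`, `sum_phase`, `sparse_multiply`) and the `(row, column, value)` triple format are assumptions made for this example and are not taken from the paper.

```python
from collections import defaultdict


def map_phase(a_entries, b_entries):
    """Key every nonzero by the shared index k (column of A, row of B)."""
    for i, k, v in a_entries:
        yield k, ("A", i, v)
    for k, j, v in b_entries:
        yield k, ("B", j, v)


def shuffle(pairs):
    """Group values by key, standing in for the framework's shuffle step."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def multiply_phase(groups):
    """For each k, emit the outer product of A's column k and B's row k."""
    for values in groups.values():
        col = [(i, v) for tag, i, v in values if tag == "A"]
        row = [(j, v) for tag, j, v in values if tag == "B"]
        for i, a in col:
            for j, b in row:
                yield (i, j), a * b


def sum_phase(partials):
    """Add up the partial products that land in the same output cell."""
    result = defaultdict(float)
    for cell, value in partials:
        result[cell] += value
    return dict(result)


def sparse_multiply(a_entries, b_entries):
    """Multiply two sparse matrices given as (row, col, value) triples."""
    groups = shuffle(map_phase(a_entries, b_entries))
    return sum_phase(multiply_phase(groups))


if __name__ == "__main__":
    # A = [[1, 2], [0, 3]], B = [[4, 0], [5, 6]], nonzeros only.
    A = [(0, 0, 1), (0, 1, 2), (1, 1, 3)]
    B = [(0, 0, 4), (1, 0, 5), (1, 1, 6)]
    print(sparse_multiply(A, B))
```

In this formulation only nonzero entries are ever emitted or grouped, so a column of A or row of B that is empty contributes no records at all, which is one plausible reason a column-row scheme could reduce communication for sparse inputs compared with shipping dense blocks.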
Source
《计算机科学与探索》
CSCD
2013, Issue 11, pp. 973-982 (10 pages)
Journal of Frontiers of Computer Science and Technology
Funding
National Natural Science Foundation of China