Parallel Graph Data Analysis System Based on Spark

Original title: 基于Spark的并行图数据分析系统. Cited by: 13
Abstract: This paper proposes a parallel data analysis system built on the Spark cloud computing platform. The system targets large-scale graph data analysis tasks, also supports non-graph data analysis applications, and integrates both a graph data analysis algorithm set and a non-graph data analysis algorithm set. The paper details the system's architecture, its workflow engine and dynamic component update technology, and the design and implementation of several of its parallel data analysis algorithms. Performance tests on datasets of multiple scales, together with a comparison against the traditional MapReduce platform, show that the system completes computing tasks more efficiently than previous graph data mining systems and can also analyze non-graph data effectively.
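Source: Journal of Frontiers of Computer Science and Technology (《计算机科学与探索》), indexed in CSCD and the Peking University Core Journals list, 2015, No. 9, pp. 1066-1074 (9 pages)
Funding: Ministry of Education-China Mobile Research Fund, No. MCM20130351; Beijing Municipal Education Commission Joint Construction Project
Keywords: cloud computing; parallel algorithms; graph data analysis; data mining; social network analysis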