期刊文献+

大图数据上顶点驱动的并行最小生成树算法 被引量:7

Vertex-Driven Parallel Minimum Spanning Tree Algorithms on Large Graphs
下载PDF
导出
摘要 最小生成树(minimum spanning tree,MST)是图论中最为经典算法之一.基于MST结构的聚类、分类和最短路径查询等复杂图算法,在效率和结果质量方面均有显著提高.然而,随着互联网的迅猛发展,图数据规模也变得越来越大,包含千万甚至上亿个顶点的大图数据越发常见.因此,如何在大图数据上实现查询处理和数据挖掘算法已成为亟待解决的问题之一.除此之外,由于大图数据的动态性特征,如何动态地维护算法结果也势必成为最受关注的问题之一.针对目前集中式的最小生成树算法无法解决海量和动态图数据的问题,首先提出了分区Prim(partition Prim,PP)算法,基于此提出了顶点驱动的并行MST算法——PB(PP Boru。vka)算法,并论证了PB算法的正确性.另外,基于MapReduce和BSP框架实现了PB算法.针对只删除动态图特征,提出了MST维护算法,以实现高效的增量计算.对提出的计算和维护算法进行了代价分析和比较.最后,使用真实和模拟数据集,验证了PB算法和维护算法的有效性、高效性和可扩展性. The minimum spanning tree(MST) algorithm is one of the most classic algorithms in the graph theory. Some complex graph algorithms based on MST including clustering, classification and shortest path queries have been improved significantly in terms of efficiency and quality. However, with the rapid development of Internet, the scales of graphs have been becoming larger and larger. Large scale graphs which contain millions or even billions of vertices have become more common. Therefore, how to implement query processing and data mining algorithms on large scale graphs has become a problem to be solved urgently. In addition, because of the dynamic properties of large-scale graphs, how to maintain results dynamically has also become one of the most attractive problems. However, the state of the art MST algorithms can't handle such massive and dynamic graph data. In this paper, we propose a vertex-driven parallel MST algorithm called PB based on a partition Prim algorithm named PP, and demonstrate the correctness of PB. Moreover, we implement the whole process of PB algorithm on the MapReduce and BSP framework respectively. Taking deletion-only graphs into consideration, we put forward a maintenance algorithm for MST which can conduct efficient incremental evaluation. All the computation and maintenance costs are further analyzed and compared. Finally, experiments based on real and synthetic data sets demonstrate the reliabilty, efficiency and scalability of our algorithms.
出处 《计算机研究与发展》 EI CSCD 北大核心 2014年第12期2688-2701,共14页 Journal of Computer Research and Development
基金 国家"九七三"重点基础研究发展计划基金项目(2012CB316201) 国家自然科学基金项目(61472071 61272179 61033007 61173028) 中央高校基本科研业务费专项资金项目(N130404010 N110404006)
关键词 大图数据 顶点驱动 最小生成树 并行算法 维护算法 large graphs vertex-driven minimum spanning tree (MST) parallel algorithm maintenance algorithm
  • 相关文献

参考文献1

二级参考文献167

  • 1Nature. Big Data [EB/OL]. [2012-10-02]. http,//www. nature, com/news/specials/bigdata/index, html.
  • 2Bryant R E, Katz R H, Lazowska E D. Big-Data computing : Creating revolutionary breakthroughs in commerce, science, and society [R]. [2012-10-02]. http:// www. cra. org/ccc/docs/init/Big_Data, pdf.
  • 3Science. Special online collection: Dealing with data [EB/OL]. [2012-10-02]. http://www, sciencemag, org/site/ special/data/, 2011.
  • 4Agrawal D, Bernstein P, Bertino E, et al. Challenges and opportunities with big data A community white paper developed by leading researchers across the United States [R/OL]. [2012-10-02]. http://cra, org/ccc/docs/init/bigdata whitepaper, pdf.
  • 5Manyika J, Chui M, Brown B, et al. Big data: The next frontier for innovation, competition, and productivity [R/OL]. [ 2012-10-02 ]. http://www, mekinsey, corn/ Insights]MGI[Research/Teehnology _ and _ Innovation]Big _ data The next frontier for innovation.
  • 6World Economic Forum. Big data, big impact: New possibilities for international development [R/OL]. [2012- 10-02]. http://www3, weforum, org/docs/WEF TC MFS BigDataBigImpact_Briefing 2012. pdf.
  • 7Big Data Across the Federal Government [EB/OL]. [2012-10-02]. http://www, whitehouse, gov/sites/default/ files/microsites/ostp/big_data fact sheet_final_ 1. pdf.
  • 8UN Global Pulse. Big Data for Development:Challenges Opportunities [R/OL]. [ 2012-10-02 ]. http://www. unglobalpulse, org/proj ects/BigDataforDevelopment.
  • 9Times N Y. The age of big data fEB/OLd. [2012-10 -02]. http://www, nytimes, com/2012/02/12/sunday review/big- datas-impact in-the-world, html?pagewanted=all.
  • 10Grobelnik M. Big-data computing: Creating revolutionary breakthroughs in commerce, science, and society [R/OL]. [2012-10 -02]. http://videolectures, net/cswc2012_grobelnik_ big_data/.

共引文献2376

同被引文献35

引证文献7

二级引证文献26

相关作者

内容加载中请稍等...

相关机构

内容加载中请稍等...

相关主题

内容加载中请稍等...

浏览历史

内容加载中请稍等...
;
使用帮助 返回顶部