
A Survey of Large-Scale Graph Neural Networks (大规模图神经网络研究综述)

Cited by: 2
Abstract: Graph Neural Networks (GNNs) have garnered increasing attention for their ability to model non-Euclidean graph structures and complex node features, and they have been applied extensively in domains such as recommender systems, knowledge graphs, link prediction, and traffic prediction. However, training GNN models on large-scale data poses several challenges: irregular graph structures, complex node features, and dependencies among training samples. These challenges strain computation efficiency, memory management, and the communication cost of distributed computing. To overcome them, researchers have pursued optimizations at the level of application methods, algorithm models, programming frameworks, and hardware design. This survey focuses specifically on algorithm optimization and framework acceleration for large-scale GNN models. By examining related work in these areas, it aims to help readers understand existing research on sampling algorithms and framework optimization for large-scale graph data, and to lay the foundation for co-optimizing GNN algorithms and frameworks.

The survey is structured as follows. First, we give an overview of the challenges GNNs face in large-scale applications and the major optimization methods used to address them, and we compare this survey with existing GNN surveys; the main difference is that ours focuses specifically on GNN models in large-scale applications, summarizing and analyzing related work on GNN algorithms and framework optimization with an emphasis on scalability.

Second, we briefly introduce the message-passing mechanism and classify GNN models into four categories: graph convolutional networks, graph attention networks, graph recurrent neural networks, and graph autoencoders. For each category, we describe the major network design, including propagation and aggregation strategies, and analyze the corresponding challenges of processing large-scale data. We also summarize the difficulties GNN models encounter in large-scale applications under both full-batch and mini-batch training modes.
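As a reference point for the message-passing mechanism introduced above, one standard formulation is the following (the notation is illustrative and is not taken from the paper):

h_v^{(l+1)} = \phi\Big( h_v^{(l)}, \bigoplus_{u \in \mathcal{N}(v)} \psi\big( h_v^{(l)}, h_u^{(l)} \big) \Big)

where \psi computes the message sent from neighbor u to node v at layer l, \bigoplus is a permutation-invariant aggregator such as sum, mean, or max, and \phi updates the embedding of v. Evaluating this update for every node at every layer is precisely what makes full-batch training expensive on large graphs, and it motivates the sampling methods surveyed below.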
Third, we classify, summarize, and analyze GNN algorithms for large-scale data, focusing on sampling-based GNNs at different granularities: node-based sampling strategies usually select a fixed number of neighbors for each node, layer-based sampling methods sample at each GNN layer, and subgraph-based sampling approaches attempt to find dense subgraphs to serve as mini-batches. For each type of sampling strategy, we summarize its key ideas and related work and discuss its advantages and disadvantages.

Fourth, we introduce mainstream programming frameworks for GNN models, such as DGL, PyG, and Graph-Learn, summarize their characteristics, and review optimization techniques for framework acceleration, which we divide into five categories: data partition, task scheduling, parallel execution, memory management, and other methods.

Finally, we conclude the survey and outline prospects for future work on optimizing GNN models and accelerating frameworks for large-scale data, such as reducing redundant computation, algorithm-framework co-optimization, graph-aware optimizations, support for complex graphs, flexible scheduling based on hardware features, optimizations on distributed platforms, framework-hardware co-optimization, and minimizing node representation dimensions.
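As a concrete companion to the node-based sampling strategies discussed above, the sketch below shows fixed-fanout neighbor sampling with mean aggregation in plain Python, in the spirit of GraphSAGE-style samplers. It is a minimal illustration only: the names (adj, fanout, sample_neighbors) and the adjacency-dict representation are our own assumptions, not taken from the paper or from any specific framework such as DGL or PyG.

import random

def sample_neighbors(adj, node, fanout):
    # Return all neighbors when there are at most `fanout` of them,
    # otherwise a uniform random subset of size `fanout`.
    neighbors = adj.get(node, [])
    if len(neighbors) <= fanout:
        return list(neighbors)
    return random.sample(neighbors, fanout)

def sampled_mean_aggregate(adj, features, node, fanout):
    # One message-passing step computed on a sampled neighborhood
    # instead of the full one, which bounds the per-node cost by `fanout`.
    sampled = sample_neighbors(adj, node, fanout)
    if not sampled:
        return list(features[node])
    dim = len(features[node])
    agg = [0.0] * dim
    for u in sampled:
        for i in range(dim):
            agg[i] += features[u][i]
    return [s / len(sampled) for s in agg]

# Toy usage: a 4-node cycle with 2-dimensional node features.
adj = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}
features = {v: [float(v), 1.0] for v in adj}
print(sampled_mean_aggregate(adj, features, node=0, fanout=2))

In a multi-layer GNN the same step is applied recursively with one fanout per layer (for example [10, 5] for a two-layer model), which keeps the receptive field of every mini-batch node bounded; this per-layer fanout is the knob that the neighbor samplers in frameworks such as DGL and PyG expose.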
Authors: XIAO Guo-Qing (肖国庆), LI Xue-Qi (李雪琪), CHEN Yue-Dan (陈玥丹), TANG Zhuo (唐卓), JIANG Wen-Jun (姜文君), LI Ken-Li (李肯立) (College of Computer Science and Electronic Engineering, Hunan University, Changsha 410082, China; Shenzhen Research Institute, Hunan University, Shenzhen, Guangdong 518000, China)
Source: Chinese Journal of Computers (《计算机学报》), 2024, No. 1, pp. 148-171 (24 pages). Indexed in EI, CAS, CSCD, and the Peking University Core Journal list (北大核心).
Funding: Key-Area Research and Development Program of Guangdong Province (2021B0101190004); National Natural Science Foundation of China (62172157, 62202149); Science and Technology Program of Hunan Province (2023GK2002, 2021RC3062); Natural Science Foundation of Guangdong Province (2023A1515012915); Shenzhen Basic Research General Program (JCYJ20210324135409026); Open Research Project of Zhejiang Lab (2022RC0AB03).
Keywords: graph neural network; large-scale data; algorithm optimization; framework acceleration