Abstract
Graph neural networks (GNNs) have become the most widely used approach to representation learning on graph-structured data, achieving significant results in representation, application, and analysis tasks at every level of granularity. The embeddings learned by GNNs fuse structural features with node semantics. According to the granularity of the embedded object, GNN methods are divided into node-level, subgraph-level, and graph-level representation learning methods, each corresponding to different downstream tasks. However, existing GNN models still choose the embedding dimension by empirical, manual trial and error and lack a theoretically grounded, computable method for doing so. As a result, GNN representation learning models often perform sub-optimally in downstream applications. Based on the principle of structural entropy minimization, this paper proposes a novel, interpretable theoretical and methodological framework for estimating the embedding dimension of GNN models. For node-level GNN models, the framework considers both structural entropy and node attribute entropy and gives a single, unified optimal dimension estimate for the embedding vectors of all nodes. For graph-level and subgraph-level GNN models, the framework additionally accounts for the differences between graph samples, providing a customized optimal embedding dimension estimate for graph samples of different complexity. Extensive downstream experiments on 18 graph-structured datasets show that the proposed framework effectively and stably improves accuracy on graph learning classification tasks, confirming the correctness of the proposed theory and method for GNN embedding dimension estimation.
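The abstract builds on structural entropy, a measure of how much information is needed to locate a node by a random walk given the graph's hierarchical structure. As background only, here is a minimal sketch of two standard quantities from structural information theory: one-dimensional structural entropy, and the structural entropy of a graph under a node partition (a two-level encoding tree). The function names and the example graph are hypothetical, and this is not the paper's dimension-estimation procedure; how the paper maps entropy values to an embedding dimension, and how node attribute entropy is defined, is not described in this abstract. The sketch assumes an undirected, unweighted edge list and a partition that covers every node appearing in the edges.

```python
import math
from collections import defaultdict


def degrees(edges):
    """Degree of every node in an undirected, unweighted edge list."""
    deg = defaultdict(int)
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return dict(deg)


def one_dim_structural_entropy(edges):
    """H1(G) = -sum_v (d_v / vol(G)) * log2(d_v / vol(G)), where vol(G) = sum of degrees."""
    deg = degrees(edges)
    vol = sum(deg.values())
    return -sum((d / vol) * math.log2(d / vol) for d in deg.values() if d > 0)


def two_dim_structural_entropy(edges, partition):
    """Structural entropy of G under a node partition P = [X_1, ..., X_L].

    vol(X_j) is the degree sum of module X_j, g_j the number of edges with exactly
    one endpoint in X_j. Hypothetical helper; assumes P covers every node in edges.
      H(G; P) = -sum_j sum_{v in X_j} (d_v/vol(G)) * log2(d_v/vol(X_j))
                -sum_j (g_j/vol(G)) * log2(vol(X_j)/vol(G))
    """
    deg = degrees(edges)
    vol_g = sum(deg.values())
    module_of = {v: j for j, module in enumerate(partition) for v in module}
    h = 0.0
    for j, module in enumerate(partition):
        vol_j = sum(deg.get(v, 0) for v in module)
        if vol_j == 0:
            continue  # a module of isolated nodes contributes nothing
        # edges crossing the boundary of module j
        cut_j = sum(1 for u, v in edges if (module_of[u] == j) != (module_of[v] == j))
        # cost of locating a node inside its module
        h -= sum((deg[v] / vol_g) * math.log2(deg[v] / vol_j)
                 for v in module if deg.get(v, 0) > 0)
        # cost of entering the module from outside
        h -= (cut_j / vol_g) * math.log2(vol_j / vol_g)
    return h


if __name__ == "__main__":
    # Two triangles joined by the edge (2, 3).
    edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
    print(one_dim_structural_entropy(edges))                         # about 2.557
    print(two_dim_structural_entropy(edges, [{0, 1, 2}, {3, 4, 5}]))  # about 1.700
```

Splitting the two triangles into separate modules lowers the entropy from about 2.557 to about 1.700 bits, while the trivial partitions (one module, or all singletons) reproduce the one-dimensional value. This is the intuition behind the minimization principle: an encoding tree that matches the graph's real structure needs less information to describe it.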
Authors
PENG Hao (彭浩)
SU Dingli (苏丁力)
LI Angsheng (李昂生)
SU Jianlin (苏剑林)
SUN Shuo (孙硕)
School of Cyber Science and Technology, Beihang University, Beijing 100191, China; School of Computer Science and Engineering, Beihang University, Beijing 100191, China; State Key Laboratory of Software Development Environment, Beihang University, Beijing 100191, China; Shenzhen Zhuiyi Technology Co., Ltd., Shenzhen 518054, China
Source
Journal of Cybersecurity (《网络空间安全科学学报》)
2023, Issue 3, pp. 107-125 (19 pages)
Funding
Beijing Natural Science Foundation (4222030)
National Natural Science Foundation of China (62322202, 61932002)
Keywords
dimension estimation
structural entropy
graph neural network (GNN)
entropy
interpretability