Abstract
In neural-network-based graph representation learning, when node features are high-dimensional and the graph is large, data transfer from host memory to GPU memory becomes the training bottleneck. To address this problem, this paper applies graph partitioning to graph representation learning, reducing the I/O overhead of memory access. The method partitions the graph into blocks according to node degree and stores several feature-matrix blocks in a buffer pool in GPU memory; each training epoch uses the blocks cached in the pool, cutting the data copied from memory to the GPU. Building on this idea, the method uses a partition-based sampling algorithm, designs a GPU buffer pool to reduce memory accesses, and builds a multi-level negative-sampling table that draws negative samples in constant time. Compared with existing methods on several real-world datasets, the method matches the downstream machine-learning accuracy of the original algorithms while training 2 to 7 times faster. The experimental results show that partition-based graph representation learning trains models efficiently while preserving the quality of the node embedding vectors. Future work could rigorously characterize the theoretical error between the partition-based model and the original one.
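To make the pipeline concrete, the following is a minimal Python/PyTorch sketch of the three ingredients the abstract names; it is an illustration under assumptions, not the authors' implementation. Block boundaries, the LRU eviction policy, and all identifiers (partition_by_degree, GPUBufferPool, build_alias_table) are hypothetical, and the paper's multi-level negative-sampling table is approximated here by a single alias table, which likewise samples in constant time.

import torch

def partition_by_degree(degrees: torch.Tensor, num_blocks: int):
    # Order nodes by degree and split them into contiguous blocks,
    # so nodes of similar degree land in the same feature-matrix block.
    order = torch.argsort(degrees, descending=True)
    return torch.chunk(order, num_blocks)

class GPUBufferPool:
    # Keeps up to `capacity` feature-matrix blocks resident in GPU memory;
    # a hit avoids the host-to-GPU copy, a miss evicts the least recently
    # used block before copying the requested block in.
    def __init__(self, capacity: int, device: str = "cuda"):
        self.capacity = capacity
        self.device = device
        self.cache = {}   # block_id -> feature tensor on the GPU
        self.lru = []     # block ids, least recently used first

    def get(self, block_id: int, host_block: torch.Tensor) -> torch.Tensor:
        if block_id in self.cache:
            self.lru.remove(block_id)            # hit: no copy needed
        else:
            if len(self.cache) >= self.capacity:
                del self.cache[self.lru.pop(0)]  # evict the LRU block
            self.cache[block_id] = host_block.to(self.device,
                                                 non_blocking=True)
        self.lru.append(block_id)
        return self.cache[block_id]

def build_alias_table(probs: torch.Tensor):
    # Walker's alias method: O(n) construction, O(1) per sample.
    n = probs.numel()
    scaled = (probs / probs.sum() * n).tolist()
    prob, alias = [1.0] * n, list(range(n))
    small = [i for i, p in enumerate(scaled) if p < 1.0]
    large = [i for i, p in enumerate(scaled) if p >= 1.0]
    while small and large:
        s, l = small.pop(), large.pop()
        prob[s], alias[s] = scaled[s], l
        scaled[l] -= 1.0 - scaled[s]
        (small if scaled[l] < 1.0 else large).append(l)
    return prob, alias

def sample_negative(prob, alias):
    # Draw one node id in constant time.
    i = torch.randint(len(prob), (1,)).item()
    return i if torch.rand(1).item() < prob[i] else alias[i]

A typical setup would build the alias table once from degree-biased weights, e.g. probs = degrees.float().pow(0.75) as in word2vec-style negative sampling, and let the block-based sampler decide which cached blocks each epoch trains on; that scheduling logic is the paper's contribution and is left to the caller here.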
Authors
Xia Xin
Gao Pin
Chen Kang
Jiang Jinlei
Xia Xin; Gao Pin; Chen Kang; Jiang Jinlei (Dept. of Computer Science & Technology, Tsinghua University, Beijing 100084, China; WeChat Group, Tencent Corporation, Shenzhen, Guangdong 518057, China)
Source
《计算机应用研究》
CSCD
PKU Core Journals (北大核心)
2020, Issue 9, pp. 2586-2590, 2599 (6 pages)
Application Research of Computers
Funding
Supported by the National Key Research and Development Program of China (2018YFB1003505).
Keywords
graph partition
graph representation learning
graph sampling
graph neural network