Abstract
Node labels are a widely available form of supervision information in complex networks and play an important role in network representation learning. Based on this, a Semi-supervised Representation Learning method combining Graph AutoEncoder and Clustering (GAECSRL) is proposed. First, a Graph Convolutional Network (GCN) and an inner product function are used as the encoder and the decoder, respectively, to construct a graph autoencoder that serves as an information propagation framework. Then, a k-means clustering module is added on top of the low-dimensional representation generated by the encoder, so that the training of the graph autoencoder and the partitioning of nodes into categories form a self-supervised mechanism. Finally, the discriminative information in the node labels guides the category partitioning of the low-dimensional representation, so that representation generation, category partitioning, and graph autoencoder training are built into a unified optimization model that yields an effective network representation incorporating node label information. In simulation experiments, GAECSRL was applied to node classification and link prediction tasks. The results show that, compared with DeepWalk, node2vec, learning Graph Representations with global structural information (GraRep), Structural Deep Network Embedding (SDNE), and Planetoid (Predicting labels and neighbors with embeddings transductively or inductively from data), GAECSRL improves Micro-F1 by 0.9 to 24.46 percentage points and Macro-F1 by 0.76 to 24.20 percentage points on the node classification task, and improves AUC (Area Under Curve) by 0.33 to 9.06 percentage points on the link prediction task. This indicates that the network representations obtained by GAECSRL effectively improve performance on both node classification and link prediction.
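The pipeline described in the abstract (GCN encoder, inner product decoder, k-means on the embeddings, label guidance) can be illustrated with a minimal NumPy sketch of one forward pass. Everything specific here is an assumption for illustration and not taken from the paper: the toy graph, the two-layer encoder, the squared-distance form of the clustering and label terms, and the weights `alpha` and `beta`. The abstract does not give the actual loss, so this is not the authors' implementation.

```python
import numpy as np

def normalize_adjacency(adj):
    """Symmetric normalization D^-1/2 (A + I) D^-1/2 used by GCN layers."""
    a_hat = adj + np.eye(adj.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))
    return a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

def gcn_encode(a_norm, x, w0, w1):
    """Two-layer GCN encoder (assumed depth): Z = A ReLU(A X W0) W1."""
    hidden = np.maximum(a_norm @ x @ w0, 0.0)
    return a_norm @ hidden @ w1

def inner_product_decode(z):
    """Inner product decoder: edge probability sigmoid(z_i . z_j)."""
    return 1.0 / (1.0 + np.exp(-(z @ z.T)))

def reconstruction_loss(adj, a_rec, eps=1e-9):
    """Mean binary cross-entropy between A and its reconstruction."""
    return -np.mean(adj * np.log(a_rec + eps)
                    + (1 - adj) * np.log(1 - a_rec + eps))

def kmeans_assign(z, centers):
    """Nearest-centre assignment and mean squared distance (one k-means step)."""
    dist = ((z[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return dist.argmin(axis=1), dist.min(axis=1).mean()

def label_loss(z, centers, labels, labeled_idx):
    """Hypothetical label term: pull labeled nodes toward their class centre."""
    diff = z[labeled_idx] - centers[labels[labeled_idx]]
    return (diff ** 2).sum(axis=1).mean()

# Toy graph (assumption): two triangles joined by the edge (2, 3).
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
adj = np.zeros((6, 6))
for i, j in edges:
    adj[i, j] = adj[j, i] = 1.0

rng = np.random.default_rng(0)
x = np.eye(6)                              # featureless nodes: one-hot inputs
w0 = 0.1 * rng.standard_normal((6, 4))     # untrained encoder weights
w1 = 0.1 * rng.standard_normal((4, 2))

a_norm = normalize_adjacency(adj)
z = gcn_encode(a_norm, x, w0, w1)          # low-dimensional representation
a_rec = inner_product_decode(z)            # reconstructed adjacency

centers = z[[1, 4]].copy()                 # centres seeded from two arbitrary nodes
labels = np.array([0, 0, 0, 1, 1, 1])
labeled_idx = np.array([0, 3])             # only two nodes carry labels

assign, cluster_term = kmeans_assign(z, centers)
alpha, beta = 1.0, 1.0                     # assumed trade-off weights
total = (reconstruction_loss(adj, a_rec)
         + alpha * cluster_term
         + beta * label_loss(z, centers, labels, labeled_idx))
print("assignments:", assign, "total loss:", float(total))
```

In a real training loop the three terms would be minimized jointly with an automatic differentiation framework, and the centres would be re-estimated as the embeddings change; the sketch only shows how the pieces fit into a single objective.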
Authors
DU Hangyuan (杜航原)
HAO Sicong (郝思聪)
WANG Wenjian (王文剑)
Affiliations: School of Computer and Information Technology, Shanxi University, Taiyuan, Shanxi 030006, China; Key Laboratory of Computational Intelligence and Chinese Information Processing of Ministry of Education (Shanxi University), Taiyuan, Shanxi 030006, China
Source
Journal of Computer Applications (《计算机应用》)
CSCD
Peking University Core Journals
2022, Issue 9, pp. 2643-2651 (9 pages)
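Funding
National Natural Science Foundation of China (61902227, 61773247)
Scientific and Technological Innovation Programs of Higher Education Institutions in Shanxi (2019L0039)
Natural Science Foundation of Shanxi Province (201901D211192)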
Keywords
network representation learning
network embedding
node label
graph neural network
self-supervised mechanism