Abstract
Heterogeneous graph embedding models based on non-contrastive learning (NCL) learn the intrinsic features and patterns of data without relying on negative samples, which may leave them unable to learn effectively what distinguishes one node from another. This paper proposes XP-NCL, a heterogeneous graph embedding model based on cross-view prototype non-contrastive learning. XP-NCL finds additional positive samples that supply more contextual information about the source node and reconsiders the similarity between positive samples, so as to learn more effective node representations for downstream tasks. First, the model designs a tree structure based on random walks over the heterogeneous graph: by retaining only the random walk paths that satisfy local structural constraints, it builds a directed filtering tree (DFT) of positive samples that carries rich neighborhood and semantic information. Second, based on the characteristics of heterogeneous graphs, it defines the cross-view prototype index (ISDR) and a peak operator, which align same-class samples along multiple dimensions, in both count and value. On this basis, the model is trained with stop-gradient updates. Finally, node classification and clustering experiments on the ACM, DBLP, and Freebase datasets show that, even without negative samples, XP-NCL representations outperform homogeneous and heterogeneous graph baselines in many cases.
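The keyword list expands ISDR as "intersection to symmetric difference ratio". The abstract does not give the formula, so the sketch below only illustrates what that name suggests: the number of items two sets share divided by the number of items found in exactly one of them. The function name `isdr`, the handling of identical sets, and the sample node IDs are assumptions made for illustration, not the paper's definition.

```python
def isdr(a, b):
    """Assumed reading of ISDR: |A ∩ B| / |A △ B| for two sets of sample IDs."""
    a, b = set(a), set(b)
    shared = len(a & b)      # items present in both sets
    differing = len(a ^ b)   # items present in exactly one set
    if differing == 0:
        # Identical sets make the ratio unbounded. Returning infinity for
        # non-empty sets and 0.0 for two empty sets is an arbitrary
        # convention chosen for this sketch.
        return float("inf") if shared else 0.0
    return shared / differing


# Hypothetical positive-sample sets for one source node under two views.
positives_view_a = {"p1", "p2", "p3", "p4"}
positives_view_b = {"p2", "p3", "p5"}

# Two shared samples against three non-shared samples.
score = isdr(positives_view_a, positives_view_b)
```

A higher value means the two sets of positive samples agree more closely. How XP-NCL combines such a ratio with the peak operator is not stated in the abstract.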
Authors
Zhang Min (张敏)
Yang Yuqing (杨雨晴)
He Yanting (贺艳婷)
Shi Chenhui (史晨辉)
School of Computer Science & Technology, Taiyuan University of Science & Technology, Taiyuan 030024, China
Source
Application Research of Computers (《计算机应用研究》)
CSCD
Peking University Core Journals (北大核心)
2024, Issue 9, pp. 2611-2619 (9 pages)
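Funding
National Natural Science Foundation of China (U1931209)
Shanxi Province Science and Technology Cooperation and Exchange Special Program, Regional Cooperation Project (202204041101037, 202204041101033)
Taiyuan University of Science and Technology Graduate Education Innovation Project (BY2023015)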
Keywords
heterogeneous graph embedding
non-contrastive learning
directed filtering tree positive sampling
intersection to symmetric difference ratio
peak operator