Abstract
To address the problem of accurately matching scholars with their publications, a new author name disambiguation method based on semi-supervised learning with a Graph Convolutional Network (GCN) is proposed. The method applies the SciBERT pre-trained language model to the title and keywords of each paper to compute a semantic embedding vector for each paper node. The authors and organizations of the papers are used to build the adjacency matrices of a paper co-author network and a paper co-organization network. Pseudo labels are sampled from the co-author network to obtain a positive sample set and a negative sample set. The semantic embedding vectors, the adjacency matrices, and the positive and negative samples are fed into the GCN for semi-supervised learning, which produces embeddings of the paper nodes; these embeddings are then clustered to disambiguate the author name across papers. Experimental results show that, compared with other disambiguation methods, the proposed method achieves better results on the experimental dataset.
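The pipeline described in the abstract can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the toy papers and 2-dimensional vectors are hypothetical stand-ins for SciBERT embeddings; the co-author and co-organization adjacency matrices are merged by simple addition; pseudo positives are assumed to be paper pairs sharing at least two co-authors and negatives are random pairs sharing none; the margin loss and the cosine-threshold clustering are generic choices; and the GCN weight matrix is left as an untrained identity, so the single layer (standard renormalized propagation, ReLU(D^-1/2 (A+I) D^-1/2 H W)) only smooths features over the graph. All function names are hypothetical.

```python
import math
import random
from itertools import combinations

# Hypothetical toy data: four papers carrying the same ambiguous author name.
# "coauthors" excludes the ambiguous name itself; all values are placeholders.
papers = [
    {"coauthors": {"author_a", "author_b"}, "orgs": {"org_x"}},
    {"coauthors": {"author_a", "author_b"}, "orgs": {"org_x"}},
    {"coauthors": {"author_c"}, "orgs": {"org_y"}},
    {"coauthors": {"author_c", "author_d"}, "orgs": {"org_y"}},
]
# Stand-in for SciBERT title/keyword vectors (real ones would be 768-dimensional).
X = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]

def build_adjacency(papers, field):
    """A[i][j] = number of items in `field` shared by papers i and j (zero diagonal)."""
    n = len(papers)
    A = [[0.0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        A[i][j] = A[j][i] = float(len(papers[i][field] & papers[j][field]))
    return A

def normalize(A):
    """Renormalized adjacency D^-1/2 (A + I) D^-1/2."""
    n = len(A)
    A_hat = [[A[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in A_hat]  # >= 1 thanks to the self-loops
    return [[A_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]

def matmul(X, Y):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*Y)] for row in X]

def gcn_layer(A_norm, H, W):
    """One GCN layer: ReLU(A_norm @ H @ W)."""
    return [[max(0.0, v) for v in row] for row in matmul(matmul(A_norm, H), W)]

def sample_pseudo_labels(A_coauthor, min_shared=2, seed=0):
    """Assumed rule: positives share >= min_shared co-authors; negatives share none."""
    rng = random.Random(seed)
    pairs = list(combinations(range(len(A_coauthor)), 2))
    pos = [(i, j) for i, j in pairs if A_coauthor[i][j] >= min_shared]
    candidates = [(i, j) for i, j in pairs if A_coauthor[i][j] == 0]
    neg = rng.sample(candidates, min(len(pos), len(candidates)))
    return pos, neg

def cosine(u, v):
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return sum(a * b for a, b in zip(u, v)) / (nu * nv) if nu and nv else 0.0

def pair_loss(H, pos, neg, margin=0.5):
    """Hypothetical margin loss: pull positives together, push negatives below `margin`."""
    loss = sum(1.0 - cosine(H[i], H[j]) for i, j in pos)
    loss += sum(max(0.0, cosine(H[i], H[j]) - margin) for i, j in neg)
    return loss / max(1, len(pos) + len(neg))

def cluster(H, threshold=0.8):
    """Union-find over pairs whose cosine similarity reaches `threshold`."""
    parent = list(range(len(H)))
    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x
    for i, j in combinations(range(len(H)), 2):
        if cosine(H[i], H[j]) >= threshold:
            parent[find(i)] = find(j)
    groups = {}
    for i in range(len(H)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())

A_co = build_adjacency(papers, "coauthors")
A_org = build_adjacency(papers, "orgs")
# Assumption: the two relation graphs are merged by simple addition.
A = [[a + b for a, b in zip(r1, r2)] for r1, r2 in zip(A_co, A_org)]
pos, neg = sample_pseudo_labels(A_co)
W = [[1.0, 0.0], [0.0, 1.0]]  # untrained identity weights, for illustration only
H = gcn_layer(normalize(A), X, W)
loss = pair_loss(H, pos, neg)
clusters = cluster(H)
print(clusters)  # [[0, 1], [2, 3]]
```

In an actual system, W (and typically several stacked layers) would be learned by minimizing a loss over the pseudo-labelled pairs with a gradient-based framework, and the clustering step would run on the trained embeddings; the sketch only shows how the inputs named in the abstract fit together.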
Authors
SHENG Xiaoguang
WANG Ying
QIAN Li
WANG Ying
Affiliations: School of Artificial Intelligence, University of Chinese Academy of Sciences, Beijing 100049, China; National Science Library, Chinese Academy of Sciences, Beijing 100190, China; Department of Library, Information and Archives Management, University of Chinese Academy of Sciences, Beijing 100190, China
Source
Journal of Electronics & Information Technology
EI
CSCD
Peking University Core Journal
2021, Issue 12, pp. 3442-3450 (9 pages)
Funding
National Natural Science Foundation of China (61702038)
National Social Science Fund of China (15CTQ006)
Keywords
Name disambiguation
Graph Convolutional Network (GCN)
BERT language model