Abstract
Most knowledge graph representation learning models use only the information contained in triples. To address this limitation, we propose BERT-PKE (bidirectional encoder representations from transformers-pruning knowledge embedding), a knowledge graph representation model that incorporates semantic analysis. The model takes the text descriptions of entities and relations and parses them with BERT's bidirectional encoder representations to mine deep semantic information. Because BERT is expensive to train, we propose a pruning strategy based on word frequency and k-nearest neighbors that distills the set of text descriptions used. In addition, because the construction of negative samples affects model training, we propose two strategies that improve on random sampling. The first is a negative sampling method based on entity distribution, in which a Bernoulli distribution probability is used to select the entity to be replaced; it reduces the pseudo-labeling problem caused by negative sampling. The second is a negative sampling method based on entity similarity: entities are first embedded in a vector space with TransE and then grouped with the k-means clustering algorithm. Replacing an entity with another from the same cluster yields high-quality negative triples, which benefits the learning of entity features. Experimental results show that BERT-PKE significantly outperforms baselines such as TransE, KG-BERT, and RotatE.
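The abstract does not spell out how the Bernoulli probability is computed. The sketch below is only an illustration that assumes the widely used scheme in which, for each relation, the head is replaced with probability tph / (tph + hpt), where tph is the average number of tails per head and hpt is the average number of heads per tail. The function names `head_replace_prob` and `corrupt` and the toy entity names in the usage note are hypothetical, and the paper's actual procedure may differ.

```python
import random
from collections import defaultdict


def head_replace_prob(triples):
    """Return, per relation, the probability of corrupting the head.

    Assumed formula: tph / (tph + hpt), where tph is the mean number of
    tails per (relation, head) and hpt the mean number of heads per
    (relation, tail).
    """
    tails_of = defaultdict(set)  # (relation, head) -> set of tails
    heads_of = defaultdict(set)  # (relation, tail) -> set of heads
    for h, r, t in triples:
        tails_of[(r, h)].add(t)
        heads_of[(r, t)].add(h)

    tph_counts = defaultdict(list)
    hpt_counts = defaultdict(list)
    for (r, _), tails in tails_of.items():
        tph_counts[r].append(len(tails))
    for (r, _), heads in heads_of.items():
        hpt_counts[r].append(len(heads))

    probs = {}
    for r in tph_counts:
        tph = sum(tph_counts[r]) / len(tph_counts[r])
        hpt = sum(hpt_counts[r]) / len(hpt_counts[r])
        probs[r] = tph / (tph + hpt)
    return probs


def corrupt(triple, entities, probs, known, rng, max_tries=100):
    """Build one negative triple by replacing the head or the tail.

    Candidates that already appear in `known` are rejected, so a true
    triple is not mislabeled as negative. Returns None if no valid
    candidate is found within `max_tries` attempts.
    """
    h, r, t = triple
    for _ in range(max_tries):
        e = rng.choice(entities)
        # Bernoulli trial: replace the head with probability probs[r].
        cand = (e, r, t) if rng.random() < probs[r] else (h, r, e)
        if cand not in known:
            return cand
    return None
```

For example, with the toy triples `("a", "born_in", "X")`, `("b", "born_in", "X")`, and `("c", "born_in", "X")`, each head has one tail and the single tail has three heads, so the head is replaced with probability 1 / (1 + 3) = 0.25. For such a many-to-one relation the tail is corrupted more often, which makes it less likely that a sampled negative is in fact a true triple.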
Authors
胡旭阳
王治政
孙媛媛
徐博
林鸿飞
Hu Xuyang; Wang Zhizheng; Sun Yuanyuan; Xu Bo; Lin Hongfei (School of Computer Science and Technology, Dalian University of Technology, Dalian, Liaoning 116024)
Source
《计算机研究与发展》
EI
CSCD
Peking University Core Journals
2022, No. 12, pp. 2878-2888 (11 pages)
Journal of Computer Research and Development
Funding
National Key Research and Development Program of China (2018YFC0830603).
Keywords
knowledge graph representation learning
BERT
semantic analysis
negative sampling
pruning