Abstract
Existing models that fuse text and path information do not fully mine and exploit the semantics of texts and paths. To address this, a new knowledge graph embedding model (the GETR model) is proposed. First, LDA is used to enrich the semantics of entity description texts and TWE is used to obtain word and topic embeddings; a modified Bi-LSTM model then encodes the word and topic embeddings into the entity representations, strengthening the semantic expressiveness of nodes. Second, a random walk algorithm whose strategy combines PageRank and cosine similarity is designed to obtain multi-step paths between entities, and a self-attention mechanism captures the important semantics of these paths, which are integrated into the translation model for joint training, so as to filter noise in the paths and improve the efficiency of the model. Finally, GETR and the baseline models TransE, DKRL, and TKGE are evaluated on knowledge graph completion and entity classification tasks using three datasets: FB15K, FB20K, and WN18. The experimental results show that GETR outperforms the baseline models, indicating that it is a more effective method for knowledge representation.
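The path-sampling idea in the abstract can be illustrated with a minimal sketch. Everything below is an assumption made for illustration, not the GETR implementation: the abstract does not say how PageRank and cosine similarity are combined, so this sketch uses a hypothetical linear mix controlled by `alpha`, and the function names (`pagerank`, `biased_walk`, `transe_score`) are invented here. The TransE-style distance is included only to show the kind of translation score that sampled paths would be trained alongside.

```python
import math
import random


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def pagerank(graph, damping=0.85, iters=50):
    """Plain power-iteration PageRank on an adjacency-list dict.

    Every neighbour must also appear as a key of `graph`.
    """
    nodes = list(graph)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iters):
        new = {v: (1.0 - damping) / n for v in nodes}
        for v in nodes:
            out = graph[v]
            # Dangling nodes spread their mass uniformly over all nodes.
            targets = out if out else nodes
            share = damping * rank[v] / len(targets)
            for w in targets:
                new[w] += share
        rank = new
    return rank


def biased_walk(graph, emb, rank, start, steps, alpha=0.5, rng=None):
    """Sample one multi-step path starting at `start`.

    ASSUMPTION: the next hop is drawn with weight
    alpha * PageRank(w) + (1 - alpha) * (1 + cos(current, w)) / 2,
    a hypothetical mix chosen so that weights stay non-negative.
    Use alpha > 0 so that the weights never all vanish.
    """
    rng = rng or random.Random(0)
    path = [start]
    for _ in range(steps):
        current = path[-1]
        nbrs = graph[current]
        if not nbrs:
            break  # dead end: stop the walk early
        weights = [
            alpha * rank[w] + (1 - alpha) * (1 + cosine(emb[current], emb[w])) / 2
            for w in nbrs
        ]
        path.append(rng.choices(nbrs, weights=weights, k=1)[0])
    return path


def transe_score(h, r, t):
    """TransE-style distance ||h + r - t||_2; lower means more plausible."""
    return math.sqrt(sum((a + b - c) ** 2 for a, b, c in zip(h, r, t)))


# Toy data (hypothetical): a four-entity graph with 2-d embeddings.
graph = {"a": ["b", "c"], "b": ["c"], "c": ["a", "d"], "d": []}
emb = {"a": [1.0, 0.0], "b": [0.8, 0.6], "c": [0.0, 1.0], "d": [-1.0, 0.0]}
rank = pagerank(graph)
path = biased_walk(graph, emb, rank, start="a", steps=3)
```

Raising `alpha` makes the walk favour globally important entities, while lowering it favours neighbours whose embeddings resemble the current entity; how GETR actually balances the two is not stated in the abstract.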
Authors
肖宝
韦丽娜
李璞
蒋运承
XIAO Bao; WEI Lina; LI Pu; JIANG Yuncheng (School of Electronics and Information Engineering, Beibu Gulf University, Qinzhou 535011, China; School of Computer Science, South China Normal University, Guangzhou 510631, China; Software Engineering College, Zhengzhou University of Light Industry, Zhengzhou 450000, China; School of Information Science and Engineering, Guangxi University for Nationalities, Nanning 530006, China)
Source
《华南师范大学学报(自然科学版)》
CAS
Peking University Core Journal (北大核心)
2020, Issue 6, pp. 103-112 (10 pages)
Journal of South China Normal University (Natural Science Edition)
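Funding
National Natural Science Foundation of China (61802352)
Basic Research Ability Improvement Project for Young and Middle-aged Teachers in Universities of the Guangxi Zhuang Autonomous Region (2019KY0463)
Qinzhou Scientific Research and Technology Development Program (20189903)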
Keywords
knowledge graph embedding
random walks
self-attention mechanism
multiple-step path
entity description text