Abstract
Entity disambiguation is an important supporting technology for applications such as knowledge base construction and information retrieval, and it plays an important role in Natural Language Processing (NLP). In short texts, however, traditional disambiguation methods that model an entity's context features have difficulty extracting enough features to disambiguate. To address the characteristics of short texts, this paper proposes a graph-model disambiguation method for Chinese short texts based on entity topic relations. First, the TextRank algorithm performs topic inference on a corpus constructed from knowledge base information, and the inferred topics serve as the representation of relations between entities. Next, a disambiguation network graph is built for the text to be disambiguated by combining these relations with the disambiguation scores produced by a BERT-based semantic matching model. Finally, the disambiguation result is obtained through search and ranking. The method is evaluated on the dataset of the CCKS2020 short text entity linking task. Experimental results show that it outperforms other methods on short-text entity disambiguation and can effectively solve Chinese short-text entity disambiguation when the knowledge base lacks relations between entities.
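The abstract does not give implementation details, so the following is only a minimal Python sketch of the general pipeline it describes, built on stated assumptions rather than the authors' actual method. It assumes: knowledge base descriptions are already word-segmented; an entity's "topic" is its top-k TextRank keywords (plain TextRank on a word co-occurrence graph); the relation between two entities is the Jaccard overlap of their keyword sets; the BERT semantic matching scores are supplied as precomputed numbers; and "search and ranking" is replaced by exhaustive enumeration of candidate assignments. All function names, entity IDs and scores (`textrank_keywords`, `topic_relation`, `disambiguate`, `apple_inc`, and so on) are hypothetical.

```python
from itertools import combinations, product


def textrank_keywords(tokens, window=2, damping=0.85, iterations=50, top_k=5):
    """Plain TextRank on an undirected, unweighted word co-occurrence graph."""
    # Link two distinct words if they co-occur within `window` positions.
    neighbors = {w: set() for w in tokens}
    for i, w in enumerate(tokens):
        for j in range(i + 1, min(i + window, len(tokens))):
            u = tokens[j]
            if u != w:
                neighbors[w].add(u)
                neighbors[u].add(w)
    # Iterate S(v) = (1 - d) + d * sum_{u in N(v)} S(u) / |N(u)| (synchronous update).
    score = {w: 1.0 for w in neighbors}
    for _ in range(iterations):
        score = {
            v: (1 - damping)
            + damping * sum(score[u] / len(neighbors[u]) for u in neighbors[v])
            for v in neighbors
        }
    # Highest score first; ties broken alphabetically for determinism.
    return sorted(score, key=lambda w: (-score[w], w))[:top_k]


def topic_relation(keywords_a, keywords_b):
    """Assumed edge weight: Jaccard overlap of two entities' topic keywords."""
    a, b = set(keywords_a), set(keywords_b)
    return len(a & b) / len(a | b) if a | b else 0.0


def disambiguate(candidates, match_score, topics, alpha=1.0):
    """Pick one candidate per mention, maximizing node scores plus alpha * edge weights.

    candidates:  {mention: [entity_id, ...]}
    match_score: {(mention, entity_id): float}, e.g. from a matching model
    topics:      {entity_id: [keyword, ...]}
    Exhaustive search: exponential in the number of mentions, so only
    practical for short texts with few mentions.
    """
    mentions = list(candidates)
    best, best_total = None, float("-inf")
    for choice in product(*(candidates[m] for m in mentions)):
        node = sum(match_score[(m, e)] for m, e in zip(mentions, choice))
        edge = sum(topic_relation(topics[a], topics[b])
                   for a, b in combinations(choice, 2))
        total = node + alpha * edge
        if total > best_total:
            best, best_total = dict(zip(mentions, choice)), total
    return best


# Toy, hand-made knowledge base descriptions (already segmented).
kb_tokens = {
    "apple_fruit": ["苹果", "水果", "果实", "甜", "水果", "营养"],
    "apple_inc": ["苹果", "公司", "手机", "乔布斯", "公司", "科技"],
    "jobs": ["乔布斯", "苹果", "公司", "创始人", "科技", "公司"],
}
topics = {e: textrank_keywords(toks) for e, toks in kb_tokens.items()}
candidates = {"苹果": ["apple_fruit", "apple_inc"], "乔布斯": ["jobs"]}
# Made-up matching scores standing in for a BERT model's output.
match_score = {("苹果", "apple_fruit"): 0.6, ("苹果", "apple_inc"): 0.5,
               ("乔布斯", "jobs"): 0.9}

print(disambiguate(candidates, match_score, topics))
# prints {'苹果': 'apple_inc', '乔布斯': 'jobs'}
```

In this toy example the matching score alone slightly prefers the fruit sense of 苹果. The strong topic overlap between `apple_inc` and `jobs` tips the joint assignment toward the company, and with `alpha=0.0` the fruit sense would win instead. This illustrates why a relation signal between entities can help when the short text itself offers little context.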
Authors
MA Ying-chao (马瑛超), ZHANG Xiao-bin (张晓滨)
School of Computer Science, Xi'an Polytechnic University, Xi'an 710048, China
Source
Computer Engineering & Science (《计算机工程与科学》)
Indexed in: CSCD; Peking University Core Journals (北大核心)
2023, No. 1, pp. 154-162 (9 pages)
Funding
Natural Science Foundation of Shaanxi Province (2019JQ-849)
Graduate Innovation Fund of Xi'an Polytechnic University (chx2021028)