Abstract
Entity disambiguation for short-text questions faces two problems: entity description information is missing, and the target entity cannot be recalled when the question uses an abbreviation. To address them, this paper proposes a disambiguation method that fuses multiple features with a coarse-to-fine ranking model. First, an N-Gram segmentation model assists in recalling candidate entities. Next, the relations and neighboring entities of each candidate are retrieved from the knowledge graph, and their similarity to the question is computed separately; these similarities serve as the entity's description information in the knowledge graph and are combined with other features, such as entity importance, for feature fitting. Finally, a coarse-ranking model reduces the size of the candidate set, and a fine-ranking model ranks the remaining candidates to select the final target entity. Entity disambiguation experiments on the CCKS2019-CKBQA dataset show that the proposed model achieves an accuracy of 91.35%.
Authors
王荣坤
宾晟
孙更新
WANG Rong-kun; BIN Sheng; SUN Geng-xin (College of Computer Science & Technology, Qingdao University, Qingdao 266071, China)
Source
《青岛大学学报(自然科学版)》
CAS
2022, No. 3, pp. 16-21 (6 pages)
Journal of Qingdao University (Natural Science Edition)
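Funding
Supported by the Youth Project of Humanities and Social Sciences Research of the Ministry of Education (Grant No. 15YJC860001)
Supported by the Natural Science Foundation of Shandong Province (Grant No. ZR2017MG011)
Supported by the Shandong Province Social Science Planning Project (Grant No. 17CHLJ16)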
Keywords
entity disambiguation
short-text question
feature fusion
CKBQA
ranking model
knowledge graph