Abstract
Name disambiguation is an important research topic in entity resolution; it aims to distinguish the different people who share the same name. Most conventional approaches rely heavily on rich information about entities and cannot disambiguate names when that information is scarce. This paper proposes a name disambiguation approach based on co-author and affiliation information. Specifically, an entity relationship graph is constructed from the co-authorship relations among authors and the affiliation relations between authors and institutions, and a breadth-first search strategy is used to find the effective paths between each pair of authors with exactly the same name in the graph. The connection strength between two same-name authors is calculated from the length of the effective paths, the number of effective paths, and the types of edges on the paths, and is then compared with a threshold to achieve name disambiguation. Experimental results show that the proposed approach outperforms the state-of-the-art approaches and is able to disambiguate authors sharing the same name without co-authorship (sole authors).
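The pipeline described in the abstract (build a typed graph, search paths breadth-first, score them, compare with a threshold) can be sketched roughly as follows. This is an illustrative sketch only, not the authors' implementation: the abstract does not define what makes a path "effective" or how connection strength is computed, so the path-length cap `max_len`, the `EDGE_WEIGHT` values, the scoring formula, and the `threshold` default are all assumptions, as are the function names and the `Name#k` labels for same-name author records.

```python
from collections import deque

# Hypothetical edge-type weights; the abstract does not give actual values.
EDGE_WEIGHT = {"coauthor": 1.0, "affiliation": 0.5}


def build_graph(edges):
    """Build an undirected adjacency map from (node_a, node_b, edge_type) triples."""
    graph = {}
    for a, b, kind in edges:
        graph.setdefault(a, []).append((b, kind))
        graph.setdefault(b, []).append((a, kind))
    return graph


def find_paths(graph, source, target, max_len=3):
    """Breadth-first enumeration of simple paths with at most max_len edges.

    Assumption: a path counts as "effective" if it is simple and short enough.
    Each returned path is the list of edge types along it.
    """
    paths = []
    queue = deque([(source, (source,), ())])
    while queue:
        node, visited, kinds = queue.popleft()
        if len(kinds) == max_len:
            continue  # do not extend paths beyond the length cap
        for neighbor, kind in graph.get(node, []):
            if neighbor in visited:
                continue  # keep paths simple (no repeated nodes)
            new_kinds = kinds + (kind,)
            if neighbor == target:
                paths.append(list(new_kinds))
            else:
                queue.append((neighbor, visited + (neighbor,), new_kinds))
    return paths


def connection_strength(paths):
    """Assumed scoring: each path adds the product of its edge weights divided by its length."""
    total = 0.0
    for kinds in paths:
        weight = 1.0
        for kind in kinds:
            weight *= EDGE_WEIGHT[kind]
        total += weight / len(kinds)
    return total


def same_person(graph, a, b, threshold=0.2, max_len=3):
    """Treat two same-name author records as one person if their strength reaches the threshold."""
    return connection_strength(find_paths(graph, a, b, max_len)) >= threshold


# Toy data: three records named "W. Wang"; two share a co-author and an institution.
graph = build_graph([
    ("W. Wang#1", "J. Li", "coauthor"),
    ("W. Wang#2", "J. Li", "coauthor"),
    ("W. Wang#1", "Univ A", "affiliation"),
    ("W. Wang#2", "Univ A", "affiliation"),
    ("W. Wang#3", "Univ B", "affiliation"),
])
print(same_person(graph, "W. Wang#1", "W. Wang#2"))
print(same_person(graph, "W. Wang#1", "W. Wang#3"))
```

In this toy graph the first two records are linked through both a shared co-author and a shared institution, while the third has no path to either. Because affiliation edges are part of the graph, records with no co-authors can still be connected, which is the case the abstract highlights; whether such a link clears the threshold depends entirely on the assumed weights.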
Authors
SHANG Yu-ling (尚玉玲)
CAO Jian-jun (曹建军)
LI Hong-mei (李红梅)
ZHENG Qi-bin (郑奇斌)
Affiliations: College of Command Information Systems, PLA University of Science and Technology, Nanjing 210007, China; The 63rd Research Institute, National University of Defense Technology, Nanjing 210007, China
Source
Computer Science (《计算机科学》)
CSCD
PKU Core Journals (北大核心)
2018, Issue 11, pp. 220-225, 260 (7 pages in total)
Funding
Supported by the National Natural Science Foundation of China (61371196)
and the China Postdoctoral Science Foundation (2015M582832)
Keywords
Data quality
Entity resolution
Name disambiguation
Effective path
Connection strength