Abstract
To reduce the influence that features shared among different author entities (such as institutions and publication venues) have on author name disambiguation, this paper proposes a disambiguation algorithm based on network maximum flow. The algorithm merges paper entities and their features into a single network graph, assigns each feature node a capacity according to how widely it is shared, computes the maximum flow between paper nodes, and then performs hierarchical clustering based on those maximum flows. Experimental results show that the algorithm achieves a relatively balanced performance in precision and recall and good overall performance.
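The pipeline described in the abstract can be illustrated with a minimal sketch. The abstract does not give the capacity function, the clustering linkage, or any data, so everything concrete here is an assumption: each feature node gets capacity 1 / (number of papers sharing it), feature-node capacities are modelled by splitting each feature into an in/out pair, and the hierarchical clustering is simplified to a single-linkage cut at a fixed threshold. The names `build_network`, `max_flow`, `cluster`, and the sample `papers` are hypothetical.

```python
from collections import defaultdict, deque
from fractions import Fraction
from itertools import combinations


def build_network(paper_features):
    """Build a capacity map; each feature node is split into an in/out
    pair whose connecting edge carries the feature's capacity."""
    sharers = defaultdict(set)
    for paper, feats in paper_features.items():
        for f in feats:
            sharers[f].add(paper)
    # Assumed rule: capacity = 1 / (number of papers sharing the feature).
    cap_of = {f: Fraction(1, len(ps)) for f, ps in sharers.items()}
    big = sum(cap_of.values(), Fraction(0)) + 1  # stands in for "unbounded"
    cap = defaultdict(lambda: defaultdict(Fraction))
    for f, ps in sharers.items():
        f_in, f_out = ("in", f), ("out", f)
        cap[f_in][f_out] += cap_of[f]
        cap[f_out][f_in] += 0  # reverse edge for the residual graph
        for p in ps:
            cap[p][f_in] += big
            cap[f_in][p] += 0
            cap[f_out][p] += big
            cap[p][f_out] += 0
    return cap


def max_flow(cap, source, sink):
    """Edmonds-Karp: augment along shortest residual paths until none remain."""
    res = {u: dict(vs) for u, vs in cap.items()}  # per-query residual copy
    flow = Fraction(0)
    while True:
        parent = {source: None}
        queue = deque([source])
        while queue and sink not in parent:
            u = queue.popleft()
            for v, c in res.get(u, {}).items():
                if c > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if sink not in parent:
            return flow
        # Walk back from the sink, find the bottleneck, push that much flow.
        path, v = [], sink
        while parent[v] is not None:
            path.append((parent[v], v))
            v = parent[v]
        push = min(res[u][v] for u, v in path)
        for u, v in path:
            res[u][v] -= push
            res[v][u] += push
        flow += push


def cluster(paper_features, threshold):
    """Single-linkage cut: merge papers whose pairwise max flow >= threshold."""
    cap = build_network(paper_features)
    papers = list(paper_features)
    root = {p: p for p in papers}

    def find(p):
        while root[p] != p:
            root[p] = root[root[p]]
            p = root[p]
        return p

    for a, b in combinations(papers, 2):
        if max_flow(cap, a, b) >= threshold:
            root[find(a)] = find(b)
    groups = defaultdict(list)
    for p in papers:
        groups[find(p)].append(p)
    return sorted(groups.values())


# Hypothetical toy data: four papers under one ambiguous author name.
papers = {
    "p1": {"coauthor:A", "venue:X"},
    "p2": {"coauthor:A", "venue:X"},
    "p3": {"venue:X"},
    "p4": {"venue:X", "coauthor:B"},
}
print(cluster(papers, Fraction(1, 2)))
```

In this toy input the venue is shared by all four papers, so it contributes only 1/4 of flow between any pair, while the co-author shared by just p1 and p2 contributes 1/2. Under these assumptions a widely shared feature alone is too weak to merge two papers, which matches the motivation stated in the abstract.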
Authors
全锦琪
傅洛伊
甘小莺
王新兵
QUAN Jinqi; FU Luoyi; GAN Xiaoying; WANG Xinbing (School of Electronic Information and Electrical Engineering, Shanghai Jiao Tong University, Shanghai 200240, China)
Source
《上海交通大学学报》
EI
CAS
CSCD
PKU Core (Peking University Core Journals list)
2020, Issue 2, pp. 111-116 (6 pages)
Journal of Shanghai Jiaotong University
Keywords
name disambiguation (同名区分)
maximum flow (最大流)
clustering (聚类)
academic network (学术网络)