Abstract
To apply sparse distributed representation (SDR) theory to author name disambiguation and to assess how well it solves the name disambiguation problem, this paper proposes an SDR-based method for disambiguating author names in English-language literature. The method uses the text of paper abstracts as the disambiguation feature and encodes each abstract as a binary SDR. A paper is disambiguated by comparing its SDR with the SDRs of papers written by authors who share the same name. A similarity matrix is built from pairwise comparisons on the training set, and the experiments are run after an appropriate similarity threshold has been set. The method achieves a precision of 98.21%, a recall of 76.75%, and an F-value of 86.17%, indicating that it performs well. A comparison with a method based on co-author features further shows that the proposed method handles author name ambiguity better. In addition, the method can identify papers whose authors are not yet in the author database and assign them to new authors without relearning or updating the model.
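The abstract only outlines the pipeline: abstract text → binary SDR → similarity against the SDRs of same-name authors' papers → threshold → existing or new author. The following Python sketch illustrates that flow under stated assumptions and is not the authors' implementation. The encoder `encode` is a hypothetical hash-based stand-in that captures word overlap only, whereas the paper relies on semantic fingerprints. The Jaccard overlap measure, the parameters `size`, `bits_per_word`, `max_active` and `threshold`, and all function names are illustrative assumptions. `f_value` recomputes the F-value as the harmonic mean of precision and recall.

```python
import hashlib
import re
from collections import Counter
from typing import Dict, FrozenSet, List, Optional

SDR = FrozenSet[int]  # indices of the active (1) bits in a binary SDR


def encode(text: str, size: int = 2048, bits_per_word: int = 8,
           max_active: int = 40) -> SDR:
    """Hypothetical stand-in encoder: hashes each word to a few bit positions.

    Unlike a semantic fingerprint, it reflects shared words, not meaning.
    """
    counts: Counter = Counter()
    for word in re.findall(r"[a-z]+", text.lower()):
        for k in range(bits_per_word):
            digest = hashlib.sha256(f"{word}:{k}".encode("utf-8")).digest()
            counts[int.from_bytes(digest[:4], "big") % size] += 1
    # Keep only the most frequently hit bits so the representation stays sparse.
    top = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))[:max_active]
    return frozenset(bit for bit, _ in top)


def overlap(a: SDR, b: SDR) -> float:
    """Jaccard overlap of two SDRs (an assumed similarity measure)."""
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def assign(paper: SDR, profiles: Dict[str, List[SDR]],
           threshold: float) -> Optional[str]:
    """Return the same-name author whose papers best match `paper`,
    or None (treat as a new author) if the best score is below `threshold`."""
    best_author: Optional[str] = None
    best_score = 0.0
    for author, sdrs in profiles.items():
        score = max((overlap(paper, s) for s in sdrs), default=0.0)
        if score > best_score:
            best_author, best_score = author, score
    return best_author if best_score >= threshold else None


def f_value(precision: float, recall: float) -> float:
    """F-value: harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)
```

For example, `f_value(0.9821, 0.7675)` evaluates to about 0.8616, consistent with the reported 86.17% up to rounding of the published precision and recall. In this sketch, a paper that falls below the threshold is simply stored under a new key in `profiles`. Adding a new author therefore involves no retraining, which is in line with the abstract's claim.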
Authors
Zhai Xiaorui (翟晓瑞)
Han Hongqi (韩红旗)
Zhang Yunliang (张运良)
Li Zhong (李仲)
Key Laboratory of Rich-media Knowledge Organization & Service of Digital Publishing Content, Institute of Scientific & Technical Information of China, Beijing 100038, China
Source
Application Research of Computers (《计算机应用研究》)
CSCD
Peking University core journal (北大核心)
2019, No. 12, pp. 3534-3538 (5 pages)
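Funding
National Natural Science Foundation of China (71473237)
China Knowledge Centre for Engineering Sciences and Technology construction project (CKCEST-2018-1-26)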
Keywords
name disambiguation
sparse distributed representation
semantic fingerprint
hierarchical temporal memory