Abstract
Large-scale analysis of scientific and technical literature often requires automatically identifying the institution each author belongs to, which means matching the author address given in a paper to the corresponding institution name. Authors from the same institution write their address in many different forms across English-language papers, so plain string matching gives unsatisfactory results. To address this, the paper proposes a machine learning method that exploits the writing conventions of author addresses in English papers and uses class-center vectors to match each address to an institution name automatically. Unlike traditional methods, it does not require tedious matching rules to be written by hand. Experiments on a data set of Chinese Academy of Sciences (CAS) author addresses demonstrate the feasibility of the method.
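The abstract does not spell out how the class-center vectors are built, so the following is only a minimal sketch of one common reading of the idea (a nearest-centroid classifier), not the authors' implementation. It assumes a bag-of-words representation with L2 normalization and cosine similarity; the function names (`vectorize`, `train_centers`, `match`) and the tiny training set are hypothetical and made up for illustration.

```python
import math
import re
from collections import Counter, defaultdict


def vectorize(address):
    """Turn an address string into an L2-normalized bag-of-words vector."""
    tokens = re.findall(r"[a-z0-9]+", address.lower())
    counts = Counter(tokens)
    norm = math.sqrt(sum(c * c for c in counts.values()))
    return {t: c / norm for t, c in counts.items()} if norm else {}


def train_centers(labeled_addresses):
    """Average the vectors of each institution's addresses into one center."""
    sums = defaultdict(Counter)
    sizes = Counter()
    for address, institution in labeled_addresses:
        for token, weight in vectorize(address).items():
            sums[institution][token] += weight
        sizes[institution] += 1
    return {
        inst: {t: w / sizes[inst] for t, w in vec.items()}
        for inst, vec in sums.items()
    }


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def match(address, centers):
    """Return the institution whose class center is closest to the address."""
    vec = vectorize(address)
    return max(centers, key=lambda inst: cosine(vec, centers[inst]))


# Illustrative, made-up training pairs: (address as written, institution).
TRAIN = [
    ("Institute of Physics, Chinese Academy of Sciences, Beijing 100190, China",
     "Institute of Physics"),
    ("Inst Phys, Chinese Acad Sci, Beijing, Peoples R China",
     "Institute of Physics"),
    ("Wuhan Documentation and Information Center, Chinese Academy of Sciences, Wuhan 430071",
     "Wuhan Documentation and Information Center"),
    ("Wuhan Document & Informat Ctr, Chinese Acad Sci, Wuhan, Peoples R China",
     "Wuhan Documentation and Information Center"),
]

centers = train_centers(TRAIN)
print(match("Chinese Acad Sci, Wuhan Documentat & Informat Ctr, Wuhan 430071, China", centers))
```

Because each institution is reduced to a single averaged vector, a new spelling variant only needs to share enough tokens with earlier variants to land near the right center; no per-variant rule is written. A real system would likely need better tokenization and term weighting than this sketch uses.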
Authors
He Tao (何涛)
Wang Guifang (王桂芳)
Ma Tingcan (马廷灿)
Wuhan Documentation and Information Center, Chinese Academy of Sciences, Wuhan 430071
Source
Journal of the China Society for Scientific and Technical Information (《情报学报》)
Indexed in: CSSCI, CSCD, Peking University Core Journals (北大核心)
2019, Issue 7, pp. 716-721 (6 pages)
Funding
Youth Innovation Promotion Association of the Chinese Academy of Sciences (Grant No. 2016160)
Keywords
author’s address
institution name
class-center vectors
machine learning