
Research on an Author Name Disambiguation Method Based on Machine Learning (cited by: 8)
Abstract This paper proposes an automatic method for disambiguating article author names based on rule matching and machine learning. For each article, candidate authors are first determined using manually constructed name-matching rules. When multiple candidates exist, features are extracted from the article's attribute information (such as co-authors, title, abstract, keywords, and publication name), and a suitable machine learning algorithm is then applied to disambiguate. Experimental results show that the K-nearest neighbor and Softmax classifiers are better suited to the author name disambiguation task than the other models tested. In addition, extracting features from author information separately from the article's other information effectively improves the accuracy of author name disambiguation.
Authors DENG Ke-Jun (邓可君); HUA Kai (华凯); DENG Chang-Ming (邓昌明); JIANG Ning (姜宁); YUAN Ling (袁玲); PENG Yi-Ming (彭一明); ZHANG Zhi-Kun (张治坤) (Computer Center, Peking University, Beijing 100871, China)
Source Journal of Sichuan University (Natural Science Edition) (《四川大学学报(自然科学版)》), indexed in CAS, CSCD, and PKU Core, 2019, Issue 2, pp. 241-245 (5 pages)
Keywords Author name disambiguation; Machine learning; Text feature extraction
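
The two-stage pipeline summarized in the abstract (rule-based candidate matching, then classification on article features) can be sketched roughly as follows. This is an illustrative sketch only, not the authors' implementation: the matching rule in `name_key`, the bag-of-words features, the cosine similarity, the equal weighting in `similarity`, and all toy data are assumptions introduced here. Co-author features are kept separate from the other text features, in the spirit of the abstract's final finding.

```python
import math
import re
from collections import Counter

def name_key(name):
    # Hypothetical matching rule: "Surname Given" -> (surname, first initial).
    parts = name.lower().replace("-", "").split()
    return (parts[0], parts[1][0])

def words(text):
    # Lowercased alphabetic tokens as a simple bag-of-words feature.
    return re.findall(r"[a-z]+", text.lower())

def cosine(a, b):
    # Cosine similarity between two sparse count vectors.
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def similarity(p, q, w_coauthor=0.5):
    # Co-author and text features are computed separately, then combined.
    # The weight w_coauthor is an assumed value, not taken from the paper.
    co = cosine(Counter(p["coauthors"]), Counter(q["coauthors"]))
    tx = cosine(Counter(words(p["text"])), Counter(words(q["text"])))
    return w_coauthor * co + (1 - w_coauthor) * tx

def disambiguate(paper, byline_name, registry, labeled, k=3):
    # Stage 1: rule matching selects candidate author IDs.
    key = name_key(byline_name)
    candidates = [aid for aid, nm in registry.items() if name_key(nm) == key]
    if len(candidates) <= 1:
        return candidates[0] if candidates else None
    # Stage 2: k-nearest-neighbor vote over the candidates' known papers.
    pool = [q for q in labeled if q["author_id"] in candidates]
    nearest = sorted(pool, key=lambda q: similarity(paper, q), reverse=True)[:k]
    votes = Counter(q["author_id"] for q in nearest)
    return votes.most_common(1)[0][0] if votes else None

# Toy data (entirely hypothetical).
registry = {"a1": "Wang Wei", "a2": "Wang Wen", "a3": "Li Na"}
labeled = [
    {"author_id": "a1", "coauthors": ["li na", "zhao lei"],
     "text": "graph neural network for citation analysis"},
    {"author_id": "a1", "coauthors": ["li na"],
     "text": "citation network embedding"},
    {"author_id": "a2", "coauthors": ["chen bo"],
     "text": "protein folding simulation"},
    {"author_id": "a2", "coauthors": ["chen bo", "sun yu"],
     "text": "molecular dynamics of protein"},
]
new_paper = {"coauthors": ["li na"], "text": "embedding methods for citation graph"}

print(disambiguate(new_paper, "Wang W", registry, labeled))
```

Here the byline "Wang W" matches two registry entries, so the vote over the most similar labeled papers decides between them. A Softmax classifier, the other model the abstract reports as suitable, could replace the voting step given the same features.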
