
Research on Author Name Disambiguation Algorithm in the Literature Database
Cited by: 7
Abstract: This paper first analyzes GHOST, a graph-based framework for name disambiguation, and identifies its limitations. Combining that analysis with text mining of bibliographic information, it then proposes an author name disambiguation algorithm better suited to literature databases. In an empirical study using two features drawn from the bibliographic record, the title and the publication name, the algorithm performs well on precision and recall, and its average F1 reaches 84%, indicating good disambiguation performance.
Author: Guo Shu (郭舒)
Source: New Technology of Library and Information Service (《现代图书情报技术》; indexed in CSSCI and the Peking University Core Journals list), 2013, No. 7, pp. 69-74 (6 pages)
Keywords: Author name disambiguation; GHOST; Text mining; Disambiguation algorithm
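The abstract reports precision, recall, and an average F1 of 84% but does not say how those scores are computed. As an illustration only, the sketch below assumes the common pairwise formulation, in which every pair of papers signed with the same name string counts as a decision: the pair is correct when both the ground truth and the prediction place the two papers with the same author. The function names and the toy data are hypothetical and are not taken from the paper.

```python
from itertools import combinations


def same_author_pairs(labels):
    """Return the set of index pairs (i, j), i < j, that share a label."""
    return {
        (i, j)
        for i, j in combinations(range(len(labels)), 2)
        if labels[i] == labels[j]
    }


def pairwise_prf(true_labels, predicted_labels):
    """Pairwise precision, recall, and F1 of a predicted clustering.

    Assumption: pairwise scoring; the paper may use a different scheme.
    """
    if len(true_labels) != len(predicted_labels):
        raise ValueError("label lists must have the same length")
    true_pairs = same_author_pairs(true_labels)
    pred_pairs = same_author_pairs(predicted_labels)
    correct = len(true_pairs & pred_pairs)
    # Guard against empty pair sets to avoid division by zero.
    precision = correct / len(pred_pairs) if pred_pairs else 0.0
    recall = correct / len(true_pairs) if true_pairs else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Hypothetical toy data: five papers signed with the same name string.
true_authors = ["a", "a", "a", "b", "b"]  # ground-truth author identities
predicted = [0, 0, 1, 1, 1]               # cluster ids from some algorithm
precision, recall, f1 = pairwise_prf(true_authors, predicted)
```

An average F1 such as the one quoted in the abstract would then be the mean of the per-name F1 values over all ambiguous name strings in the test set, again under the assumption that scores are averaged per name.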