
Knowledge Graph Based on Entity Recognition and Information Fusion--A Case Study of COVID-19
Cited by: 1
Abstract: Public health emergencies usually cause severe damage, and resolving them depends on research that is both timely and understandable. This calls for methods that quickly survey the state of research and extract specific research information. Scientific literature is one of the main carriers of knowledge dissemination, but the specialized and ambiguous terminology in it hinders that dissemination. To address this, the paper applies natural language processing and knowledge graph techniques to the COVID-19 research literature, building a knowledge graph by combining entity recognition with information fusion. First, entities in paper titles and abstracts are annotated to build a dataset for training a BERT-BiLSTM-CRF model, which then automatically recognizes and extracts medical entities from text. Next, author names are disambiguated through multi-source cross-validation of author information together with field and institution similarity, producing a set of distinct authors. Finally, after the multi-source information is fused, a COVID-19 knowledge graph is built incrementally from entity-entity, author-author, and entity-author relations. The named entity recognition model reaches an average F1 score of 92.86% across 6 types of medical entities, and the resulting knowledge graph contains 34,802 medical entities and 397,163 authors. The study indicates that this pipeline can build the knowledge graph effectively and can be used to quickly identify frontier research hotspots and core scholars in related fields, which promotes knowledge acquisition and the dissemination of concepts.
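The last two steps described in the abstract (author disambiguation, then incremental graph construction) can be illustrated with a minimal sketch. This is not the authors' implementation: the weighted Jaccard score, the 0.5 threshold, the `KnowledgeGraph` class, and the sample names and institutions are all assumptions made up for illustration, and the multi-source cross-validation step is not modeled. Entities are assumed to have been extracted already by the recognition model.

```python
from collections import defaultdict
from itertools import combinations


def jaccard(a, b):
    """Jaccard similarity of two collections; 0.0 when both are empty."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if (a or b) else 0.0


class KnowledgeGraph:
    """Toy incremental graph with entity and author nodes (hypothetical design)."""

    def __init__(self, threshold=0.5):
        self.threshold = threshold     # assumed similarity cut-off
        self.authors = []              # one record per distinct author
        self.edges = defaultdict(int)  # (kind, node_a, node_b) -> co-occurrence count

    def resolve_author(self, name, org, fields):
        """Return the id of a matching author record, or create a new one."""
        org_tokens = set(org.lower().split())
        for idx, rec in enumerate(self.authors):
            if rec["name"] != name:
                continue
            # Equal-weight mix of institution and field similarity (assumption).
            score = 0.5 * jaccard(org_tokens, rec["org_tokens"]) \
                + 0.5 * jaccard(fields, rec["fields"])
            if score >= self.threshold:
                rec["org_tokens"] |= org_tokens
                rec["fields"] |= set(fields)
                return idx
        self.authors.append(
            {"name": name, "org_tokens": org_tokens, "fields": set(fields)}
        )
        return len(self.authors) - 1

    def add_paper(self, authors, entities):
        """Add one paper: authors are (name, org, fields) tuples, entities are strings."""
        author_ids = sorted({self.resolve_author(*a) for a in authors})
        entities = sorted(set(entities))
        for a, b in combinations(author_ids, 2):
            self.edges[("author-author", a, b)] += 1
        for a, b in combinations(entities, 2):
            self.edges[("entity-entity", a, b)] += 1
        for a in author_ids:
            for e in entities:
                self.edges[("entity-author", e, a)] += 1


# Made-up sample records, not data from the paper.
kg = KnowledgeGraph()
kg.add_paper(
    [("Li Wei", "Fudan University", {"virology"}),
     ("Zhang Min", "Peking University", {"epidemiology"})],
    ["SARS-CoV-2", "fever"],
)
kg.add_paper([("Li Wei", "Fudan University Shanghai", {"virology"})],
             ["SARS-CoV-2", "remdesivir"])
kg.add_paper([("Li Wei", "Tsinghua University", {"economics"})], ["fever"])
```

In this sample the second "Li Wei" is merged with the first because institution and field overlap, while the third is kept as a separate author. Edge weights accumulate paper by paper, which is one simple reading of "incremental" construction.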
Authors: LIU Hua-ling; SUN Yi (Department of Statistics and Information, Shanghai University of International Business and Economics, Shanghai 201620, China)
Source: Computer Technology and Development (《计算机技术与发展》), 2022, No. 9, pp. 107-113 (7 pages)
Funding: Shanghai Philosophy and Social Science Planning Project (2018BJB023); Major Project of the National Social Science Foundation of China (16ZDA055).
Keywords: named entity recognition; entity disambiguation; BERT; knowledge graph; COVID-19; visualization analysis
