
Research of Core Biomedical Entities Extraction Model Based on LSTM (cited by: 1)
Abstract Identifying the core entities in a biomedical document is a prerequisite for accurately extracting the information in that document. To address the limitations of current entity recognition and screening methods for biomedical literature, this paper proposes an LSTM-based model for extracting core biomedical entities. The model is built around LSTM. It improves the model input through better word vectors and input generation rules, uses a bidirectional LSTM to improve the processing stage, and handles the output by saving the results as a tree structure and pruning that tree to obtain the annotation chain. With these changes, entity recognition reaches an F1 score of 89.35%. In addition, core entity screening follows the TF/IDF algorithm rules and takes word frequency, position, and inverse document frequency into account, reaching an F1 score of 76.85%.
Authors TANG Ying (唐颖); CAO Chun-ping (曹春萍) (School of Optical-Electrical and Computer Engineering, University of Shanghai for Science and Technology, Shanghai 200093, China)
Source Software Guide (《软件导刊》), 2018, No. 5, pp. 132-137 (6 pages)
Funding National Natural Science Foundation of China (61402288)
Keywords entity recognition; improved word vector; bidirectional LSTM; pruning strategy; core entity screening
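
The abstract says that core entity screening combines word frequency, position, and inverse document frequency under TF/IDF rules, but it does not give the scoring formula. The following is only a minimal Python sketch of one way such a score could be composed. The position weights, the smoothed IDF, the `top_k` cutoff, and the function names `score_entities` and `select_core` are all assumptions made for illustration, not the authors' method.

```python
import math
from collections import Counter

# Hypothetical position weights: a mention in the title counts more than a
# mention in the abstract body. The paper's actual weights are not given.
POSITION_WEIGHT = {"title": 3.0, "abstract": 1.0}


def score_entities(doc, corpus):
    """Rank the entities of one document by a position-weighted TF-IDF score.

    doc:    list of (entity, position) pairs, position a key of POSITION_WEIGHT
    corpus: list of documents, each given as a set of entity strings
    """
    # Position-weighted term frequency: each mention adds its position weight.
    weighted_tf = Counter()
    for entity, position in doc:
        weighted_tf[entity] += POSITION_WEIGHT[position]

    n_docs = len(corpus)
    scores = {}
    for entity, tf in weighted_tf.items():
        # Document frequency: number of corpus documents mentioning the entity.
        df = sum(1 for entities in corpus if entity in entities)
        # Smoothed IDF (an assumption) so unseen entities never divide by zero.
        idf = math.log((1 + n_docs) / (1 + df)) + 1.0
        scores[entity] = tf * idf
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


def select_core(doc, corpus, top_k=2):
    """Return the top_k highest-scoring entities as the core entities."""
    return [entity for entity, _ in score_entities(doc, corpus)[:top_k]]


# Toy data, invented for this example only.
doc = [
    ("TP53", "title"),
    ("TP53", "abstract"),
    ("protein", "abstract"),
    ("protein", "abstract"),
    ("cisplatin", "abstract"),
]
corpus = [
    {"TP53", "protein"},
    {"protein"},
    {"protein", "cisplatin"},
    {"protein"},
]
core = select_core(doc, corpus, top_k=1)
```

In this toy data, "protein" is mentioned as often as "TP53" but appears in every corpus document and never in the title, so it ranks below "TP53". That is the intended effect of combining the three factors.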