Medical named entity recognition algorithm based on probability distribution difference
Abstract: Medical named entity recognition extracts medical entities that refer to specific concepts from medical text, and is a fundamental task in medical information extraction. Current mainstream medical named entity recognition algorithms are generally based on deep learning and require large numbers of high-quality labeled samples for model training. However, sample annotation in the medical domain is costly, which severely limits improvements in model performance. To reduce the model's need for labeled samples, one important approach is to apply active learning: design a sound sampling strategy that automatically selects high-value samples for annotation first, so that the model converges earlier. Existing algorithms generally design their sampling strategies around features such as sample length and the recognition probability of a sample, and neglect the deeper feature of sample class distribution, which leads to low recall in named entity recognition. This paper proposes an active learning algorithm based on probability distribution difference, which assesses the annotation value of samples by computing the differences between their probability distributions and dynamically optimizes the model as the labeled sample set is updated. Experiments on real medical examination texts show that, compared with existing algorithms, the proposed algorithm needs more than 10% less labeled data to reach the same model performance, and improves the F1 score by more than 5% with the same number of labeled samples.
Authors: LIU Cong (刘聪); LYU Xuefeng (吕雪峰); WANG Honglin (王宏林); WANG Xiaowei (王晓伟); LU Jin (陆瑾); SUN Shun (孙顺); HU Songqi (胡松奇) (Information Center, Logistic Support Department of CMC, Beijing 100190, China; Changsha Civil-military Advanced Technology Research Limited Company, Changsha 410205, China)
Source: Big Data Research (《大数据》), 2023, No. 4, pp. 159-171 (13 pages)
Funding: Key Military Logistics Scientific Research Project (No. BS220R007).
Keywords: medical named entity recognition; deep learning; active learning; probability distribution
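
The sampling idea described in the abstract, scoring unlabeled samples by how much their probability distribution differs from that of the samples already labeled, can be illustrated with a small sketch. The abstract does not say which difference measure the authors use, so the Jensen-Shannon divergence here is an assumption, as are the function names (`normalize`, `js_divergence`, `select_for_annotation`) and the three-class example; this is not the paper's algorithm, only one plausible reading of it.

```python
import math
from typing import List, Sequence


def normalize(counts: Sequence[float]) -> List[float]:
    """Turn non-negative class counts into a probability distribution."""
    total = sum(counts)
    if total <= 0:
        raise ValueError("counts must sum to a positive value")
    return [c / total for c in counts]


def kl_divergence(p: Sequence[float], q: Sequence[float], eps: float = 1e-12) -> float:
    """KL(p || q); eps guards against log(0) and division by zero."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q) if pi > 0)


def js_divergence(p: Sequence[float], q: Sequence[float]) -> float:
    """Jensen-Shannon divergence (assumed measure, not confirmed by the source)."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


def select_for_annotation(
    labeled_dist: Sequence[float],
    candidate_dists: Sequence[Sequence[float]],
    k: int,
) -> List[int]:
    """Return indices of the k candidates whose predicted class distribution
    differs most from the class distribution of the labeled pool."""
    scores = [js_divergence(d, labeled_dist) for d in candidate_dists]
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return ranked[:k]


# Hypothetical class order: [non-entity, disease, drug].
labeled = normalize([90, 8, 2])      # class counts in the labeled pool
candidates = [
    [0.90, 0.08, 0.02],              # same distribution as the labeled pool
    [0.20, 0.10, 0.70],              # rich in an under-represented class
    [0.60, 0.30, 0.10],              # moderately different
]
chosen = select_for_annotation(labeled, candidates, k=2)
```

In an active learning loop, the chosen samples would be sent for annotation, added to the labeled pool, and the model retrained before the next round of scoring. Favoring candidates rich in under-represented classes is one way such a strategy could address the low recall the abstract attributes to length-based or confidence-based sampling.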