Recognizing Chinese Medical Literature Entities Based on Multi-Task and Transfer Learning (cited by 2)
Abstract [Objective] This paper uses transfer learning and multi-task learning to address the cold-start and entity-boundary-location problems in Chinese medical literature entity recognition, and to further improve recognition accuracy. [Methods] We propose a Chinese medical literature entity recognition method based on transfer learning and multi-task learning. First, we construct a hybrid deep learning model, BERT-BiLSTM-IDCNN-CRF, for medical literature entity recognition. Second, instance transfer, model transfer, and feature transfer enrich the medical semantic features. Third, multi-task learning adds a coarse-grained three-class auxiliary task that helps the entity recognition task make effective use of entity boundary information. Finally, a self-attention mechanism and a Highway network capture globally important information and optimize deep network training, yielding the TLMT-BBIC-HS model. [Results] On a Chinese diabetes medical literature dataset, TLMT-BBIC-HS reaches an F1 score of 92.98%, which is 15.99 and 16.44 percentage points higher than the baseline models BERT-BiLSTM-CRF and BERT-IDCNN-CRF, respectively. [Limitations] The domain adaptability of the model has not been verified. [Conclusions] TLMT-BBIC-HS enables the transfer and sharing of medical knowledge and is better suited to Chinese medical literature entity recognition. It can support medical and health information extraction and the construction of knowledge graphs and question answering systems.
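The abstract names a Highway network as the component that eases training of the deep stack but gives no formula. As a rough illustration only, the sketch below shows the standard highway-layer formulation, y = T(x) * H(x) + (1 - T(x)) * x, in plain Python. The function names, the tanh transform, and the weight shapes are assumptions made for this example, not details taken from the paper, and the sketch says nothing about how TLMT-BBIC-HS wires the layer into the rest of the model.

```python
import math


def sigmoid(z):
    """Logistic function used for the transform gate."""
    return 1.0 / (1.0 + math.exp(-z))


def affine(W, b, x):
    """Compute W @ x + b for a list-of-rows matrix W and vectors b, x."""
    return [sum(w * v for w, v in zip(row, x)) + b_i for row, b_i in zip(W, b)]


def highway_layer(x, W_h, b_h, W_t, b_t):
    """One highway layer (hypothetical helper, standard formulation).

    H(x) = tanh(W_h x + b_h)   candidate transform (tanh is an assumption)
    T(x) = sigmoid(W_t x + b_t)  transform gate in (0, 1)
    y    = T(x) * H(x) + (1 - T(x)) * x
    Input and output must have the same dimension.
    """
    h = [math.tanh(v) for v in affine(W_h, b_h, x)]
    t = [sigmoid(v) for v in affine(W_t, b_t, x)]
    return [t_i * h_i + (1.0 - t_i) * x_i for t_i, h_i, x_i in zip(t, h, x)]
```

When the gate bias `b_t` is strongly negative the layer passes its input through almost unchanged, which is the property usually credited with making deep stacks easier to optimize. When the gate saturates toward 1 the layer behaves like an ordinary nonlinear layer.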
Authors: Han Pu; Gu Liang; Ye Dongyu; Chen Wenqi (School of Management, Nanjing University of Posts & Telecommunications, Nanjing 210003, China; Jiangsu Provincial Key Laboratory of Data Engineering and Knowledge Service, Nanjing 210023, China)
Source: Data Analysis and Knowledge Discovery (indexed in CSSCI, CSCD, and the Peking University Core Journals list), 2023, No. 9, pp. 136-145 (10 pages)
Funding: This work is one of the research outcomes of a National Social Science Fund of China project (Grant No. 22BTQ096).
Keywords: Medical Literature Entity Extraction; Multi-Task Learning; Transfer Learning; Attention Mechanism; Highway Network