
Biomedical Named Entity Recognition Based on Character-Level Feature Adaptation (cited by: 2)
Abstract The number and variety of new entities in the biomedical field are growing rapidly. Because the capacity of a pre-trained vocabulary is limited, character embeddings can alleviate the out-of-vocabulary (OOV) word problem to some extent, but the latent representation of character embeddings extracted by a single character-level feature extractor has limitations. To address this, a biomedical named entity recognition model based on adaptive fusion of character-level features is proposed. First, a convolutional neural network (CNN) and a bidirectional long short-term memory (BiLSTM) network each extract character vectors from the text. During training, the weights of the two character vectors of each word are computed dynamically and the two vectors are then concatenated, so that the model makes fuller use of information at character granularity; part-of-speech and chunking information are added as extra features. The pre-trained word vectors, character-level features, and extra features are then concatenated and fed into a BiLSTM-CRF neural network model for training. The results show that the proposed model achieves average F1 scores of 87.14% and 81.04% on the NCBI-disease and BioCreative II GM corpora, respectively, effectively improving biomedical named entity recognition.
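The fusion step described in the abstract can be illustrated with a minimal sketch. The abstract does not give the weighting formula, so the softmax over two dot-product scores used here is an assumption, not the authors' method. All function and parameter names (`adaptive_char_fusion`, `score_cnn`, `score_lstm`, `build_token_input`) are hypothetical. In a real model the two character vectors would come from trained CNN and BiLSTM encoders and the scoring vectors would be learned; here they are plain Python lists.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a short list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def adaptive_char_fusion(char_cnn, char_lstm, score_cnn, score_lstm):
    """Weight two character-level vectors of one word and concatenate them.

    ASSUMPTION: each weight is a softmax over one scalar score per extractor,
    where the score is a dot product with a (hypothetically learned) scoring
    vector. The source only says the weights are computed dynamically.
    """
    weights = softmax([dot(score_cnn, char_cnn), dot(score_lstm, char_lstm)])
    fused = [weights[0] * x for x in char_cnn] + [weights[1] * x for x in char_lstm]
    return fused, weights


def build_token_input(word_vec, fused_char, pos_feat, chunk_feat):
    """Concatenate word vector, fused character features, and extra features.

    The result is what would be fed, per token, into a BiLSTM-CRF tagger.
    """
    return word_vec + fused_char + pos_feat + chunk_feat


if __name__ == "__main__":
    # Toy vectors; zero scoring vectors give equal weights for both extractors.
    fused, weights = adaptive_char_fusion(
        [1.0, 2.0], [3.0, 4.0, 5.0], [0.0, 0.0], [0.0, 0.0, 0.0]
    )
    token_input = build_token_input([0.1], fused, [1.0, 0.0], [0.0, 1.0])
    print(weights, len(token_input))
```

Because the weights are recomputed for every word, the model can lean on the CNN features for some tokens and on the BiLSTM features for others, which is the intuition behind the "adaptive" fusion; the exact scoring function in the paper may differ.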
Authors YU Xiang-qin; WANG Xiang; LI Zhi-qiang; XU Xian (School of Information Science and Engineering, East China University of Science and Technology, Shanghai 200237, China)
Source Journal of Chinese Computer Systems (《小型微型计算机系统》), indexed in CSCD and the Peking University Core Journals list, 2023, Issue 9, pp. 1876-1883 (8 pages)
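Funding Supported by the National Natural Science Foundation of China (61872142, 62072299, 61772336, 61702334, 6172200, 61173048, 61557218); the Special Fund for Information Development of the Shanghai Municipal Commission of Economy and Informatization (201602008); the Shanghai Pujiang Program (17PJ1401900); the Natural Science Foundation of Shanghai (17ZR1406900, 17ZR1429700); and the ECUST Education Research Fund (ZH1726108).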
Keywords biomedical named entity recognition; bidirectional long short-term memory network (BiLSTM); convolutional neural network (CNN); character-level features; self-adaptation