Abstract
Biomedical event extraction is one of the most important and challenging tasks in biomedical text information extraction and has attracted increasing attention in recent years. Its two most important subtasks are trigger recognition and event argument detection. Most existing methods treat trigger recognition as a classification task and ignore sentence-level tag information. This paper therefore constructs a sequence labeling model based on bidirectional long short-term memory (Bi-LSTM) and a conditional random field (CRF) for trigger recognition, and uses two kinds of model input separately: static pre-trained word embeddings combined with character-level word representations, and dynamic contextual word representations from a pre-trained language model. For event argument detection, a self-attention-based multi-class classification model is proposed that makes full use of entity and entity type features. The F1-scores for trigger recognition and overall event extraction are 81.65% and 60.04%, respectively, and the experimental results show that the proposed method is effective for biomedical event extraction.
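The abstract's point about sentence-level tag information can be illustrated with a small sketch. The following is not the authors' implementation: the BIO tag set, the emission scores, and the single transition penalty are all made-up values chosen for illustration. In a Bi-LSTM-CRF model the emission scores would come from the Bi-LSTM and the transition scores would be learned by the CRF layer. The sketch compares per-token classification with Viterbi decoding over the whole sentence.

```python
from typing import Dict, List, Tuple

# Hypothetical BIO tag set for trigger words (an assumption for this sketch).
TAGS = ["O", "B-Trigger", "I-Trigger"]

# Made-up per-token scores for a three-token sentence.
EMISSIONS: List[Dict[str, float]] = [
    {"O": 2.0, "B-Trigger": 0.5, "I-Trigger": 0.0},
    {"O": 0.2, "B-Trigger": 1.0, "I-Trigger": 1.2},
    {"O": 1.5, "B-Trigger": 0.1, "I-Trigger": 0.3},
]

# Made-up transition scores; unlisted pairs score 0.0.
# "O" followed by "I-Trigger" is an invalid BIO sequence, so it is penalized.
TRANSITIONS: Dict[Tuple[str, str], float] = {("O", "I-Trigger"): -10.0}


def greedy(emissions: List[Dict[str, float]]) -> List[str]:
    """Classify each token independently, ignoring neighbouring tags."""
    return [max(TAGS, key=lambda tag: e[tag]) for e in emissions]


def viterbi(
    emissions: List[Dict[str, float]],
    transitions: Dict[Tuple[str, str], float],
) -> List[str]:
    """Return the highest-scoring tag sequence for the whole sentence."""
    # best[t][tag] is the best score of any path ending in `tag` at position t.
    best = [{tag: emissions[0][tag] for tag in TAGS}]
    back: List[Dict[str, str]] = []
    for t in range(1, len(emissions)):
        scores: Dict[str, float] = {}
        pointers: Dict[str, str] = {}
        for cur in TAGS:
            prev = max(
                TAGS,
                key=lambda p: best[-1][p] + transitions.get((p, cur), 0.0),
            )
            scores[cur] = (
                best[-1][prev]
                + transitions.get((prev, cur), 0.0)
                + emissions[t][cur]
            )
            pointers[cur] = prev
        best.append(scores)
        back.append(pointers)
    # Follow back-pointers from the best final tag.
    path = [max(TAGS, key=lambda tag: best[-1][tag])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    return path[::-1]


if __name__ == "__main__":
    print(greedy(EMISSIONS))
    print(viterbi(EMISSIONS, TRANSITIONS))
```

With these toy scores, independent classification picks `I-Trigger` directly after `O`, which is not a valid BIO sequence, while sequence-level decoding selects `B-Trigger` for the middle token instead.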
Authors
WEI You (魏优)
LIU Mao-fu (刘茂福)
HU Hui-jun (胡慧君)
School of Computer Science and Technology, Wuhan University of Science and Technology, Wuhan 430065, China; Hubei Province Key Laboratory of Intelligent Information Processing and Real-time Industrial System, Wuhan 430065, China
Source
Computer Engineering & Science (《计算机工程与科学》)
CSCD
Peking University Core Journal (北大核心)
2020, Issue 9, pp. 1670-1679 (10 pages)
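Funding
National Social Science Foundation of China (11&ZD189)
Humanities and Social Sciences Research Project of the Hubei Provincial Department of Education (17Y018)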
Keywords
biomedical event extraction (生物医学事件抽取)
sequence labeling (序列标注)
contextual word representation (语境词表示)
self-attention (自注意力)