Abstract
Biomedical named entity recognition and relation extraction are of great significance in biomedical text mining. However, annotated Chinese biomedical entity relation corpora are currently very scarce, which poses many challenges for information extraction in the Chinese biomedical domain. This paper builds a Chinese biomedical entity relation extraction system based on deep learning. First, a Chinese biomedical entity relation corpus is constructed from publicly available English biomedical annotated corpora via translation and manual annotation. Then, a Chinese ELMo (Embeddings from Language Models) trained on biomedical text is added to a bidirectional long short-term memory network (Bi-directional LSTM, BiLSTM) combined with conditional random fields (CRF) to perform Chinese entity recognition. Finally, the relations between entities are extracted using a BiLSTM combined with an attention mechanism. The experimental results show that the system can accurately extract biomedical entities and inter-entity relations from Chinese text.
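The relation extraction step combines a BiLSTM with an attention mechanism. The abstract does not give the exact formulation, so the following is only a minimal sketch of one common variant (soft attention pooling over per-token hidden states), written in plain Python. The function name `attention_pool`, the `query` vector, and the toy numbers are hypothetical; in a trained model the hidden states would come from the BiLSTM and the query would be a learned parameter.

```python
import math


def attention_pool(hidden_states, query):
    """Collapse a sequence of hidden vectors into one vector using soft attention.

    hidden_states: list of equal-length float lists, one per token
                   (stand-ins for BiLSTM outputs; assumed, not from the paper).
    query:         float list of the same length (stand-in for a learned vector).
    Returns (pooled_vector, attention_weights).
    """
    # Score each token: dot product of the query with tanh(h_t).
    scores = [
        sum(q * math.tanh(h) for q, h in zip(query, h_t))
        for h_t in hidden_states
    ]

    # Numerically stable softmax over the token scores.
    max_score = max(scores)
    exps = [math.exp(s - max_score) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]

    # Weighted sum of the hidden states gives the sentence representation.
    dim = len(hidden_states[0])
    pooled = [
        sum(w * h_t[i] for w, h_t in zip(weights, hidden_states))
        for i in range(dim)
    ]
    return pooled, weights


# Toy input: three tokens with two-dimensional hidden states.
states = [[0.1, -0.3], [0.8, 0.5], [-0.2, 0.4]]
query = [1.0, 1.0]
pooled, weights = attention_pool(states, query)
```

In a full system, the pooled vector would typically be passed to a classification layer that predicts the relation type for an entity pair; that layer is omitted here.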
Authors
丁泽源
杨志豪
罗凌
王磊
张音
林鸿飞
王健
DING Zeyuan; YANG Zhihao; LUO Ling; WANG Lei; ZHANG Yin; LIN Hongfei; WANG Jian (School of Computer Science and Technology, Dalian University of Technology, Dalian, Liaoning 116024, China; Academy of Military Medical Sciences, Beijing 100850, China)
Source
《中文信息学报》
CSCD
Peking University Core Journal (北大核心)
2021, Issue 5, pp. 70-76 (7 pages)
Journal of Chinese Information Processing
Funding
National Key Research and Development Program of China (2016YFC0901902).