Abstract
Disease named entity recognition is one of the most fundamental tasks in biomedical text mining. Building on currently popular deep learning methods, this paper uses a BiLSTM-CNN-CRF model to recognize disease named entities in biomedical literature. The model first applies a Convolutional Neural Network (CNN) at the character level to obtain a character-based vector representation of each word, then uses a Bidirectional Long Short-Term Memory (BiLSTM) network to obtain a hidden representation of each word in the sentence, and finally uses a Conditional Random Field (CRF) to output the disease entity labels. Experimental results show that, compared with traditional models, the deep learning approach has a significant advantage on this task; the model achieves an F1-score of 84.47% on the NCBI corpus.
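As a rough illustration of how an F1-score like the one reported above is typically computed for entity recognition, the sketch below scores predicted tags against gold tags at the entity level. It assumes a BIO tagging scheme with a single entity type and exact-boundary matching; the abstract does not state the tagging scheme or the matching criterion, and the function names `extract_spans` and `entity_f1` are hypothetical.

```python
def extract_spans(tags):
    """Return the set of (start, end) entity spans encoded by a BIO tag sequence.

    Assumption: one entity type, tags drawn from {"B", "I", "O"}.
    """
    spans, start = set(), None
    # A trailing "O" sentinel closes an entity that runs to the end of the sentence.
    for i, tag in enumerate(list(tags) + ["O"]):
        if start is not None and tag != "I":
            spans.add((start, i))  # end index is exclusive
            start = None
        if tag == "B":
            start = i
    return spans


def entity_f1(gold_tags, pred_tags):
    """Entity-level F1: a prediction counts only if both boundaries match exactly."""
    gold, pred = extract_spans(gold_tags), extract_spans(pred_tags)
    correct = len(gold & pred)
    precision = correct / len(pred) if pred else 0.0
    recall = correct / len(gold) if gold else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Made-up example: two gold entities, one of them found by the model.
gold = ["O", "B", "I", "O", "B"]
pred = ["O", "B", "I", "O", "O"]
score = entity_f1(gold, pred)  # precision 1.0, recall 0.5
```

Under exact matching, a prediction that gets only part of an entity right (for example the first word of a two-word disease name) earns no credit, which is why entity-level F1 is usually lower than token-level accuracy.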
Authors
袁源
何云琪
钱龙华
YUAN Yuan; HE Yunqi; QIAN Longhua (School of Computer Science & Technology, Soochow University, Suzhou 215006, China)
Source
Journal of Fujian Computer (《福建电脑》)
2019, No. 3, pp. 39-42 (4 pages)
Keywords
Disease Named Entity Recognition
Text Mining
Deep Learning