Abstract
Medical named entity recognition suffers from low accuracy because medical corpora contain many specialized terms and much nonstandard language. To recognize out-of-vocabulary medical terms and cope with nonstandard language, we propose a multi-granularity named entity recognition model that combines Lattice-LSTM with N-grams-based new word discovery. The N-grams algorithm extracts new words from a medical dialogue corpus, and these words are used to build a medical dictionary. The Lattice-LSTM model then encodes the input characters together with all words that match the dictionary, and its gate structure lets the model select the most relevant characters and words. By exploiting the discovered new words, Lattice-LSTM can recognize out-of-vocabulary medical terms and thus achieves better recognition results in the experiments.
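The pipeline described in the abstract (discover candidate words with N-grams, collect them in a dictionary, then find every dictionary word that matches the input characters) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the abstract does not state how N-gram candidates are scored, so the plain frequency threshold `min_count`, the function names, and the toy dialogue sentences are all assumptions made here. The Lattice-LSTM itself is not shown; the sketch only produces the matched word spans that such a model would encode alongside the characters.

```python
from collections import Counter


def extract_ngram_candidates(corpus, n_values=(2, 3, 4), min_count=2):
    """Count character n-grams and keep those seen at least min_count times.

    Assumption: a simple frequency threshold stands in for whatever
    scoring the paper actually uses to decide which n-grams are new words.
    """
    counts = Counter()
    for sentence in corpus:
        for n in n_values:
            for i in range(len(sentence) - n + 1):
                counts[sentence[i:i + n]] += 1
    return {gram for gram, count in counts.items() if count >= min_count}


def match_lattice_words(sentence, lexicon, max_len=4):
    """Return (start, end, word) for every lexicon word found in sentence.

    The end index is exclusive. Each span marks one word that a lattice
    model could encode in addition to the individual characters.
    """
    spans = []
    for start in range(len(sentence)):
        longest_end = min(start + max_len, len(sentence))
        for end in range(start + 2, longest_end + 1):
            word = sentence[start:end]
            if word in lexicon:
                spans.append((start, end, word))
    return spans


# Toy, made-up dialogue sentences (hypothetical data for illustration only).
toy_corpus = ["医生我最近胃痛", "胃痛吃什么药", "胃痛多久了"]
toy_lexicon = extract_ngram_candidates(toy_corpus)
print(toy_lexicon)
print(match_lattice_words("我胃痛两天了", toy_lexicon))
```

In this toy setting only the bigram that recurs across sentences passes the threshold, and it is then matched as a word span inside a new sentence. A practical system would likely need a stricter filter than raw frequency, since frequent n-grams are not always meaningful words.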
Authors
Zhao Yaoquan (赵耀全)
Che Chao (车超)
Zhang Qiang (张强)
National and Local Joint Engineering Laboratory of Computer Aided Design, Dalian University, Dalian 116622, Liaoning, China
Source
Computer Applications and Software (《计算机应用与软件》)
Peking University Core Journal (北大核心)
2021, No. 1, pp. 161-165, 249 (6 pages in total)
Funding
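National Natural Science Foundation of China (61751203)
Dalian Science and Technology Innovation Fund (2018J12GX036)
Dalian High-Level Talent Innovation Support Program (2017RD11)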