
BiLSTM-BiDAF Named Entity Recognition Based on Machine Reading Comprehension

Cited by: 2
Abstract: Named entity recognition (NER) is a fundamental task in natural language processing (NLP) and plays an important role in downstream tasks such as information extraction and machine translation. Existing NER methods usually treat the problem as sequence labeling and extract the entities of each sentence independently, ignoring the semantic information between sentences. NER methods based on machine reading comprehension (MRC) use a question to encode important prior information about the entity category. This makes similar category labels easier to distinguish and reduces the difficulty of model learning, but these methods still model only at the sentence level. Because they ignore inter-sentence semantics, the same entity can be labeled inconsistently in different sentences. To address this, this paper extends sentence-level NER to text-level NER and proposes an MRC-based BiLSTM-BiDAF NER model. First, to exploit the context of the whole text, the NEZHA pre-trained language model is used to obtain full-text contextual information, and a BiLSTM further extracts local features to strengthen the model's ability to capture local dependencies. Then, a bidirectional attention flow (BiDAF) mechanism is introduced to learn the semantic association between the text and the entity category. Finally, a boundary detector based on a gating mechanism is designed to strengthen the correlation between entity boundaries and to predict the positions of entities in the text, and an answer count detector is built to identify unanswerable questions. Experimental results on the CCKS2020 Chinese electronic medical record dataset and the CMeEE dataset show that the proposed model effectively identifies both document-level and sentence-level named entities, reaching F1 scores of 84.76% and 57.35%, respectively.
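The MRC framing turns NER into one question per entity category, asked against the whole text instead of a single sentence. The sketch below shows one way such training examples could be built under that framing. The category names, the question wording in `QUERIES`, the `MRCExample` record, and the toy English text are all hypothetical illustrations, not the paper's actual prompts or the CCKS2020 schema. A category with no entity in the text yields an unanswerable example (answer count 0).

```python
from dataclasses import dataclass, field

# Hypothetical category-to-question mapping; the paper's real queries are not given.
QUERIES = {
    "disease": "Find all diseases and diagnoses mentioned in the text.",
    "drug": "Find all drugs mentioned in the text.",
    "surgery": "Find all surgical procedures mentioned in the text.",
}


@dataclass
class MRCExample:
    query: str
    context: str
    starts: list = field(default_factory=list)  # character offsets where answers begin
    ends: list = field(default_factory=list)    # inclusive character offsets where answers end
    answer_count: int = 0                        # 0 marks an unanswerable question


def build_examples(text, entities):
    """Build one MRC example per category from (start, end_exclusive, category) spans.

    The whole document is used as the context (text level) rather than
    splitting it into sentences first.
    """
    examples = []
    for category, query in QUERIES.items():
        spans = sorted((s, e) for s, e, c in entities if c == category)
        examples.append(MRCExample(
            query=query,
            context=text,
            starts=[s for s, _ in spans],
            ends=[e - 1 for _, e in spans],
            answer_count=len(spans),
        ))
    return examples


text = "Gastric cancer was found. Gastrectomy treated the gastric cancer."
entities = [(0, 14, "disease"), (26, 37, "surgery"), (50, 64, "disease")]
for ex in build_examples(text, entities):
    print(ex.answer_count, [text[s:e + 1] for s, e in zip(ex.starts, ex.ends)])
# 2 ['Gastric cancer', 'gastric cancer']
# 0 []
# 1 ['Gastrectomy']
```

Both mentions of "gastric cancer" sit in different sentences but inside one context, so a single question sees them together; this is the text-level setting the abstract argues reduces inconsistent labeling.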
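The bidirectional attention step can be illustrated with the standard BiDAF layer of Seo et al., which scores every text position against every question token with a trilinear similarity and then attends in both directions. This plain-Python sketch assumes the paper uses that standard layer unmodified, which the abstract does not confirm; the toy vectors `H` and `U` stand in for BiLSTM outputs of the text and the category question, and the weight vector `w` is a hypothetical stand-in for learned parameters.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def hadamard(a, b):
    return [x * y for x, y in zip(a, b)]


def bidaf(H, U, w):
    """Bidirectional attention flow between context H (T x d) and query U (J x d).

    w has length 3*d; returns G (T x 4d) with rows [h_t; u~_t; h_t*u~_t; h_t*h~].
    """
    d = len(H[0])
    w_h, w_u, w_hu = w[:d], w[d:2 * d], w[2 * d:]
    # Trilinear similarity: S[t][j] = w . [h_t; u_j; h_t * u_j]
    S = [[dot(w_h, h) + dot(w_u, u) + dot(w_hu, hadamard(h, u)) for u in U] for h in H]

    # Context-to-query: each text position attends over the question tokens.
    U_tilde = []
    for row in S:
        a = softmax(row)
        U_tilde.append([sum(a[j] * U[j][k] for j in range(len(U))) for k in range(d)])

    # Query-to-context: weight text positions by their best match to any question token;
    # the resulting single vector h~ is shared (tiled) across all positions.
    b = softmax([max(row) for row in S])
    h_tilde = [sum(b[t] * H[t][k] for t in range(len(H))) for k in range(d)]

    # Query-aware representation of each text position.
    return [h + u_t + hadamard(h, u_t) + hadamard(h, h_tilde) for h, u_t in zip(H, U_tilde)]


H = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # 3 text positions, d = 2
U = [[1.0, 0.0], [0.0, 1.0]]              # 2 question tokens
w = [0.5, 0.5, 0.0, 0.0, 1.0, 1.0]        # hypothetical similarity weights, length 3d
G = bidaf(H, U, w)
print(len(G), len(G[0]))  # 3 8
```

Because the toy question vectors are one-hot, the `u~_t` slice of each row equals that position's attention distribution over the question, which makes the context-to-query weights easy to inspect.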
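The abstract describes predicting entity positions from boundary scores and using an answer count detector to reject unanswerable questions. The gated boundary detector's equations are not given here, so this sketch covers only the decoding side. It assumes per-token start and end probabilities are already available and that the count detector outputs an integer. The function name `decode_spans`, the nearest-end pairing rule, the 0.5 threshold, and `max_len` are hypothetical choices, not taken from the paper.

```python
def decode_spans(start_probs, end_probs, answer_count, threshold=0.5, max_len=20):
    """Turn per-token boundary probabilities into (start, end) entity spans.

    Hypothetical decoding rule: answer_count comes from an answer count
    detector, and 0 means the question is unanswerable, so nothing is returned.
    Otherwise each start above the threshold is paired with the nearest end
    above the threshold at or after it (within max_len tokens), and the
    answer_count highest-scoring spans are returned in text order.
    """
    if answer_count == 0:
        return []
    ends = [i for i, p in enumerate(end_probs) if p >= threshold]
    candidates = []
    for s, p_s in enumerate(start_probs):
        if p_s < threshold:
            continue
        e = next((i for i in ends if s <= i < s + max_len), None)
        if e is not None:
            candidates.append((p_s * end_probs[e], s, e))
    best = sorted(candidates, reverse=True)[:answer_count]
    return sorted((s, e) for _, s, e in best)


start = [0.9, 0.1, 0.1, 0.8, 0.2, 0.6]
end = [0.1, 0.7, 0.1, 0.2, 0.9, 0.7]
print(decode_spans(start, end, answer_count=2))  # [(0, 1), (3, 4)]
print(decode_spans(start, end, answer_count=0))  # []
```

Letting the predicted count cap the number of spans is what allows a question to return nothing at all, instead of always emitting the best-scoring span.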
Authors: WANG Jie (王洁); XIA Xiaoming (夏晓明) (Faculty of Information Technology, Beijing University of Technology, Beijing 100124, China)
Source: Journal of South China University of Technology (Natural Science Edition), 2022, No. 12, pp. 80-88 (9 pages). Indexed in EI, CAS, CSCD, and the Peking University core journal list.
Funding: National Natural Science Foundation of China (61876010).
Keywords: bidirectional attention flow; bidirectional long short-term memory; named entity recognition; machine reading comprehension; natural language processing