Abstract
Named entity recognition (NER) is a fundamental task in natural language processing. Chinese named entity recognition (CNER) is particularly difficult because of word segmentation ambiguity and polysemy. To address these problems, this paper proposes CWA-CNER, a CNER model that combines a multi-head attention mechanism (Multi-Attention) with character-word fusion. First, the vector of each character is concatenated with the vectors of the words that the character may form in the sentence. The concatenated vectors are then fed into a bidirectional long short-term memory (BiLSTM) network to extract contextual semantic information. Next, Multi-Attention captures how closely the elements of the sentence are related to one another. Finally, a conditional random field (CRF) assigns the entity labels. The model is evaluated on three corpora: the Boson dataset and the 1998 and 2014 People's Daily corpora. Its F1 score exceeds 90% on each, which indicates that the model is effective.
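The character-word fusion step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. The abstract does not say how the vectors of several candidate words are combined, so averaging them is an assumption made here. The function names (`candidate_words`, `mean_vector`, `fuse`), the toy lexicon, and the toy embeddings are all hypothetical.

```python
from typing import Dict, List, Set


def candidate_words(sentence: str, lexicon: Set[str], max_len: int = 4) -> List[List[str]]:
    """For each character position, list the lexicon words (2..max_len chars)
    found in the sentence that cover that position."""
    covering: List[List[str]] = [[] for _ in sentence]
    for start in range(len(sentence)):
        last_end = min(len(sentence), start + max_len)
        for end in range(start + 2, last_end + 1):
            word = sentence[start:end]
            if word in lexicon:
                for pos in range(start, end):
                    covering[pos].append(word)
    return covering


def mean_vector(vectors: List[List[float]], dim: int) -> List[float]:
    """Average a list of vectors; return a zero vector when the list is empty."""
    if not vectors:
        return [0.0] * dim
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def fuse(sentence: str,
         char_emb: Dict[str, List[float]],
         word_emb: Dict[str, List[float]],
         lexicon: Set[str],
         word_dim: int) -> List[List[float]]:
    """Concatenate each character vector with the averaged vectors of its
    candidate words (averaging is an assumption, not taken from the paper)."""
    fused = []
    for ch, words in zip(sentence, candidate_words(sentence, lexicon)):
        word_vec = mean_vector([word_emb[w] for w in words], word_dim)
        fused.append(char_emb[ch] + word_vec)  # list concatenation
    return fused


# Toy data: a sentence with well-known segmentation ambiguity.
sentence = "南京市长江大桥"
lexicon = {"南京", "南京市", "市长", "长江", "长江大桥", "大桥"}
char_emb = {ch: [float(i), 1.0] for i, ch in enumerate(sentence)}
word_emb = {w: [float(len(w)), 0.0] for w in lexicon}

covering = candidate_words(sentence, lexicon)
fused = fuse(sentence, char_emb, word_emb, lexicon, word_dim=2)
print(covering[3])  # candidate words covering the character "长"
print(len(fused), len(fused[0]))
```

In the full model, the fused vectors would then pass through the BiLSTM, the multi-head attention layer, and the CRF. Those stages are omitted here.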
Authors
ZHAO Dandan (赵丹丹)
HUANG Degen (黄德根)
MENG Jiana (孟佳娜)
GU Feng (谷丰)
ZHANG Pan (张攀)
ZHAO Dandan; HUANG Degen; MENG Jiana; GU Feng; ZHANG Pan (School of Computer Science and Technology, Dalian University of Technology, Dalian, Liaoning 116024, China; School of Computer Science and Engineering, Dalian Minzu University, Dalian, Liaoning 116600, China)
Source
Computer Engineering and Applications (《计算机工程与应用》)
CSCD
Peking University Core Journal (北大核心)
2022, No. 7, pp. 142-149 (8 pages)
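Funding
Science and Technology Innovation 2030 "New Generation Artificial Intelligence" Major Project of the Ministry of Science and Technology of China (2020AAA0108004)
National Natural Science Foundation of China (U1936109, 61876031)
2019 Scientific Research Funding Project of the Education Department of Liaoning Province (LJYT201906)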
Keywords
named entity recognition (NER)
multi-head attention mechanism
character-word fusion