Abstract
Named entity recognition (NER) extracts entities such as person names, locations, and organizations from natural-language sentences and is an upstream task in natural language processing. NER based on document-level memory integrates information from all previously recognized sentences into the sentence currently being recognized, strengthening its semantic representation and improving recognition. Existing document-level memory approaches mix all memory information into the current sentence and ignore the fact that memory information from different label categories affects the current sentence differently. This paper proposes a Chinese NER method that integrates classified memory information. The memory module stores memory information split into four parts by label category (B, M, E, S). An attention mechanism matches the current input sentence against these parts to find, for each character, the several memory characters in each category that are semantically closest. The retrieved memory information is then integrated, category by category, into the output vectors produced by the LSTM for the current sentence, yielding an output vector representation that incorporates the memory information. This expresses more fully the possibility that the current character vector belongs to each label. The method achieves good experimental results on Resume, a classic Chinese NER dataset.
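The per-category memory lookup described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how similarity is scored or how the retrieved memory is fused, so dot-product attention, top-k selection, and fusion by concatenation are all assumptions here, as are the names `attend_top_k`, `fuse_classified_memory`, and the toy two-dimensional vectors. B, M, E, S are read as the usual begin, middle, end, and single tags.

```python
import math

# Label categories of the memory module (begin, middle, end, single).
LABELS = ("B", "M", "E", "S")


def dot(u, v):
    """Dot product, used here as an assumed similarity score."""
    return sum(a * b for a, b in zip(u, v))


def softmax(scores):
    """Numerically stable softmax over a non-empty list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend_top_k(query, memory, k):
    """Attention-weighted sum of the k memory vectors most similar to query.

    Returns a zero vector when the category has no memory yet.
    """
    if not memory:
        return [0.0] * len(query)
    nearest = sorted(memory, key=lambda m: dot(query, m), reverse=True)[:k]
    weights = softmax([dot(query, m) for m in nearest])
    return [
        sum(w * m[i] for w, m in zip(weights, nearest))
        for i in range(len(query))
    ]


def fuse_classified_memory(hidden, memory_by_label, k=2):
    """Append one memory summary per label to a character's hidden vector.

    Concatenation is an assumed fusion rule, chosen only for illustration.
    """
    fused = list(hidden)
    for label in LABELS:
        fused.extend(attend_top_k(hidden, memory_by_label.get(label, []), k))
    return fused


# Toy example: a 2-dimensional hidden vector for one character and a small
# hypothetical memory holding vectors of previously recognized characters.
memory = {
    "B": [[1.0, 0.0], [0.0, 1.0]],
    "M": [[0.5, 0.5]],
    "E": [],
    "S": [[0.0, 1.0]],
}
fused = fuse_classified_memory([1.0, 0.0], memory, k=1)
print(len(fused))  # hidden size plus four per-label summaries
```

Keeping one summary per label, instead of a single mixed summary, is what lets a later classification layer weigh evidence for each tag separately, which is the point the abstract makes against mixing all memory together.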
Authors
王宝祥
陈渝
孙界平
琚生根
WANG Baoxiang; CHEN Yu; SUN Jieping; JU Shenggen (College of Computer Science, Sichuan University, Chengdu 610065; College of Science and Technology, Sichuan University for Nationalities, Kangding 626001)
Source
Computer & Digital Engineering (《计算机与数字工程》)
2021, No. 12, pp. 2501-2508 (8 pages)
Keywords
named entity recognition
document-level memory
classified memory