Abstract
Negation recognition separates affirmative information from negated information in natural language, and it plays an important role in information retrieval, text mining, and sentiment analysis. This paper studies cue detection and scope recognition for Chinese negation, using a bidirectional long short-term memory network combined with a conditional random field (BiLSTM-CRF) as the model. Pre-trained word embeddings serve as the input features for cue detection; for scope recognition, features of the known cue are added on top of them. On the Chinese negation and speculation corpus, cue detection reaches an F1 of 91.03%, and scope recognition reaches its highest F1, 73.91%, on the corpus's financial news sub-corpus. The experimental results show that this model outperforms both the CRF model and the BiLSTM model on Chinese negation cue detection and scope recognition.
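As a rough illustration of what the CRF layer contributes on top of per-token BiLSTM scores, the sketch below runs Viterbi decoding over a toy sentence. Everything in it is an assumption made for illustration: the B/I/O tag set, the emission and transition scores, and the function name `viterbi_decode` do not come from the paper, whose tagging scheme is not stated in the abstract.

```python
from typing import Dict, List

TAGS = ["B", "I", "O"]  # hypothetical cue tags: begin, inside, outside

def viterbi_decode(emissions: List[Dict[str, float]],
                   transitions: Dict[str, Dict[str, float]],
                   tags: List[str]) -> List[str]:
    """Return the tag sequence with the highest emission + transition score."""
    if not emissions:
        return []
    # best[t][tag]: score of the best path that ends in `tag` at position t
    best = [{tag: emissions[0][tag] for tag in tags}]
    back = []  # back[t - 1][tag]: previous tag on that best path
    for t in range(1, len(emissions)):
        scores, pointers = {}, {}
        for cur in tags:
            prev = max(tags, key=lambda p: best[-1][p] + transitions[p][cur])
            scores[cur] = best[-1][prev] + transitions[prev][cur] + emissions[t][cur]
            pointers[cur] = prev
        best.append(scores)
        back.append(pointers)
    # Follow the back-pointers from the best final tag.
    path = [max(tags, key=lambda tag: best[-1][tag])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    path.reverse()
    return path

# Toy scores (made up): a stand-in for BiLSTM outputs on three tokens.
emissions = [
    {"B": 0.1, "I": 0.0, "O": 2.0},
    {"B": 1.0, "I": 1.5, "O": 0.2},
    {"B": 0.0, "I": 0.3, "O": 1.0},
]
# All transitions are neutral except O -> I, which is invalid under BIO.
transitions = {p: {c: 0.0 for c in TAGS} for p in TAGS}
transitions["O"]["I"] = -10.0

greedy = [max(TAGS, key=lambda tag: e[tag]) for e in emissions]
decoded = viterbi_decode(emissions, transitions, TAGS)
print(greedy)   # ['O', 'I', 'O']
print(decoded)  # ['O', 'B', 'O']
```

Picking the best tag per token independently yields the invalid sequence O, I, O, while decoding with transition scores yields O, B, O. This kind of sequence-level consistency is one plausible reason a BiLSTM-CRF could outperform a plain BiLSTM tagger, as the abstract reports.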
Authors
陈世梅
伍星
唐凡
CHEN Shimei; WU Xing; TANG Fan (College of Computer Science, Chongqing University, Chongqing 400044, China; Ppdai Group Inc., Shanghai 201210, China)
Source
《中文信息学报》
CSCD
Peking University Core Journal
2018, No. 11, pp. 55-61 (7 pages)
Journal of Chinese Information Processing
Funding
National Natural Science Foundation of China (51608070)