Abstract
A rhetorical question is a rhetorical device that expresses strong emotion in interrogative form, and identifying such questions effectively can provide technical support for sentiment analysis in natural language processing. This paper proposes a rhetorical question identification method based on the automatic acquisition of language features. First, a data-driven feature extraction model built on a label attention mechanism obtains task-relevant language features from a sentence: lexical, syntactic-structure, symbolic-marker, and topic features. Second, Bi-LSTM models encode the sentence and its language features separately, and the interactive attention between the two yields an attention weight vector over the words and symbols of the sentence. Applying this weight vector to the Bi-LSTM representation of the sentence produces a rhetorical question identification model strengthened by language features. Experiments on a Chinese microblog dataset show that the proposed method significantly improves rhetorical question identification performance compared with previous work.
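The interactive-attention step described in the abstract can be illustrated with a minimal pure-Python sketch. It assumes the Bi-LSTM outputs for the sentence and for its extracted language features are already computed. The sketch averages the feature states into one vector, scores each sentence token by a dot product with that vector, normalizes the scores with softmax, and applies the resulting weights to the sentence states as a weighted sum. The mean pooling, dot-product scoring, and weighted-sum combination are assumptions made for illustration, not the paper's published formulas. The function names (`interactive_attention`, `mean_pool`, `softmax`) are hypothetical.

```python
import math
from typing import List, Tuple

Vector = List[float]
Matrix = List[Vector]  # one row per token, each row a hidden-state vector


def mean_pool(states: Matrix) -> Vector:
    """Average a sequence of hidden states into a single vector."""
    dim = len(states[0])
    return [sum(row[j] for row in states) / len(states) for j in range(dim)]


def softmax(scores: Vector) -> Vector:
    """Numerically stable softmax: subtract the max before exponentiating."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def interactive_attention(sentence_states: Matrix,
                          feature_states: Matrix) -> Tuple[Vector, Vector]:
    """Weight each sentence token by its affinity to the pooled features.

    sentence_states: assumed Bi-LSTM outputs for the target sentence (n x d).
    feature_states:  assumed Bi-LSTM outputs for its language features (m x d).
    Returns (attention weights over the n tokens, weighted sentence vector).
    """
    feature_summary = mean_pool(feature_states)
    # Dot-product affinity between each token state and the feature summary
    # (an assumed scoring function; the abstract does not specify one).
    scores = [sum(h * f for h, f in zip(row, feature_summary))
              for row in sentence_states]
    weights = softmax(scores)
    dim = len(sentence_states[0])
    # Apply the weights to the sentence representation (weighted-sum pooling).
    pooled = [sum(w * row[j] for w, row in zip(weights, sentence_states))
              for j in range(dim)]
    return weights, pooled


if __name__ == "__main__":
    # Toy 3-token sentence with 2-d states; the third token (standing in for
    # a rhetorical marker) points the same way as the feature states.
    sentence = [[0.1, 0.0], [0.0, 0.2], [1.0, 1.0]]
    features = [[0.9, 1.1], [1.1, 0.9]]
    weights, pooled = interactive_attention(sentence, features)
    print([round(w, 3) for w in weights])
    print([round(v, 3) for v in pooled])
```

With the toy inputs, the third token receives most of the attention mass because its state aligns with the pooled feature vector. In a full model, the pooled sentence vector would then feed a classifier that decides whether the sentence is a rhetorical question.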
Authors
李旸 (LI Yang)
吴卓嘉 (WU Zhuojia)
王素格 (WANG Suge)
梁吉业 (LIANG Jiye)
LI Yang; WU Zhuojia; WANG Suge; LIANG Jiye (School of Computer & Information Technology, Shanxi University, Taiyuan, Shanxi 030006, China; Key Laboratory of Computational Intelligence and Chinese Information Processing of Ministry of Education, Shanxi University, Taiyuan, Shanxi 030006, China)
Source
Journal of Chinese Information Processing (《中文信息学报》)
CSCD
Peking University Core Journal (北大核心)
2020, Issue 2, pp. 96-104 (9 pages)
Funding
National Natural Science Foundation of China (61632011, 61573231, 61432011, 61672331)
Key Research and Development Program of Shanxi Province (201803D421024)
Keywords
rhetorical questions
feature extraction
attention mechanism
identification model