Abstract
Discourse analysis is an important task in natural language processing. Analyzing the primary and secondary relations in a discourse helps reveal its structure and semantics and provides solid support for downstream NLP applications. Building on research into micro-level discourse primary-secondary relation recognition, this paper focuses on macro-level primary-secondary relations and proposes a recognition model based on topic similarity computed with word2vec and LDA. The word2vec-based and LDA-based topic similarities measure semantic similarity along different dimensions. Because they complement each other at the semantic level, together they strengthen the model's ability to recognize macro-level primary-secondary relations. On the Macro Chinese Discourse TreeBank (MCDTB), the model achieves an F1-score of 79.9% and an accuracy of 81.82%, improving on the baseline by 1.7% and 1.81%, respectively.
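The abstract does not say how the two topic similarities are computed or combined. Purely as an illustration, the sketch below makes several assumptions that are not taken from the paper. A paragraph's word2vec representation is taken to be the mean of its word vectors, and two such vectors are compared with cosine similarity. Its LDA representation is a topic distribution θ, and two distributions are compared with 1 minus the Hellinger distance. The two scores are then concatenated into a feature pair for a downstream classifier. The function names, the toy vectors, and the choice of measures are all hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def mean_vector(tokens, embeddings):
    """Average the word2vec-style vectors of in-vocabulary tokens; None if none are known."""
    vecs = [embeddings[t] for t in tokens if t in embeddings]
    if not vecs:
        return None
    dim = len(vecs[0])
    return [sum(v[i] for v in vecs) / len(vecs) for i in range(dim)]

def w2v_similarity(tokens_a, tokens_b, embeddings):
    """Hypothetical word2vec-based similarity: cosine of mean word vectors."""
    va = mean_vector(tokens_a, embeddings)
    vb = mean_vector(tokens_b, embeddings)
    if va is None or vb is None:
        return 0.0
    return cosine(va, vb)

def hellinger_similarity(p, q):
    """Hypothetical LDA-based similarity: 1 - Hellinger distance, in [0, 1]."""
    bc = sum(math.sqrt(a * b) for a, b in zip(p, q))  # Bhattacharyya coefficient
    return 1.0 - math.sqrt(max(0.0, 1.0 - bc))        # guard against tiny negative rounding

def topic_similarity_features(tokens_a, tokens_b, embeddings, theta_a, theta_b):
    """Concatenate both similarities so a classifier can weigh them jointly."""
    return [w2v_similarity(tokens_a, tokens_b, embeddings),
            hellinger_similarity(theta_a, theta_b)]

# Toy inputs (invented for illustration): 2-d "word vectors" and 2-topic distributions.
emb = {"economy": [1.0, 0.0], "growth": [0.8, 0.2], "football": [0.0, 1.0]}
features = topic_similarity_features(["economy", "growth"], ["growth", "football"],
                                     emb, [0.7, 0.3], [0.4, 0.6])
print(features)
```

In a real system the vectors would come from a trained word2vec model and θ from a trained LDA model. Keeping the two scores as separate features, rather than averaging them, lets the classifier decide how much each semantic view contributes.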
Authors
JIANG Feng (蒋峰)
CHU Xiaomin (褚晓敏)
XU Sheng (徐昇)
LI Peifeng (李培峰)
ZHU Qiaoming (朱巧明)
School of Computer Sciences and Technology, Soochow University, Suzhou, Jiangsu 215006, China; Provincial Key Laboratory for Computer Information Processing Technology, Suzhou, Jiangsu 215006, China
Source
Journal of Chinese Information Processing (《中文信息学报》), 2018, Issue 1, pp. 43-50 (8 pages)
Indexed in: CSCD; Peking University Core Journals (北大核心)
Funding
National Natural Science Foundation of China (61773276, 61472265, 61772354)
Jiangsu Province Science and Technology Program (BK20151222)