Funding: The National Natural Science Foundation of China (Grant Nos. 61876113 and 61876112), the Beijing Natural Science Foundation (Grant No. 4192017), the Support Project of High-level Teachers in Beijing Municipal Universities in the Period of 13th Five-year Plan (Grant No. CIT&TCD20170322), and the Capacity Building for Sci-Tech Innovation-Fundamental Scientific Research Funds.
Abstract: Neural network based deep learning methods aim to learn representations of data and have produced state-of-the-art results in many natural language processing (NLP) tasks. Discourse parsing is an important research topic in discourse analysis, aiming to infer the discourse structure and model the coherence of a given text. This survey covers text-level discourse parsing, shallow discourse parsing, and coherence assessment. We first introduce the basic concepts and traditional approaches, and then focus on recent advances in discourse structure oriented representation learning. We also introduce a trend of discourse structure aware representation learning, which exploits discourse structures or discourse objectives to learn representations of sentences and documents, either for specific applications or for general purposes. Finally, we present a brief summary of the progress and discuss several future directions.
Funding: The research in this article is supported by the Science and Technology Innovation 2030-"New Generation Artificial Intelligence" Major Project (2018AA0101901), the National Key Research and Development Project (2018YFB1005103), the National Natural Science Foundation of China (Grant Nos. 61772156 and 61976073), Shenzhen Foundational Research Funding (JCYJ20200109113441941), and the Foundation of Heilongjiang Province (F2018013).
Abstract: Discourse parsing is an important research area in natural language processing (NLP) that aims to parse the discourse structure of coherent sentences. In this survey, we introduce several kinds of discourse parsing tasks, mainly RST-style discourse parsing, PDTB-style discourse parsing, and discourse parsing for multiparty dialogue. For these tasks, we introduce classical and recent methods, especially neural network approaches. We then describe applications of discourse parsing to other NLP tasks, such as machine reading comprehension and sentiment analysis. Finally, we discuss future trends of the task.
Funding: Supported by the National Natural Science Foundation of China under Grant No. 62276178.
Abstract: Early studies on discourse rhetorical structure parsing mainly adopt bottom-up approaches, which limit the parsing process to local information. Although current top-down parsers capture global information better and have achieved notable success, the relative importance of local and global information differs across levels of discourse parsing. This paper argues that combining local and global information for discourse parsing is more sensible. To support this, we introduce a top-down discourse parser with bidirectional representation learning capabilities. Existing corpora for Rhetorical Structure Theory (RST) are known to be very limited in size, which makes discourse parsing very challenging. To alleviate this problem, we leverage some boundary features and a data augmentation strategy to tap the potential of our parser. We use two methods for evaluation, and the experiments on the RST-DT corpus show that our parser improves performance primarily through the effective combination of local and global information. The boundary features and the data augmentation strategy also play a role. Based on gold standard elementary discourse units (EDUs), our parser significantly outperforms the baseline systems in nuclearity detection, with the results on the other three indicators (span, relation, and full) being competitive. Based on automatically segmented EDUs, our parser still outperforms previous state-of-the-art work.
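To make the top-down idea concrete, the following is a minimal sketch of recursive span splitting over EDUs. It is not the parser described in the abstract: the function `parse_top_down`, the `score` callback, and the toy `boundary_strength` table are hypothetical names introduced only for illustration, and a real parser would compute split scores from learned representations rather than a fixed table.

```python
from typing import Callable, Dict, Tuple, Union

# A tree is either a single EDU index (leaf) or a pair of subtrees.
Tree = Union[int, Tuple["Tree", "Tree"]]


def parse_top_down(start: int, end: int,
                   score: Callable[[int, int, int], float]) -> Tree:
    """Build a binary tree over EDUs start..end-1 by recursive splitting.

    `score(start, k, end)` is an assumed scoring function that rates how
    good it is to split the span [start, end) in front of EDU k.
    """
    if end - start == 1:
        return start  # a single EDU is a leaf
    # Pick the best split point for the whole span first (global decision),
    # then recurse into the two halves (increasingly local decisions).
    split = max(range(start + 1, end), key=lambda k: score(start, k, end))
    return (parse_top_down(start, split, score),
            parse_top_down(split, end, score))


# Toy scorer (an assumption): a fixed strength for each boundary position.
boundary_strength: Dict[int, float] = {1: 0.2, 2: 0.9, 3: 0.4}


def toy_score(start: int, k: int, end: int) -> float:
    return boundary_strength[k]


tree = parse_top_down(0, 4, toy_score)
print(tree)
```

With four EDUs and the toy table above, the strongest boundary (before EDU 2) is chosen first, so the span is split into two halves before any local decision is made. A bottom-up parser would instead start by merging adjacent EDUs.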
Funding: Supported by the National Natural Science Foundation of China (Grant Nos. 61836007 and 61773276) and the Priority Academic Program Development (PAPD) of Jiangsu Higher Education Institutions.
Abstract: The discourse analysis task, which focuses on understanding the semantics of long text spans, has received increasing attention in recent years. As a critical component of discourse analysis, discourse relation recognition aims to identify the rhetorical relations between adjacent discourse units (e.g., clauses, sentences, and sentence groups), called arguments, in a document. Previous works focused on capturing the semantic interactions between arguments to recognize their discourse relations, ignoring important textual information in the surrounding contexts. However, in many cases, capturing semantic interactions between the texts of the two arguments is not enough to identify their rhetorical relations, and more contextual clues must be mined. In this paper, we propose a method to convert the RST-style discourse trees in the training set into dependency-based trees and train a contextual evidence selector on these transformed structures. In this way, the selector learns to automatically pick critical textual information from the context (i.e., evidence) for the arguments to help discriminate their relations. We then encode the arguments concatenated with the corresponding evidence to obtain enhanced argument representations. Finally, we combine the original and enhanced argument representations to recognize their relations. In addition, we introduce auxiliary tasks to guide the training of the evidence selector and strengthen its selection ability. The experimental results on the Chinese CDTB dataset show that our method outperforms several state-of-the-art baselines in both micro and macro F1 scores.
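The conversion from a constituency-style discourse tree to a dependency-based tree mentioned in the abstract can be sketched as follows. The abstract does not specify the conversion rule, so this sketch assumes a common head-percolation convention: the head EDU of a span is the head of its leftmost nucleus child, and the heads of the remaining children attach to it. The `Node` class, `head_edu`, and `rst_to_dependencies` are hypothetical names used only for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

Arc = Tuple[int, int, Optional[str]]  # (head EDU, dependent EDU, relation)


@dataclass
class Node:
    nuclearity: str                  # "N" (nucleus) or "S" (satellite)
    relation: Optional[str] = None   # relation this node holds to its sibling
    edu: Optional[int] = None        # EDU index, set only on leaves
    children: List["Node"] = field(default_factory=list)


def head_edu(node: Node, arcs: List[Arc]) -> int:
    """Return the head EDU of `node`, appending dependency arcs to `arcs`."""
    if node.edu is not None:
        return node.edu
    heads = [head_edu(child, arcs) for child in node.children]
    # Assumed rule: the leftmost nucleus child supplies the head of the span.
    nuc = next((i for i, c in enumerate(node.children)
                if c.nuclearity == "N"), 0)
    for i, child in enumerate(node.children):
        if i != nuc:
            arcs.append((heads[nuc], heads[i], child.relation))
    return heads[nuc]


def rst_to_dependencies(root: Node) -> Tuple[int, List[Arc]]:
    """Convert a discourse tree into (root EDU, arcs sorted by dependent)."""
    arcs: List[Arc] = []
    root_head = head_edu(root, arcs)
    return root_head, sorted(arcs, key=lambda arc: arc[1])


# Toy tree over three EDUs: ((1 <- 2: Elaboration) <- 3: Cause)
inner = Node("N", children=[Node("N", edu=1),
                            Node("S", relation="Elaboration", edu=2)])
root = Node("N", children=[inner, Node("S", relation="Cause", edu=3)])

root_head, arcs = rst_to_dependencies(root)
print(root_head, arcs)
```

In the resulting structure every EDU except the root has exactly one head, which is the kind of flat structure over which context units can be ranked as candidate evidence for a pair of arguments. How the evidence selector itself is trained is not described in enough detail in the abstract to sketch here.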