The pattern of thematic progression, which reflects the semantic relationship between two contextual sentences, is an important subject in discourse analysis. We introduce a new corpus of Chinese news discourses annotated with thematic progression information and explore computational methods for automatically extracting the discourse structural features of the simplified thematic progression pattern (STPP) between contextual sentences in a text. Furthermore, these features are used in a hybrid approach to a major discourse analysis task, Chinese coreference resolution. This novel approach combines heuristic sieves with a machine learning method that utilizes both top-down STPP features and bottom-up semantic features. Experimental results on the intersection of the CoNLL-2012 shared task dataset and the CDTC corpus demonstrate the effectiveness of the proposed approach.
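The abstract does not specify the STPP label set, how themes and rhemes are identified, or how the sieves and the learner are combined. The sketch below is therefore only an illustration under stated assumptions. It assumes three hypothetical labels (a simplification of the classic constant-theme and linear progression types), sentences already split into theme and rheme token lists, and plain token overlap as the matching criterion. It then shows how such a label could feed a hybrid decision in which ordered rule sieves run first and a learned scorer decides the remaining pairs. All names (`STPP`, `classify_stpp`, `stpp_features`, `hybrid_link`, `toy_scorer`) are hypothetical and are not the authors' code.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, List, Optional


class STPP(Enum):
    """Hypothetical simplified thematic progression labels (assumed, not from the paper)."""
    CONSTANT = "constant"  # theme of S2 repeats the theme of S1
    LINEAR = "linear"      # theme of S2 picks up material from the rheme of S1
    OTHER = "other"        # no progression detected by this simple matcher


@dataclass
class Sentence:
    theme: List[str]  # theme tokens (assumed to be segmented already)
    rheme: List[str]  # rheme tokens


def classify_stpp(s1: Sentence, s2: Sentence) -> STPP:
    """Label the progression from s1 to s2 by token overlap.

    Overlap is a crude stand-in for semantic matching: it misses pronouns,
    synonyms and zero anaphora, which a real system would have to handle.
    """
    t2 = set(s2.theme)
    if t2 & set(s1.theme):
        return STPP.CONSTANT
    if t2 & set(s1.rheme):
        return STPP.LINEAR
    return STPP.OTHER


def stpp_features(s1: Sentence, s2: Sentence) -> List[int]:
    """One-hot STPP label, meant to be concatenated with bottom-up features."""
    label = classify_stpp(s1, s2)
    return [int(label is p) for p in STPP]  # order: CONSTANT, LINEAR, OTHER


# A sieve returns True/False to decide a mention pair, or None to defer.
Sieve = Callable[[str, str, STPP], Optional[bool]]


def exact_match_sieve(anaphor: str, antecedent: str, stpp: STPP) -> Optional[bool]:
    # High-precision rule: identical strings corefer; otherwise defer.
    return True if anaphor == antecedent else None


def hybrid_link(anaphor: str, antecedent: str, stpp: STPP,
                sieves: List[Sieve],
                scorer: Callable[[str, str, STPP], float],
                threshold: float = 0.5) -> bool:
    """Run rule sieves in order; fall back to the scorer if none decides."""
    for sieve in sieves:
        decision = sieve(anaphor, antecedent, stpp)
        if decision is not None:
            return decision
    return scorer(anaphor, antecedent, stpp) >= threshold


def toy_scorer(anaphor: str, antecedent: str, stpp: STPP) -> float:
    # Placeholder for a trained model: favors links under constant-theme progression.
    return 0.8 if stpp is STPP.CONSTANT else 0.2


if __name__ == "__main__":
    s1 = Sentence(theme=["小明"], rheme=["买", "了", "一本", "书"])
    s2 = Sentence(theme=["小明"], rheme=["很", "高兴"])
    s3 = Sentence(theme=["这", "本", "书"], rheme=["很", "贵"])
    print(classify_stpp(s1, s2))  # STPP.CONSTANT
    print(classify_stpp(s1, s3))  # STPP.LINEAR
    print(stpp_features(s1, s2))  # [1, 0, 0]
    # The STPP label is passed in directly here; no sieve fires, so the scorer decides.
    print(hybrid_link("他", "小明", STPP.CONSTANT, [exact_match_sieve], toy_scorer))  # True
```

Putting the sieves before the scorer follows the usual multi-pass design: precise rules settle the easy pairs, and the learned component, which here could see the STPP one-hot vector next to semantic features, handles the rest.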
Funding: This research has been supported by the National Natural Science Foundation of China under grants 61728205, 61673290, 61672371, 61750110534 and 61876217, and by the Science & Technology Development Project of Suzhou under grant SYG201817.