Abstract
Implicit discourse relation recognition (iDRR) is one of the most challenging tasks in discourse structure analysis. Traditional approaches focus on conceptual and sense features in discourse, which results in poor performance. This paper first systematically explores the effectiveness of flat features in discourse, namely shallow semantic information and sentence-level sentiment driven by attitude prosody. It then proposes a simple but effective tree kernel method and finally integrates these components through a composite kernel. Experiments on the Penn Discourse Treebank (PDTB) 2.0 corpus show that introducing shallow semantic and sentiment information significantly improves accuracy, and that the composite kernel is well suited to iDRR. The results also show that the system significantly outperforms other existing systems in the field.
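The abstract does not give the kernel formulas, so the following is only a minimal sketch of the general idea of a composite kernel: a flat-feature kernel and a tree kernel are each normalized and then combined linearly. The weighting parameter `alpha`, the production-rule tree kernel, the nested-tuple tree encoding, and all function and feature names are assumptions made for illustration, not the authors' actual method.

```python
from collections import Counter


def linear_kernel(x, y):
    """Dot product of two sparse flat-feature vectors (dict: feature -> weight)."""
    return sum(v * y.get(k, 0.0) for k, v in x.items())


def productions(tree):
    """Count production rules in a tree encoded as nested tuples (label, child, ...).

    Leaves are plain strings. This encoding is an assumption for the sketch.
    """
    if isinstance(tree, str):
        return Counter()
    label, *children = tree
    rule = (label, tuple(c if isinstance(c, str) else c[0] for c in children))
    counts = Counter([rule])
    for child in children:
        counts += productions(child)
    return counts


def tree_kernel(t1, t2):
    """Simplified tree kernel: dot product of production-rule counts.

    A stand-in for a full convolution tree kernel, which also matches larger subtrees.
    """
    p1, p2 = productions(t1), productions(t2)
    return float(sum(n * p2[rule] for rule, n in p1.items()))


def normalize(kernel, a, b):
    """Cosine-style normalization so each kernel value lies in [0, 1]."""
    denom = (kernel(a, a) * kernel(b, b)) ** 0.5
    return kernel(a, b) / denom if denom else 0.0


def composite_kernel(inst1, inst2, alpha=0.5):
    """Convex combination of the flat-feature kernel and the tree kernel.

    alpha is a hypothetical mixing weight; the paper's actual weighting is not
    stated in the abstract.
    """
    flat = normalize(linear_kernel, inst1["flat"], inst2["flat"])
    tree = normalize(tree_kernel, inst1["tree"], inst2["tree"])
    return alpha * flat + (1 - alpha) * tree
```

Because a convex combination of valid kernels is itself a valid kernel, a function like `composite_kernel` could in principle be used to build a precomputed Gram matrix for a kernel-based classifier such as an SVM.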
Source
Journal of Software (《软件学报》)
EI
CSCD
PKU Core (北大核心)
2013, Issue 5, pp. 1022-1035 (14 pages)
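Funding
National Natural Science Foundation of China (60970056, 90920004)
National High Technology Research and Development Program of China (863 Program) (2012AA011102)
Specialized Research Fund for the Doctoral Program of Higher Education of China (20093201110006)
Natural Science Foundation of Jiangsu Province (BK2011282)
Natural Science Foundation of the Jiangsu Higher Education Institutions (11KIJ520003)
Jiangsu Province Graduate Research and Innovation Program for Regular Higher Education Institutions (CXZZ11_0101)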
Keywords
discourse
discourse structure analysis
implicit discourse relation recognition
tree kernel
composite kernel