Abstract
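The principle of compositionality states that the meaning of a sentence is composed, according to certain rules, from the meanings of its constituents. Syntax-based semantic composition has therefore long been an important research direction, and tree-structured composition methods are its most representative approach. However, these methods are difficult to apply to large-scale data, mainly because the order of composition depends on the specific tree of each sentence, which prevents parallel processing. This paper proposes a joint framework for graph-based dependency parsing and semantic composition, and trains both the semantic composition model and the parsing model through the paraphrase identification task. On the one hand, the graph model supports parallel processing in both training and prediction, greatly reducing computation time. On the other hand, the joint framework does not depend on an external syntactic parser, and learning the two tasks jointly lets the semantic representation capture both syntactic structure and semantic contextual information. We evaluate on the public Chinese paraphrase identification dataset LCQMC. The model reaches an accuracy of 79.54%, close to that of tree-structured composition methods, while prediction is up to more than 30 times faster.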
The semantics of a sentence is composed of the meanings of its constituents and the way they are combined. Therefore, syntax-based semantic composition has been an important research direction in NLP. The popular tree-structure-based method is difficult to apply to large-scale data, because its dependence on the specific tree structure blocks parallel computation. In this paper, we present a joint framework for graph-based dependency parsing and semantic composition. Without relying on an external syntactic parser, the method applies a graph neural network to the semantic composition computation to support parallel computation. Moreover, jointly learning the two tasks enables the model to learn syntactic structure and semantic contextual information simultaneously. Experimental results on the LCQMC dataset show an accuracy of 79.54%, close to that of the tree-based semantic composition method, with prediction speed increased by up to 30 times.
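The efficiency argument above rests on one difference: a tree-recursive composer must finish a word's children before the word itself, whereas a graph-based composer updates every word from the previous layer at once. The following is a minimal Python sketch of that idea under stated assumptions, not the authors' model. The arc scores are hard-coded stand-ins for a parser's output, a row-wise softmax turns them into a soft dependency graph, and one parameter-free `tanh` update mixes each word with its expected head and dependents. The names `arc_distribution`, `compose_word` and `compose_layer`, and the update rule itself, are hypothetical; a real model would use learned weights.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[List[float]]

NEG_INF = float("-inf")


def arc_distribution(scores: Matrix) -> Matrix:
    """Softmax each row of head scores, so row i becomes P(head(i) = j).
    Self-loops should be scored NEG_INF by the caller so they get zero mass."""
    probs = []
    for row in scores:
        m = max(row)
        exps = [math.exp(s - m) for s in row]
        z = sum(exps)
        probs.append([e / z for e in exps])
    return probs


def compose_word(i: int, h: Matrix, p: Matrix) -> Vector:
    """Hypothetical update for word i that reads only the previous layer h:
    its own vector, the expected vector of its head (sum_j p[i][j] * h[j]),
    and the probability-weighted sum of its dependents (sum_j p[j][i] * h[j])."""
    n, d = len(h), len(h[0])
    out = []
    for k in range(d):
        head_msg = sum(p[i][j] * h[j][k] for j in range(n))
        dep_msg = sum(p[j][i] * h[j][k] for j in range(n))
        out.append(math.tanh(h[i][k] + head_msg + dep_msg))
    return out


def compose_layer(h: Matrix, p: Matrix) -> Matrix:
    """Each compose_word call reads h but never the new vectors, so the n
    calls are independent of one another and could run in parallel."""
    return [compose_word(i, h, p) for i in range(len(h))]


# Toy 3-word sentence with 2-dimensional word vectors (illustrative values).
h0 = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
scores = [
    [NEG_INF, 2.0, 0.0],
    [0.0, NEG_INF, 3.0],
    [1.0, 1.0, NEG_INF],
]
P = arc_distribution(scores)
h1 = compose_layer(h0, P)
h2 = compose_layer(h1, P)  # stacking layers widens each word's context
```

Because no word's update reads another word's new vector, the per-word loop can be replaced by batched matrix products on a GPU; this is the kind of parallelism the abstract contrasts with tree-ordered composition, where processing order is fixed by each sentence's tree.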
Authors
汪凯
刘明童
张玉洁
陈圆梦
徐金安
陈钰枫
WANG Kai;LIU Mingtong;ZHANG Yujie;CHEN Yuanmeng;XU Jin'an;CHEN Yufeng(School of Computer and Information Technology,Beijing Jiaotong University,Beijing 100044,China)
Source
《中文信息学报》
CSCD
Peking University Core Journal (北大核心)
2022, No. 7, pp. 24-32 (9 pages)
Journal of Chinese Information Processing
Funding
National Natural Science Foundation of China (61876198, 61976015, 61976016).
Keywords
dependency parsing
semantic composition
graph neural network
paraphrase identification