Abstract
The dependency-to-string model uses translation rules based on HDR (head-dependents relation) fragments. An HDR fragment is a tree fragment that consists of a head word and all of its dependents. These rules capture compositional phenomena in the source language well, such as sentence patterns and phrase patterns. They are weak, however, at capturing non-compositional phenomena such as idioms and fixed collocations, which phrases capture easily. To improve the performance of the dependency-to-string model, we propose three ways to incorporate bilingual phrases into it: syntactic phrases, generalized syntactic phrases, and non-syntactic phrases. Experiments show that using all three kinds of phrases together significantly improves the dependency-to-string model, by up to about 1.0 BLEU point.
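A small sketch can make the notion of an HDR fragment concrete. It assumes that a dependency parse is given as a hypothetical `heads` array, where each entry is the 0-based index of a word's head and -1 marks the root. The toy sentence and its parse are illustrative assumptions. The function `hdr_fragments` is a hypothetical helper. It only lists each head together with all of its dependents and does not perform word alignment, variable generalization, or rule extraction as the model itself does.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def hdr_fragments(words: List[str], heads: List[int]) -> List[Tuple[str, List[str]]]:
    """Enumerate (head, dependents) fragments from a dependency parse.

    Hypothetical input format (an assumption for this sketch):
    heads[i] is the 0-based index of the head of words[i], or -1 for the root.
    """
    # Collect the direct dependents of every head.
    children: Dict[int, List[int]] = defaultdict(list)
    for i, h in enumerate(heads):
        if h >= 0:
            children[h].append(i)

    fragments: List[Tuple[str, List[str]]] = []
    for h in range(len(words)):
        deps = sorted(children.get(h, []))  # keep dependents in source order
        if deps:  # a leaf has no dependents, so it forms no HDR fragment
            fragments.append((words[h], [words[d] for d in deps]))
    return fragments


# Hypothetical parse of "布什 与 沙龙 举行 了 会谈" (Bush held talks with Sharon):
# 举行 is the root, 布什/与/了/会谈 depend on 举行, and 沙龙 depends on 与.
words = ["布什", "与", "沙龙", "举行", "了", "会谈"]
heads = [3, 3, 1, -1, 3, 3]
print(hdr_fragments(words, heads))
# → [('与', ['沙龙']), ('举行', ['布什', '与', '了', '会谈'])]
```

Each fragment spans a single level of the tree: a head and every one of its dependents. An idiom or fixed collocation may span several levels, or only part of one. For that reason it may match no single fragment, and this is the gap that the added bilingual phrases are meant to fill.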
Source
《中文信息学报》
CSCD
Peking University Core Journal (北大核心)
2014, Issue 2, pp. 44-50 (7 pages)
Journal of Chinese Information Processing
Funding
Key Project of the National Natural Science Foundation of China (60736014)
National Natural Science Foundation of China (60873167, 90920004)
Key Project of the National High Technology Research and Development Program of China (863 Program) (2011AA01A207)
Keywords
statistical machine translation
dependency-to-string model
generalized syntactic bilingual phrases
non-syntactic bilingual phrases