Abstract
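This paper derives properties of tree isomorphism from the perspective of graph isomorphism and describes the relationship between structure divergence and structure alignment. On this basis, to establish structural mappings and to incorporate syntactic structure information into the translation process, it proposes the concepts of meta-structure and mutual-translation structure group, together with a multi-level structure alignment framework. Finally, using a log-linear model, it presents a statistical machine translation model based on meta-structure alignment. During translation, the source-language parse tree is decomposed into meta-structures and, using the mapping knowledge of mutual-translation structure groups, converted into a sequence of target-language parse tree structures; the target-language text is then reordered and the translation generated according to the structure model information. Experimental results show that the model outperforms a phrase-based statistical machine translation model both in its ability to generalize translation knowledge and in its translation results.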
To deal with structure divergence and to introduce syntactic knowledge into statistical machine translation, definitions of meta-structure, concomitancy-sequence, and reconstructed-structure are first presented for the parse tree. With the proposed mapping, alignments can be acquired at different levels. A novel translation model based on these definitions is then presented within the log-linear model framework. During translation, the source parse tree is decomposed, reconstructed, and transformed into target-language structures. Experiments show that the model outperforms the baseline in both generalization ability and translation results.
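The abstract names the log-linear model but gives no formula. As a rough illustration only, the sketch below shows how a generic log-linear model scores candidate translations as a weighted sum of feature values and picks the highest-scoring one. The feature names (`structure_mapping`, `language_model`), the weights, and the candidate values are hypothetical and are not taken from the paper; the paper's actual feature set is not stated in this record.

```python
import math


def loglinear_score(features, weights):
    """Weighted sum of feature values: sum_i lambda_i * h_i(e, f)."""
    return sum(weights[name] * value for name, value in features.items())


def best_candidate(candidates, weights):
    """Return the candidate with the highest log-linear score."""
    return max(candidates, key=lambda c: loglinear_score(c["features"], weights))


def posterior(candidates, weights):
    """Normalize exponentiated scores into a distribution over candidates."""
    scores = [loglinear_score(c["features"], weights) for c in candidates]
    top = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical weights and log-probability features, for illustration only.
weights = {"structure_mapping": 1.0, "language_model": 0.5}
candidates = [
    {"text": "candidate A",
     "features": {"structure_mapping": -1.0, "language_model": -2.0}},
    {"text": "candidate B",
     "features": {"structure_mapping": -0.5, "language_model": -4.0}},
]

best = best_candidate(candidates, weights)
probs = posterior(candidates, weights)
```

In this toy setup, candidate A scores -2.0 and candidate B scores -2.5, so A is selected. In a real system the weights would be tuned on held-out data rather than set by hand.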
Source
《通信学报》
EI
CSCD
Peking University Core Journal
2009, No. 7, pp. 124-129 (6 pages)
Journal on Communications
Funding
National Natural Science Foundation of China (60736014)
National High Technology Research and Development Program of China ("863" Program) (2006AA010208)
Keywords
statistical machine translation
structure divergence
structure alignment
log-linear model