Abstract
Syntactic divergence between the source and target languages significantly affects the performance of statistical machine translation (SMT). Building on a phrase-based Chinese-English SMT system, this paper proposes a source-side pre-reordering method enhanced with N-best syntactic knowledge. First, each source sentence is parsed into N-best trees, and highly reliable sub-trees are identified by computing their posterior probabilities; an initial set of reordering rules is then extracted from these sub-trees according to word alignment information. Second, the initial rule set is refined with two strategies: filtering by rule derivation based on Chinese and English syntactic knowledge, and a rule-probability threshold. Third, to reduce reordering inside phrases and preserve their local fluency, the source side of the phrase translation table is used as a constraint, so that reordering takes place only between phrase chunks. Finally, the optimized rule set and the phrase-table constraint are applied to pre-reorder the parse trees of source sentences. Experiments on the NIST 2005 and 2008 test sets show that the BLEU score improves by 0.68 and 0.83, respectively, over the baseline system.
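The sub-tree reliability step can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how the probabilities are computed, so the sketch assumes that each N-best parse carries a score, that scores are normalized over the N-best list, and that a constituent's reliability is the summed normalized score of the parses containing it. The function name `reliable_subtrees`, the `(label, start, end)` constituent encoding, the toy scores, and the 0.75 threshold are all hypothetical.

```python
from collections import defaultdict


def reliable_subtrees(nbest, threshold=0.75):
    """Keep constituents supported by most of the N-best probability mass.

    nbest: list of (parse_score, constituents) pairs, where constituents is a
           set of (label, start, end) spans (an assumed, simplified encoding
           of a parse tree).
    Returns a dict mapping each retained constituent to its estimated
    posterior, i.e. the normalized score mass of the parses that contain it.
    """
    total = sum(score for score, _ in nbest)
    posterior = defaultdict(float)
    for score, constituents in nbest:
        for constituent in constituents:
            posterior[constituent] += score / total
    return {c: p for c, p in posterior.items() if p >= threshold}


# Toy 3-best list for a five-word sentence (scores are made up).
nbest = [
    (0.5, {("NP", 0, 2), ("VP", 2, 5), ("IP", 0, 5)}),
    (0.3, {("NP", 0, 2), ("VP", 2, 5), ("IP", 0, 5), ("PP", 2, 4)}),
    (0.2, {("NP", 0, 3), ("VP", 3, 5), ("IP", 0, 5)}),
]
reliable = reliable_subtrees(nbest, threshold=0.75)
```

In this toy example, only the constituents shared by the two highest-scoring parses, plus the root span, pass the threshold. Under the approach described in the abstract, reordering rules would then be extracted only from such sub-trees.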
Authors
GUO Junbo (郭俊博)
ZHANG Xiyuan (张喜媛)
DU Jinhua (杜金华)
Affiliations: Faculty of Higher Vocational and Technical Education, Xi’an University of Technology, Xi’an 710048, China; Faculty of Automation and Information Engineering, Xi’an University of Technology, Xi’an 710048, China
Source
Computer Engineering and Applications (《计算机工程与应用》)
CSCD
PKU Core (北大核心)
2016, Issue 17, pp. 160-165, 176 (7 pages)
Funding
National Natural Science Foundation of China (No. 61100085)
Natural Science Foundation of Shaanxi Province (No. 2015JM6328)
Keywords
statistical machine translation
pre-reordering model
N-best parsed tree
reordering rules
rule optimization