Abstract
To address the insufficient coverage of reordering phenomena in statistical machine translation (SMT) training corpora, paraphrasing is used to enlarge the bilingual corpus. A paraphrasing method based on dependency parsing and sentence realization is proposed. The input sentence is first parsed into a dependency tree, and the tree is then realized as multiple natural-language sentences. The generated sentences contain the same words as the original sentence but may differ from it in word order. Experiments show that, without introducing any extra resources, the method effectively alleviates the low-coverage problem of the training corpus and improves translation quality.
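The parse-then-realize idea can be illustrated with a toy sketch. The abstract does not describe the actual realization model, so everything below is an assumption made for illustration: the hand-written tree `children`, the function name `realize`, and the choice to enumerate every projective ordering (each subtree stays contiguous) instead of scoring or filtering candidates as a practical realizer would need to do.

```python
from itertools import permutations, product

# Hypothetical dependency tree for "I saw the dog": head word -> dependent words.
# A real system would obtain this from a dependency parser.
children = {
    "saw": ["I", "dog"],
    "dog": ["the"],
}

def realize(head, children):
    """Yield every projective word order of the subtree rooted at `head`.

    Each result is a tuple of words. The head and the realized subtrees of
    its dependents are permuted as whole units, so a subtree is never split.
    """
    deps = children.get(head, [])
    # All realizations of each dependent subtree, computed recursively.
    options = [list(realize(d, children)) for d in deps]
    # Pick one realization per dependent, then permute head + dependents.
    for choice in product(*options):
        units = [(head,)] + list(choice)
        for order in permutations(units):
            yield tuple(word for unit in order for word in unit)

sentences = sorted(set(realize("saw", children)))
for s in sentences:
    print(" ".join(s))
```

Every generated sequence uses exactly the words of the input and differs only in order, which mirrors the property stated in the abstract. Exhaustive enumeration also produces many unnatural orders, so this sketch shows only the candidate space, not how good paraphrases would be selected.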
Source
《哈尔滨工业大学学报》
EI
CAS
CSCD
PKU Core Journal (北大核心)
2013, Issue 5, pp. 45-50 (6 pages)
Journal of Harbin Institute of Technology
Funding
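National Natural Science Foundation of China, General Program (61073126, 61133012)
National High Technology Research and Development Program of China, Major Project (2011AA01A207)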
Keywords
paraphrase
statistical machine translation
dependency parsing
sentence realization