Research on Sentence Length Sensitivity in Neural Network Machine Translation (cited by: 1)
Abstract: Neural machine translation (NMT) has advanced considerably with the development of deep learning, but it is well known to be sensitive to sentence length. To make full use of large-scale parallel corpora while taking sentence length into account, this paper partitions the original parallel corpus into several subsets by sentence length, trains a sub-model on each subset, and proposes an NMT method based on a sentence-length fusion strategy. After training, the translation is obtained by fusing the models along the sentence-length boundaries and then ranking candidates with a fusion of three features: perplexity, sentence length ratio, and a classifier. Experiments show that the proposed method improves BLEU by about 1.2 points on average across three different test sets on the English-Chinese task, and by 0.4 to 0.6 BLEU points on the Uyghur-Chinese task, which suggests that the method has some reference value.
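The two steps named in the abstract, partitioning the corpus by sentence length and ranking candidate translations by three fused features, can be sketched roughly as follows. This is an illustrative sketch only, not the authors' implementation: the boundary values in `BOUNDARIES`, the whitespace tokenization, the weighted linear combination in `rerank`, and the `perplexity` and `classifier_score` callables are all assumptions, since the abstract does not specify them.

```python
import math
from bisect import bisect_right
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical length boundaries in source tokens; the abstract gives no values.
BOUNDARIES = [10, 30]


def bucket_of(n_tokens: int, boundaries: Sequence[int] = BOUNDARIES) -> int:
    """Return the bucket index for a sentence with n_tokens tokens.

    With boundaries [10, 30]: fewer than 10 tokens -> 0, 10..29 -> 1, 30+ -> 2.
    """
    return bisect_right(boundaries, n_tokens)


def partition_corpus(
    pairs: Sequence[Tuple[str, str]],
    boundaries: Sequence[int] = BOUNDARIES,
) -> Dict[int, List[Tuple[str, str]]]:
    """Split (source, target) pairs into buckets by source length.

    Assumes the text is already tokenized with spaces between tokens.
    One sub-model would then be trained on each bucket.
    """
    buckets: Dict[int, List[Tuple[str, str]]] = {
        i: [] for i in range(len(boundaries) + 1)
    }
    for src, tgt in pairs:
        buckets[bucket_of(len(src.split()), boundaries)].append((src, tgt))
    return buckets


def rerank(
    src: str,
    candidates: Sequence[str],
    perplexity: Callable[[str], float],
    classifier_score: Callable[[str, str], float],
    weights: Tuple[float, float, float] = (1.0, 1.0, 1.0),
) -> str:
    """Pick the candidate with the best combined score.

    The combination (assumed here) rewards low perplexity, a target/source
    length ratio close to 1, and a high classifier score.
    """
    w_ppl, w_len, w_cls = weights
    src_len = max(len(src.split()), 1)

    def score(cand: str) -> float:
        ratio = len(cand.split()) / src_len
        len_penalty = abs(math.log(ratio)) if ratio > 0 else float("inf")
        return (
            -w_ppl * math.log(perplexity(cand))
            - w_len * len_penalty
            + w_cls * classifier_score(src, cand)
        )

    return max(candidates, key=score)
```

In use, each sub-model trained on one bucket would contribute candidate translations for an input sentence, and `rerank` would choose among them. The weights would need tuning on held-out data, and the `perplexity` callable must return a strictly positive value.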
Authors: Alim Samat; Sirajahmat Ruzmamat; Maihefureti; Aishan Wumaier; Wushuer Silamu; Turgun Ebrayim (Laboratory of Multi-Language Information Technology, College of Information Science and Engineering, Xinjiang University, Urumqi 830046, China)
Source: Computer Engineering and Applications (《计算机工程与应用》), indexed in CSCD and the Peking University Core Journals list, 2022, No. 9, pp. 195-200 (6 pages)
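Funding: Open Fund of the Key Laboratory of the Xinjiang Uygur Autonomous Region, China (2016D03023, 2018D04019); National Natural Science Foundation of China (61662077, 61762084); Research Project of the State Language Commission (ZDI135-54).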
Keywords: machine translation; extreme sentence length data; perplexity (PPL); ensemble; deep learning