Abstract
In recent years, neural machine translation (NMT) has advanced rapidly as an emerging translation technology, producing output that is both more accurate and more fluent. NMT nevertheless still has many problems to be addressed. Taking Japanese-Chinese NMT as a case study, this paper examines vocabulary-level problems and their causes and proposes corresponding improvements to the model. Because of the limited vocabulary size of the model and the domain mismatch of the training corpus, translations contain unknown words as well as mistranslated and omitted words. To address these causes, this paper proposes several improvements: using subword units, replacing low-frequency words, using an external dictionary, and training the model with domain adaptation. Subword units and external dictionaries overcome the problem of an overly small vocabulary. Replacing low-frequency words reduces their negative influence on the model. Domain adaptation improves the model's performance on text from a specific domain. Experimental results show that, compared with a general NMT model, the proposed improvements effectively reduce the number of vocabulary translation problems and thereby improve translation quality.
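As a rough illustration of the low-frequency word replacement mentioned in the abstract, the sketch below swaps every token that occurs fewer than a threshold number of times for a placeholder. The abstract does not say how the paper performs the replacement, so the frequency threshold `min_freq`, the placeholder `<unk>`, the function names, and the toy corpus are all assumptions made for this example.

```python
from collections import Counter

UNK = "<unk>"  # hypothetical placeholder token, not taken from the paper


def build_vocab(sentences, min_freq=2):
    """Collect the tokens that occur at least min_freq times in the corpus."""
    counts = Counter(tok for sent in sentences for tok in sent)
    return {tok for tok, c in counts.items() if c >= min_freq}


def replace_low_freq(sentences, vocab, unk=UNK):
    """Replace every token outside the vocabulary with the placeholder."""
    return [[tok if tok in vocab else unk for tok in sent] for sent in sentences]


# Toy pre-tokenized Japanese corpus (illustrative only).
corpus = [
    ["猫", "が", "好き"],
    ["犬", "が", "好き"],
    ["猫", "が", "走る"],
]

vocab = build_vocab(corpus, min_freq=2)
cleaned = replace_low_freq(corpus, vocab)
```

With this toy corpus, `犬` and `走る` each occur only once and so fall below the threshold. A real system might instead map rare words to a more frequent similar word or fall back to subword units; this sketch shows only the simplest variant.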
Source
Computer Science and Application (《计算机科学与应用》)
2020, Issue 3, pp. 387-397 (11 pages)