Abstract
Quality estimation (QE) of machine translation (MT) automatically assesses the quality of MT output without relying on reference translations. Because manually annotated data are scarce, current neural QE models still struggle to detect translation errors automatically. To make better use of large-scale parallel corpora that lack human annotations, this paper proposes a translation knowledge transfer method based on parallel corpora. First, the cross-lingual pre-trained model XLM-R is used to build a neural QE baseline system; on top of this baseline, three pre-training strategies are proposed to strengthen XLM-R's ability to relate semantics across the two languages. The proposed method achieves a new state of the art on the English-German QE datasets of both WMT 2017 and WMT 2019.
Authors
YE Heng (叶恒); GONG Zhengxian (贡正仙)
School of Computer Science and Technology, Soochow University, Suzhou, Jiangsu 215006, China
Source
Journal of Chinese Information Processing (《中文信息学报》)
CSCD
Peking University Core Journals (北大核心)
2023, Issue 3, pp. 79-88 (10 pages)
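Funding
National Natural Science Foundation of China (61976148)
Priority Academic Program Development of Jiangsu Higher Education Institutions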
Keywords
quality estimation of machine translation
cross-lingual pre-trained model
semantic connection
pre-training strategy