Abstract
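Traditional neural machine translation models are black boxes and cannot effectively incorporate terminology information, yet jointly training a neural machine translation model with a user-provided terminology dictionary is of practical value. This paper therefore proposes a Transformer-based patent machine translation model for the new energy domain that integrates terminology information by two means: replacing source-side terms with their target-side terms, and appending the target-side term after the source-side term. Experiments on a purpose-built Chinese-English parallel patent corpus and termbase for the new energy domain show that the proposed patent translation model outperforms the Transformer baseline. Its translation quality is also evaluated on a manually constructed dataset, a dataset from the China Patent Information Center, and a dataset from the World Intellectual Property Organization.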
Traditional neural machine translation models are black boxes and cannot effectively incorporate terminology information. It is of practical significance to use terms provided by the user to jointly train the neural machine translation model. Accordingly, we propose a Transformer-based patent machine translation model for the new energy domain that incorporates terminology information. Terminology is fused in two ways: the source term is replaced with the target term, or the target term is appended after the source term. Experimental results on a Chinese-English task with a patent termbase in the field of new energy show that the proposed patent translation model outperforms the Transformer baseline. Translation quality is also analyzed on three datasets.
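The two fusion strategies named in the abstract (replacing a source term with its target term, and appending the target term after the source term) can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `inject_terms`, the `mode` parameter, the longest-match-first lookup, the single-space separator, and the sample termbase entries are all assumptions made for illustration. The abstract does not say how terms are matched or delimited.

```python
def inject_terms(source: str, termbase: dict[str, str], mode: str = "append") -> str:
    """Annotate a source sentence with target-side terms (hypothetical helper).

    mode="replace": each source term is replaced by its target term.
    mode="append":  the target term is inserted right after the source term.
    Matching is greedy, left to right, longest term first (an assumption).
    """
    if mode not in ("replace", "append"):
        raise ValueError("mode must be 'replace' or 'append'")

    # Try longer terms first so that a nested shorter term does not
    # pre-empt the longer term that contains it.
    terms = sorted((t for t in termbase if t), key=len, reverse=True)

    out = []
    i = 0
    while i < len(source):
        for term in terms:
            if source.startswith(term, i):
                target = termbase[term]
                if mode == "replace":
                    out.append(target)
                else:
                    out.append(f"{term} {target}")
                i += len(term)  # skip past the matched source term
                break
        else:
            # No term starts at this position: copy the character unchanged.
            out.append(source[i])
            i += 1
    return "".join(out)


if __name__ == "__main__":
    # Illustrative entries only, not taken from the paper's termbase.
    termbase = {"太阳能电池": "solar cell", "逆变器": "inverter"}
    sentence = "太阳能电池与逆变器相连"
    print(inject_terms(sentence, termbase, mode="replace"))
    print(inject_terms(sentence, termbase, mode="append"))
```

In a setup like this, the annotated sentences would stand in for the raw source side of the parallel corpus during training, so the model can learn to copy the supplied target term into its output.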
Authors
游新冬
杨海翔
陈海涛
孙甜
吕学强
YOU Xindong; YANG Haixiang; CHEN Haitao; SUN Tian; LV Xueqiang (Beijing Key Laboratory of Internet Culture and Digital Dissemination Research, Beijing Information Science and Technology University, Beijing 100101, China; School of Foreign Languages, Beijing Information Science and Technology University, Beijing 100192, China)
Source
《中文信息学报》
CSCD
Peking University Core Journal
2021, No. 12, pp. 76-83, 93 (9 pages)
Journal of Chinese Information Processing
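Funding
Beijing Natural Science Foundation (4212020)
National Natural Science Foundation of China (62171043)
Scientific Research Program of the Beijing Municipal Education Commission (KM20211232001)
Beijing Information Science and Technology University "Qin Xin Talents" Cultivation Program (QXTCPB201908)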