Abstract
Neural machine translation (NMT) currently achieves the best results among machine translation approaches in practical use. Introducing external linguistic knowledge, such as part-of-speech tags and dependency syntax labels, into NMT systems has been shown by many researchers to be an effective way to improve translation performance. Unlike phonetic scripts, Chinese characters are largely phono-semantic compounds: one component indicates pronunciation and the other indicates meaning, so the characters themselves carry rich semantic, phonetic, and syntactic information. Building on the work of Marta R et al., this paper proposes a new method for incorporating glyph features into the end-to-end model and applies it to Chinese-to-English translation. Compared with the baseline system, the method achieves a significant average improvement of 1.1 points on the NIST evaluation sets, demonstrating that the glyph features of Chinese characters can effectively improve neural machine translation models.
Authors
CAI Zilong (蔡子龙)
XIONG Deyi (熊德意)
School of Computer Science and Technology, Soochow University, Suzhou, Jiangsu 215006, China
Source
Journal of Chinese Information Processing (《中文信息学报》)
CSCD
Peking University Core Journals (北大核心)
2019, Issue 5, pp. 75-81 (7 pages)
Funding
National Natural Science Foundation of China (61622209, 61861130364)