Chinese-Vietnamese parallel sentence pair extraction based on syntactic differences
Abstract In low-resource settings, neural machine translation (NMT) performs poorly because the available parallel corpora are limited in both scale and quality. Chinese-Vietnamese NMT is a typical low-resource task and faces the same shortage of parallel data. To address this problem, this paper proposes a method for extracting Chinese-Vietnamese parallel sentence pairs based on syntactic differences. First, the syntactic differences between Chinese and Vietnamese are analyzed and expressed through part-of-speech tags. Second, a Siamese recurrent neural network integrates this syntactic-difference information into the encoding process, so that syntactic rules better guide the extraction. Experiments show that the proposed method can effectively extract high-quality Chinese-Vietnamese parallel sentence pairs from a Chinese-Vietnamese comparable corpus.
Authors YU Zhi-qiang (于志强); GAO Ming-hu (高明虎); CHEN Yu-xing (陈宇星) (Information and Network Center, Yunnan Minzu University, Kunming 650500, China)
Source Journal of Yunnan Minzu University: Natural Sciences Edition (《云南民族大学学报(自然科学版)》), CAS, 2020, No. 4, pp. 366-370 (5 pages)
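Funding National Natural Science Foundation of China (61866020); Scientific Research Fund of the Yunnan Provincial Department of Education (2019J0674).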
Keywords syntactic feature; parallel sentence pair extraction; Siamese recurrent neural network; Chinese-Vietnamese machine translation
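
The abstract describes the approach only at a high level, so the following is a minimal sketch of the general idea, not the authors' implementation. It assumes that "integrating syntactic information into the encoding process" can be illustrated by concatenating a part-of-speech embedding to each word embedding, that both languages share one coarse tag set (`ADJ`, `NOUN`), and that sentence similarity is measured by cosine similarity. The class name `SiameseEncoder`, the dimensions, and the plain tanh recurrence are all hypothetical choices. The weights are random and untrained, so the resulting score carries no meaning; the point is the shape of the computation, in which one shared encoder reads both sentences.

```python
import math
import random


def _vec(n, rng):
    """Return a small random vector of length n."""
    return [rng.uniform(-0.5, 0.5) for _ in range(n)]


class SiameseEncoder:
    """Toy recurrent encoder shared by both languages (hypothetical sketch).

    Each input step is the concatenation [word embedding ; POS embedding],
    which is one assumed way to expose syntactic information to the encoder.
    """

    def __init__(self, vocab, pos_tags, word_dim=8, pos_dim=4, hidden_dim=6, seed=0):
        rng = random.Random(seed)
        self.hidden_dim = hidden_dim
        self.word_emb = {w: _vec(word_dim, rng) for w in vocab}
        self.pos_emb = {t: _vec(pos_dim, rng) for t in pos_tags}
        in_dim = word_dim + pos_dim
        self.w_in = [_vec(in_dim, rng) for _ in range(hidden_dim)]
        self.w_rec = [_vec(hidden_dim, rng) for _ in range(hidden_dim)]

    def encode(self, tokens, tags):
        """Run a plain tanh recurrence and return the final hidden state."""
        h = [0.0] * self.hidden_dim
        for tok, tag in zip(tokens, tags):
            x = self.word_emb[tok] + self.pos_emb[tag]  # list concatenation
            h = [
                math.tanh(
                    sum(w * xi for w, xi in zip(self.w_in[j], x))
                    + sum(u * hi for u, hi in zip(self.w_rec[j], h))
                )
                for j in range(self.hidden_dim)
            ]
        return h


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


# "new book": the adjective precedes the noun in Chinese and follows it in Vietnamese.
zh = (["新", "书"], ["ADJ", "NOUN"])
vi = (["sách", "mới"], ["NOUN", "ADJ"])

# Siamese setup: a single encoder (one set of weights) encodes both sides.
encoder = SiameseEncoder(vocab=zh[0] + vi[0], pos_tags=["ADJ", "NOUN"])
score = cosine(encoder.encode(*zh), encoder.encode(*vi))
print(f"similarity (untrained, illustrative only): {score:.3f}")
```

In a trained system of this kind, candidate pairs from the comparable corpus would be scored in the same way and kept when the score exceeds a tuned threshold. That threshold, like the training procedure, is not specified in the abstract.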