
Research on Construction Methods of Machine Translation-Oriented Multilingual Patent Corpus

Cited by: 1
Abstract: Neural machine translation (NMT) is an intrinsically data-driven technology, and large-scale, high-quality corpus resources are the foundation of any high-performance multilingual NMT system, which makes corpus construction crucial. Motivated by the need to expand the training corpora of existing patent machine translation engines and to build patent corpus resources for specific language directions, this paper studies three methods for constructing multilingual patent parallel corpora: matching based on the standard BLEU4 algorithm, pseudo-data construction, and family patents. It then analyzes and summarizes the advantages, disadvantages, and applicable scenarios of each method, with the aim of identifying reliable schemes for building multilingual patent parallel corpora and effectively expanding existing patent corpus resources.
Authors: CAO Jingcheng, WU Xiaoqian, WANG Qian, SUN Xiaoyu, DENG Huijuan (China Patent Information Center, Beijing 100044)
Source: China Invention & Patent (《中国发明与专利》), 2022, No. 6, pp. 70-75, 80 (7 pages)
Keywords: multilingual parallel corpus construction; intermediate language-based matching; standard BLEU4 algorithm; pseudo-data construction; family patents
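The abstract names standard BLEU4 scoring with intermediate-language matching as one route to parallel patent sentences. The sketch below is a minimal illustration under stated assumptions, not the authors' implementation: it assumes sentences from both languages have already been machine-translated into a shared intermediate (pivot) language and tokenized, computes unsmoothed sentence-level BLEU-4 (uniform weights over 1- to 4-gram clipped precisions, times a brevity penalty), and greedily pairs each source sentence with its highest-scoring unused target sentence. The function names `bleu4` and `match_by_pivot` and the 0.3 acceptance threshold are hypothetical.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return a Counter of all n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def bleu4(candidate, reference):
    """Unsmoothed sentence-level BLEU-4 with uniform weights and brevity penalty.

    candidate, reference: token lists (both in the pivot language).
    Returns a float in [0, 1].
    """
    if not candidate:
        return 0.0
    log_precision_sum = 0.0
    for n in range(1, 5):
        cand_counts = ngrams(candidate, n)
        ref_counts = ngrams(reference, n)
        total = sum(cand_counts.values())
        # Clipped ("modified") matches: a candidate n-gram counts at most
        # as many times as it appears in the reference.
        matches = sum(min(c, ref_counts[g]) for g, c in cand_counts.items())
        if total == 0 or matches == 0:
            return 0.0  # unsmoothed BLEU is zero if any n-gram order has no match
        log_precision_sum += math.log(matches / total)
    c, r = len(candidate), len(reference)
    brevity_penalty = 1.0 if c > r else math.exp(1 - r / c)
    return brevity_penalty * math.exp(log_precision_sum / 4)


def match_by_pivot(src_pivot, tgt_pivot, threshold=0.3):
    """Hypothetical greedy one-to-one sentence matching via a pivot language.

    src_pivot: source-language sentences machine-translated into the pivot language.
    tgt_pivot: target-language sentences in (or translated into) the pivot language.
    threshold: hypothetical minimum BLEU-4 score for accepting a pair.
    Returns a list of (source_index, target_index, score) tuples.
    """
    pairs = []
    used = set()
    for i, cand in enumerate(src_pivot):
        best_j, best_score = None, threshold
        for j, ref in enumerate(tgt_pivot):
            if j in used:
                continue  # keep the alignment one-to-one
            score = bleu4(cand, ref)
            if score >= best_score:
                best_j, best_score = j, score
        if best_j is not None:
            used.add(best_j)
            pairs.append((i, best_j, best_score))
    return pairs
```

Because unsmoothed sentence-level BLEU-4 drops to zero whenever any n-gram order has no match, sentences shorter than four tokens can never be paired this way; a smoothed variant or a lower-order fallback would be a natural adjustment, again as an assumption rather than a detail from the paper.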
