
A Neural Network Bilingual Word Alignment Algorithm Incorporating Linear Syntactic Information
Abstract: Current bilingual word alignment models rely mainly on large amounts of manually annotated corpora, which are time-consuming to produce and of unstable quality. To address this problem, this paper proposes a method for building a neural network model for bilingual word alignment from a bilingual sentence-aligned corpus. GIZA++ is used to align words across the two languages, and an annotation scheme is designed to generate a word-aligned corpus that serves as the initial training input of the neural network. To fully exploit the latent linguistic features between sentences, a word alignment method that incorporates bilingual linear syntactic information into the encoding layer of the neural network is proposed. Experiments on English-Chinese patent and standard sentence-aligned corpora show that the neural network alignment reaches an accuracy of 89.05%.
Authors: Yin Baosheng (尹宝生); Zhang Binbin (张斌斌); Li Shaoming (李绍鸣) (Shenyang Aerospace University, Shenyang 110136, Liaoning, China; Human-Computer Intelligence Research Center, Shenyang 110136, Liaoning, China)
Source: Computer Applications and Software (《计算机应用与软件》), PKU Core journal, 2023, No. 9, pp. 278-282, 319 (6 pages)
Funding: National Defense Technology Foundation Project (JSQB2017206C002).
Keywords: linear syntax; word alignment; neural network
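
The abstract describes using GIZA++ alignments to generate a labeled word-alignment corpus as the initial training input. The following is a minimal sketch of what such a conversion might look like, not the authors' annotation scheme, which the abstract does not specify. It assumes the alignments have already been converted to the common `i-j` link format (0-based source and target token indices); the function names `parse_links` and `label_matrix` and the binary labeling are hypothetical.

```python
from typing import List, Set, Tuple


def parse_links(line: str) -> Set[Tuple[int, int]]:
    """Parse whitespace-separated 'i-j' links into (source, target) index pairs."""
    links = set()
    for item in line.split():
        src, tgt = item.split("-")
        links.add((int(src), int(tgt)))
    return links


def label_matrix(
    src_tokens: List[str],
    tgt_tokens: List[str],
    links: Set[Tuple[int, int]],
) -> List[List[int]]:
    """Build a binary matrix: 1 where source token i is linked to target token j.

    This binary labeling is an assumption made for illustration only.
    """
    return [
        [1 if (i, j) in links else 0 for j in range(len(tgt_tokens))]
        for i in range(len(src_tokens))
    ]


# Toy sentence pair; "the" is left unaligned.
src = ["the", "neural", "network"]
tgt = ["神经", "网络"]
matrix = label_matrix(src, tgt, parse_links("1-0 2-1"))
```

Each row of `matrix` corresponds to one source token, so a network trained on such data would predict, for every source-target token pair, whether a link exists.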