Abstract
Current bilingual word alignment models rely mainly on large manually annotated corpora, which are time-consuming to produce and of unstable quality. To address this problem, this paper proposes a method for building a neural network model for bilingual word alignment from a bilingual sentence-aligned corpus. GIZA++ is used to perform bilingual word alignment, and an annotation scheme is designed to generate a bilingual word-aligned corpus that serves as the initial training input of the neural network. To fully exploit the latent linguistic features between sentences, a word alignment method is proposed that integrates bilingual linear syntactic information into the encoding layer of the neural network. Experiments were carried out on an English-Chinese sentence-aligned corpus of patents and standards, and the neural network alignment achieved an accuracy of 89.05%.
Authors
Yin Baosheng (尹宝生); Zhang Binbin (张斌斌); Li Shaoming (李绍鸣) (Shenyang Aerospace University, Shenyang 110136, Liaoning, China; Human-Computer Intelligence Research Center, Shenyang 110136, Liaoning, China)
Source
Computer Applications and Software (《计算机应用与软件》)
PKU Core Journal (北大核心)
2023, No. 9, pp. 278-282, 319 (6 pages in total)
Funding
National Defense Technology Foundation Project (JSQB2017206C002).
Keywords
Linear syntax
Word alignment
Neural network