Journal articles: 2 results found
1. Enriching the Transfer Learning with Pre-Trained Lexicon Embedding for Low-Resource Neural Machine Translation (cited by 5)
Authors: Mieradilijiang Maimaiti, Yang Liu, Huanbo Luan, Maosong Sun. Tsinghua Science and Technology (SCIE, EI, CAS, CSCD), 2022, Issue 1, pp. 150-163.
Most State-Of-The-Art (SOTA) Neural Machine Translation (NMT) systems today achieve outstanding results based only on large parallel corpora. Large-scale parallel corpora are easily obtainable for high-resource languages. However, the translation quality of NMT for morphologically rich languages is still unsatisfactory, mainly because of the data sparsity encountered in Low-Resource Languages (LRLs). In the low-resource NMT paradigm, Transfer Learning (TL) has become one of the most effective methods. Yet it is difficult for a model trained on a high-resource language to capture information from both the parent and child models, because the initially trained model contains only the lexicon features and word embeddings of the parent language rather than those of the child language. In this work, we address this issue by proposing a language-independent Hybrid Transfer Learning (HTL) method for LRLs that shares lexicon embeddings between parent and child languages without leveraging back-translation or manually injected noise. First, we train the parent model on High-Resource Languages (HRLs) with their vocabularies. Then, we combine the parent and child language pairs using oversampling to train a hybrid model initialized with the previously trained parent model. Finally, we fine-tune the morphologically rich child model from the hybrid model. In addition, we report some interesting findings about the original TL approach. Experimental results show that our model consistently outperforms five SOTA methods on two languages, Azerbaijani (Az) and Uzbek (Uz). Our approach is practical and significantly better, achieving improvements of up to 4.94 and 4.84 BLEU points for the low-resource child language pairs Az→Zh and Uz→Zh, respectively.
Keywords: artificial intelligence; natural language processing; neural network; machine translation; low-resource languages; transfer learning
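The training schedule described in the abstract (parent training, oversampled hybrid training, then fine-tuning) hinges on mixing the parent and child corpora by oversampling. Below is a minimal Python sketch of that mixing step, assuming toy in-memory sentence pairs; `oversample_mix` and the data names are illustrative stand-ins rather than the authors' code, and the actual model training would happen in an NMT toolkit.

```python
import random

def oversample_mix(parent_pairs, child_pairs, seed=0):
    """Build the hybrid corpus: keep every high-resource (parent) pair
    and repeat the low-resource (child) pairs until they contribute
    roughly the same number of examples."""
    rng = random.Random(seed)
    repeats = max(1, len(parent_pairs) // max(1, len(child_pairs)))
    mixed = list(parent_pairs) + list(child_pairs) * repeats
    rng.shuffle(mixed)
    return mixed

# Hypothetical toy bitext; real training uses tokenized parallel text
# with a vocabulary (and lexicon embedding) shared by parent and child.
parent = [(f"hrl-src-{i}", f"zh-tgt-{i}") for i in range(1000)]
child = [(f"az-src-{i}", f"zh-tgt-{i}") for i in range(50)]

hybrid = oversample_mix(parent, child)
print(len(hybrid))  # 2000: 1000 parent pairs + 50 child pairs repeated 20x

# Stages sketched by the abstract:
#   1) train the parent model on `parent` with the shared vocabulary;
#   2) continue training on `hybrid`, initialized from the parent model;
#   3) fine-tune on `child` alone to obtain the final child model.
```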
2. Controllable data synthesis method for grammatical error correction (cited by 1)
Authors: Liner Yang, Chengcheng Wang, Yun Chen, Yongping Du, Erhong Yang. Frontiers of Computer Science (SCIE, EI, CSCD), 2022, Issue 4, pp. 69-78.
Due to the lack of parallel data in the current Grammatical Error Correction (GEC) task, models based on the sequence-to-sequence framework cannot be adequately trained to obtain higher performance. We propose two data synthesis methods that can control the error rate and the ratio of error types on synthetic data. The first approach corrupts each word in the monolingual corpus with a fixed probability, via replacement, insertion, or deletion. The second approach trains error generation models and then filters their decoding results. Experiments on different synthetic data show that a 40% error rate with an equal ratio of error types improves model performance the most. Finally, we synthesize about 100 million sentence pairs and achieve performance comparable to the state of the art, which uses twice as much data as we do.
Keywords: grammatical error correction; sequence to sequence; data synthesis
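The first synthesis approach (per-word corruption at a fixed probability) is simple enough to sketch directly. In the minimal Python sketch below, the 40% default error rate and the equal error-type weights follow the abstract's reported best setting, while `corrupt_sentence` and the toy replacement vocabulary are illustrative assumptions, not the authors' implementation.

```python
import random

def corrupt_sentence(tokens, vocab, error_rate=0.4,
                     type_ratio=(1.0, 1.0, 1.0), seed=None):
    """Corrupt each token independently with probability `error_rate`.
    A corrupted token is replaced, followed by an insertion, or deleted,
    with the error type drawn per `type_ratio` = (replace, insert, delete)."""
    rng = random.Random(seed)
    out = []
    for tok in tokens:
        if rng.random() >= error_rate:
            out.append(tok)  # leave the token untouched
            continue
        op = rng.choices(("replace", "insert", "delete"),
                         weights=type_ratio)[0]
        if op == "replace":
            out.append(rng.choice(vocab))    # swap in a random word
        elif op == "insert":
            out.append(tok)
            out.append(rng.choice(vocab))    # add a spurious word after it
        # "delete": drop the token entirely
    return out

# Toy confusion vocabulary; a real system would sample replacements from
# a confusion set or frequency-weighted vocabulary.
confusion_vocab = ["the", "a", "an", "of", "to", "in", "on", "is"]
clean = "models based on a sequence to sequence framework".split()
print(" ".join(corrupt_sentence(clean, confusion_vocab, seed=7)))
```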