Funding: Supported by the Guangdong Provincial Key Laboratory of Novel Security Intelligence Technologies (Grant No. 2022B1212010005) and XJTLU Research Development Funding (Grant No. RDF-22-01-053).
Abstract: The technological breakthroughs in generative artificial intelligence, represented by ChatGPT, have brought about significant social changes as well as new problems and challenges. Generative artificial intelligence has inherent flaws such as language imbalance, algorithmic black boxes, and algorithmic bias, and it also carries external risks such as algorithmic comfort zones, data pollution, algorithmic infringement, and inaccurate output. These problems make it difficult to legislate for the governance of generative artificial intelligence. Taking the data contamination incident in Google Translate as an example, this article proposes that, in constructing machine translation ethics, the responsibility mechanism for generative artificial intelligence should be built around three elements: data processing, algorithmic optimisation, and ethical alignment.
Funding: Supported by the National Key R&D Program of China (No. 2017YFB0202204), the National Natural Science Foundation of China (Nos. 61925601, 61761166008, and 61772302), the Beijing Advanced Innovation Center for Language Resources (No. TYR17002), and the NExT++ project, which is supported by the National Research Foundation, Prime Minister's Office, Singapore, under its IRC@Singapore Funding Initiative.
Abstract: Most State-Of-The-Art (SOTA) Neural Machine Translation (NMT) systems today achieve outstanding results based only on large parallel corpora. Large-scale parallel corpora for high-resource languages are easily obtainable. However, the translation quality of NMT for morphologically rich languages is still unsatisfactory, mainly because of the data sparsity problem encountered in Low-Resource Languages (LRLs). In the low-resource NMT paradigm, Transfer Learning (TL) has developed into one of the most efficient methods. It is difficult, however, for a model trained on high-resource languages to capture information from both the parent and child languages, because the initially trained model contains only the lexicon features and word embeddings of the parent language and not those of the child language. In this work, we aim to address this issue by proposing a language-independent Hybrid Transfer Learning (HTL) method for LRLs that shares lexicon embeddings between parent and child languages without leveraging back translation or manually injecting noise. First, we train the parent model on the High-Resource Languages (HRLs) with its own vocabularies. Then, we combine the parent and child language pairs using oversampling to train a hybrid model initialized from the previously trained parent model. Finally, we fine-tune the morphologically rich child model using the hybrid model. In addition, we report some interesting findings about the original TL approach. Experimental results show that our model consistently outperforms five SOTA methods on two languages, Azerbaijani (Az) and Uzbek (Uz). Our approach is also practical, achieving improvements of up to 4.94 and 4.84 BLEU points for the low-resource child language pairs Az → Zh and Uz → Zh, respectively.
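The oversampling step that combines parent and child language pairs can be illustrated with a minimal sketch. The abstract does not state the sampling ratio or any implementation details, so everything here is an assumption made for illustration: the function name `oversample_and_mix` is hypothetical, and the choice to repeat the child corpus until it roughly matches the parent corpus in size (a 1:1 balance) is only one plausible reading, not the authors' documented procedure.

```python
import random
from typing import List, Tuple

Pair = Tuple[str, str]  # (source sentence, target sentence)


def oversample_and_mix(parent: List[Pair], child: List[Pair], seed: int = 0) -> List[Pair]:
    """Repeat the smaller child corpus to about the parent's size, then shuffle both together.

    Assumption: a 1:1 parent/child balance. The source abstract does not give the ratio.
    """
    if not child:
        raise ValueError("child corpus must not be empty")
    rng = random.Random(seed)
    # Ceiling division: how many copies of the child corpus cover the parent size.
    repeats = max(1, -(-len(parent) // len(child)))
    # Truncate so the oversampled child data is no larger than needed.
    target_size = max(len(parent), len(child))
    oversampled = (child * repeats)[:target_size]
    mixed = list(parent) + oversampled
    rng.shuffle(mixed)  # interleave parent and child examples for hybrid training
    return mixed
```

Under these assumptions, the mixed list would serve as training data for the hybrid model, which is initialized from the parent model and later used to fine-tune the child model.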