Funding: supported by the National Natural Science Foundation of China under Grants 61732005 and 61972186, and the Yunnan Provincial Major Science and Technology Special Plan Projects (Nos. 202103AA080015 and 202203AA080004).
Abstract: Thanks to the strong representation capability of pre-trained language models, supervised machine translation models have achieved outstanding performance. However, the performance of these models drops sharply when the parallel training corpus is limited. Since pre-trained language models already provide strong monolingual representations, the key challenge for machine translation is to model the in-depth relationship between the source and target languages by injecting lexical and syntactic information into the pre-trained models. To reduce the dependence on parallel corpora, we propose a Linguistics Knowledge-Driven Multi-Task (LKMT) approach that injects part-of-speech and syntactic knowledge into pre-trained models, thereby enhancing machine translation performance. On the one hand, we integrate part-of-speech and dependency labels into the embedding layer and exploit a large-scale monolingual corpus to update all parameters of the pre-trained language model, ensuring that the updated model encodes latent lexical and syntactic information. On the other hand, we leverage an extra self-attention layer to explicitly inject linguistic knowledge into the pre-trained-language-model-enhanced machine translation model. Experiments on the benchmark dataset show that our LKMT approach improves Urdu-English translation accuracy by 1.97 points and English-Urdu translation accuracy by 2.42 points, highlighting the effectiveness of the LKMT framework. Detailed ablation experiments confirm the positive impact of part-of-speech and dependency parsing on machine translation.
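The embedding-layer idea above can be illustrated with a minimal sketch (hypothetical, not the authors' code): the vector fed to the encoder is the element-wise sum of token, part-of-speech, and dependency-label embeddings, so lexical and syntactic knowledge enters at the embedding layer. All tables, vocabularies, and dimensions here are illustrative assumptions.

```python
# Minimal sketch of linguistically enriched embeddings: each input position
# receives token + POS-tag + dependency-label embeddings summed together.
import random

DIM = 8  # illustrative embedding dimension

def make_table(vocab, seed):
    """Build a tiny embedding table with reproducible random vectors."""
    rng = random.Random(seed)
    return {item: [rng.uniform(-0.1, 0.1) for _ in range(DIM)] for item in vocab}

token_table = make_table(["the", "cat", "sleeps"], seed=0)
pos_table = make_table(["DET", "NOUN", "VERB"], seed=1)
dep_table = make_table(["det", "nsubj", "root"], seed=2)

def enriched_embedding(token, pos, dep):
    """Sum token, part-of-speech, and dependency-label vectors element-wise."""
    return [t + p + d for t, p, d in
            zip(token_table[token], pos_table[pos], dep_table[dep])]

# One annotated sentence: (token, POS tag, dependency label) per position.
sentence = [("the", "DET", "det"), ("cat", "NOUN", "nsubj"), ("sleeps", "VERB", "root")]
embedded = [enriched_embedding(*w) for w in sentence]
print(len(embedded), len(embedded[0]))  # 3 positions, each DIM-dimensional
```

In the actual LKMT setup these summed embeddings would feed a pre-trained Transformer whose parameters are then updated on monolingual text; the sketch only shows how the three knowledge sources are combined at the input.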
Funding: supported by the National Natural Science Foundation of China (61175068, 61472168, 61163004), the Natural Science Foundation of Yunnan Province (2013FA130), and the Talent Promotion Project of the Ministry of Science and Technology (2014HE001).
Funding: National Natural Science Foundation of China (Nos. U21B2027, 61972186, 61732005), Major Science and Technology Projects of Yunnan Province (Nos. 202202AD080003, 202203AA080004).
Abstract: Entity alignment (EA) is an important technique that aims to find entities referring to the same real-world object across two different source knowledge graphs (KGs). Current methods typically learn entity embeddings for EA from the structure of the KGs. Most EA models are designed for high-resource languages and require resources such as parallel corpora and pre-trained language models. However, KGs in low-resource languages have received less attention, and current models perform poorly on them. Recently, researchers have fused relation information and attributes into entity representations to enhance alignment performance, but relation semantics are often ignored. To address these issues, we propose a novel Semantic-aware Graph Neural Network (SGNN) for entity alignment. First, we generate pseudo sentences from the relation triples and produce representations using pre-trained models. Second, our approach exploits semantic information from the connected relations through a graph neural network, capturing expanded feature information from the KGs. Experimental results on three low-resource languages demonstrate that our SGNN approach outperforms state-of-the-art alignment methods on three newly proposed datasets and three public datasets.
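The pseudo-sentence step described above can be sketched as follows (a hypothetical illustration, assuming a simple verbalization template; the entity names, relation names, and template are not from the paper): each relation triple (head, relation, tail) is turned into a short natural-language-like string that a pre-trained encoder could then embed.

```python
# Hypothetical sketch: verbalize KG relation triples as pseudo sentences
# that a pre-trained language model could encode for entity alignment.
def triple_to_pseudo_sentence(head, relation, tail):
    """Render a (head, relation, tail) triple as a space-joined string,
    turning underscore-separated relation names into words."""
    return f"{head} {relation.replace('_', ' ')} {tail}"

# Illustrative triples, not drawn from the paper's datasets.
triples = [
    ("Paris", "capital_of", "France"),
    ("France", "member_of", "European Union"),
]
pseudo_sentences = [triple_to_pseudo_sentence(*t) for t in triples]
print(pseudo_sentences)
```

In the full SGNN pipeline these strings would be fed to a pre-trained encoder, and a graph neural network would then aggregate the resulting relation semantics over each entity's neighborhood; the sketch covers only the verbalization step.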