Cross-lingual summarization (CLS) is the task of generating a summary in a target language from a document in a source language. Recently, end-to-end CLS models have achieved impressive results using large-scale, high-quality datasets typically constructed by translating monolingual summary corpora into CLS corpora. However, due to the limited performance of low-resource language translation models, translation noise can seriously degrade the performance of these models. In this paper, we propose a fine-grained reinforcement learning approach for low-resource CLS based on noisy data. We introduce the source language summary as a gold signal to alleviate the impact of the noisy translated target summary. Specifically, we design a reinforcement reward by calculating the word correlation and the word missing degree between the source language summary and the generated target language summary, and combine it with the cross-entropy loss to optimize the CLS model. To validate the performance of the proposed model, we construct Chinese-Vietnamese and Vietnamese-Chinese CLS datasets. Experimental results show that our model outperforms the baselines in terms of both ROUGE score and BERTScore.
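The abstract does not give the exact formulas for the reward, so the following Python sketch only illustrates the general idea under simple assumptions: an overlap-style "word correlation" term, a recall-style "word missing degree" term, and a weighted mix of the reinforcement loss with cross-entropy. The weights alpha and lam, the function names, and the assumption that source- and target-language tokens have already been mapped into a shared vocabulary (e.g., via a bilingual dictionary) are all hypothetical, not the paper's actual method.

```python
def word_correlation(reference_tokens, generated_tokens):
    """Fraction of generated tokens that also appear in the reference
    summary (a precision-style overlap; the paper's measure may differ)."""
    if not generated_tokens:
        return 0.0
    ref = set(reference_tokens)
    return sum(tok in ref for tok in generated_tokens) / len(generated_tokens)

def word_missing_degree(reference_tokens, generated_tokens):
    """Fraction of reference tokens absent from the generated summary
    (a recall-style penalty term)."""
    if not reference_tokens:
        return 0.0
    gen = set(generated_tokens)
    return sum(tok not in gen for tok in reference_tokens) / len(reference_tokens)

def reward(reference_tokens, generated_tokens, alpha=0.5):
    """Combined reward: reward word overlap, penalize missing words.
    alpha is a hypothetical balancing weight."""
    return (alpha * word_correlation(reference_tokens, generated_tokens)
            - (1 - alpha) * word_missing_degree(reference_tokens, generated_tokens))

def mixed_loss(ce_loss, rl_loss, lam=0.7):
    """Interpolate the policy-gradient (RL) loss with cross-entropy loss,
    as the abstract says the two are combined; lam is assumed."""
    return lam * rl_loss + (1 - lam) * ce_loss

if __name__ == "__main__":
    # Toy example; both sides are assumed already aligned to one vocabulary.
    ref = ["economy", "growth", "slows", "sharply"]
    gen = ["economy", "growth", "accelerates"]
    print(reward(ref, gen))  # 0.5 * 2/3 - 0.5 * 2/4 ≈ 0.083
```

In a REINFORCE-style setup, rl_loss would typically be the negative reward scaled by the log-probability of the sampled summary; that detail is likewise an assumption here.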
Funding: Project supported by the National Natural Science Foundation of China (Nos. U21B2027, 62266027, 61972186, and 62241604), the Yunnan Provincial Major Science and Technology Special Plan Projects, China (Nos. 202302AD080003, 202103AA080015, and 202202AD080003), the General Projects of Basic Research in Yunnan Province, China (Nos. 202301AT070471 and 202301AT070393), and the Kunming University of Science and Technology "Double First-Class" Joint Project, China (No. 202201BE070001-021).