Funding: Supported by the National Natural Science Foundation of China (No. 62271456) and the Open Projects Program of the State Key Laboratory of Multimodal Artificial Intelligence Systems.
Abstract: Low-resource text plagiarism detection is hampered by the limited availability of labeled training data. The task requires algorithms capable of identifying similarities and differences between texts, particularly for semantic rewriting and translation-based plagiarism. In this paper, we present an enhanced attentive Siamese Long Short-Term Memory (LSTM) network for Tibetan-Chinese plagiarism detection. We first introduce translation-based data augmentation to expand the bilingual training dataset. We then propose a pre-detection method that uses abstract document vectors to improve detection efficiency. Finally, we introduce an improved attentive Siamese LSTM network tailored to Tibetan-Chinese plagiarism detection. We conduct comprehensive experiments to demonstrate the effectiveness of the proposed plagiarism detection framework.
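The pre-detection step can be pictured as a cheap filter that discards unlikely source documents before the more expensive Siamese LSTM comparison runs. The abstract does not say how the document vectors are built or compared, so the following is only a minimal sketch under stated assumptions: cosine similarity as the comparison, a fixed threshold, and Tibetan and Chinese document vectors that already live in a shared embedding space. The names `cosine`, `pre_detect`, and `threshold`, and the toy vectors, are hypothetical and not taken from the paper.

```python
import math
from typing import Dict, List, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def pre_detect(
    query_vec: Sequence[float],
    source_vecs: Dict[str, Sequence[float]],
    threshold: float = 0.5,  # assumed cut-off, not a value from the paper
) -> List[str]:
    """Return ids of source documents similar enough to deserve a detailed check.

    Results are ordered from most to least similar.
    """
    scored = [(cosine(query_vec, vec), doc_id) for doc_id, vec in source_vecs.items()]
    kept = [pair for pair in scored if pair[0] >= threshold]
    kept.sort(key=lambda pair: pair[0], reverse=True)
    return [doc_id for _, doc_id in kept]


# Toy vectors standing in for document vectors in a shared bilingual space.
sources = {
    "doc_a": [1.0, 0.0, 0.0],
    "doc_b": [0.9, 0.1, 0.0],
    "doc_c": [0.0, 0.0, 1.0],
}
candidates = pre_detect([1.0, 0.0, 0.0], sources, threshold=0.5)
print(candidates)
```

In this sketch only the documents that survive the filter would be passed on to the pairwise model, which is where the efficiency gain described in the abstract would come from.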