4 articles found
Exploring Attentive Siamese LSTM for Low-Resource Text Plagiarism Detection
1
Authors: Wei Bao, Jian Dong, Yang Xu, Yuanyuan Yang, Xiaoke Qi. Data Intelligence (EI), 2024, No. 2, pp. 488-503 (16 pages)
Low-resource text plagiarism detection faces a significant challenge due to the limited availability of labeled data for training. This task requires the development of sophisticated algorithms capable of identifying similarities and differences in texts, particularly in the realm of semantic rewriting and translation-based plagiarism detection. In this paper, we present an enhanced attentive Siamese Long Short-Term Memory (LSTM) network designed for Tibetan-Chinese plagiarism detection. Our approach begins with the introduction of translation-based data augmentation, aimed at expanding the bilingual training dataset. Subsequently, we propose a pre-detection method leveraging abstract document vectors to enhance detection efficiency. Finally, we introduce an improved attentive Siamese LSTM network tailored for Tibetan-Chinese plagiarism detection. We conduct comprehensive experiments to showcase the effectiveness of our proposed plagiarism detection framework.
Keywords: Text plagiarism detection; Low resource; Siamese Long Short-Term Memory; Tibetan-Chinese
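The Siamese idea named in this abstract can be sketched as follows: both texts pass through one shared encoder, and a similarity score is computed on the two resulting vectors. This is only an illustrative toy, not the paper's network. The embedding table `EMBED`, the averaging encoder (a stand-in for the attentive LSTM), and the `exp(-L1)` similarity are all assumptions introduced here.

```python
import math

# Hypothetical toy embedding table; a real system would learn these vectors.
EMBED = {
    "cat": [1.0, 0.0],
    "dog": [0.9, 0.1],
    "car": [0.0, 1.0],
}


def encode(tokens):
    """Shared encoder: average the token embeddings (stand-in for an LSTM)."""
    vecs = [EMBED[t] for t in tokens if t in EMBED]
    if not vecs:
        return [0.0, 0.0]
    return [sum(col) / len(vecs) for col in zip(*vecs)]


def siamese_similarity(tokens_a, tokens_b):
    """Encode both inputs with the SAME encoder, then score exp(-L1 distance)."""
    ha, hb = encode(tokens_a), encode(tokens_b)
    l1 = sum(abs(x - y) for x, y in zip(ha, hb))
    return math.exp(-l1)
```

Because the two branches share one encoder, the score is symmetric in its arguments, and identical inputs always score 1.0.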
Idea plagiarism detection with recurrent neural networks and vector space model (cited by: 1)
2
Authors: Azra Nazir, Roohie Naaz Mir, Shaima Qureshi. International Journal of Intelligent Computing and Cybernetics (EI), 2021, No. 3, pp. 321-332 (12 pages)
Purpose: Natural languages have a fundamental quality of suppleness that makes it possible to present a single idea in many different ways. This feature is often exploited in the academic world, leading to the theft of work referred to as plagiarism. Many approaches have been put forward to detect such cases based on various text features and grammatical structures of languages. However, there is considerable scope for improvement in detecting intelligent plagiarism. Design/methodology/approach: To realize this, the paper introduces a hybrid model to detect intelligent plagiarism by breaking the entire process into three stages: (1) clustering, (2) vector formulation in each cluster based on semantic roles, normalization and similarity index calculation and (3) summary generation using an encoder-decoder. An effective weighting scheme has been introduced to select terms used to build vectors based on K-means, which is calculated on the synonym set for the said term. Only if the value calculated in the last stage lies above a predefined threshold is the next semantic argument analyzed. When the similarity score for two documents is beyond the threshold, a short summary for plagiarized documents is created. Findings: Experimental results show that this method is able to detect connotation and concealment used in idea plagiarism besides detecting literal plagiarism. Originality/value: The proposed model can help academics stay updated by providing summaries of relevant articles. It would eliminate the practice of plagiarism infesting the academic community at an unprecedented pace. The model will also accelerate the process of reviewing academic documents, aiding in the speedy publishing of research articles.
Keywords: Natural language processing; Vector space model; Recurrent neural networks; plagiarism detection
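The threshold-gated similarity check described in this abstract can be illustrated with a generic vector space model. The sketch below is a simplification and not the authors' method: it uses plain term-frequency vectors and cosine similarity, whereas the paper builds vectors from semantic roles with a K-means-based weighting. The function names and the default threshold of 0.5 are hypothetical.

```python
import math
from collections import Counter


def tf_vector(text):
    """Term-frequency vector of a whitespace-tokenized, lowercased text."""
    return Counter(text.lower().split())


def cosine(u, v):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(u[t] * v[t] for t in u.keys() & v.keys())
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def is_suspicious(doc_a, doc_b, threshold=0.5):
    """Flag a document pair only when its similarity exceeds the threshold."""
    return cosine(tf_vector(doc_a), tf_vector(doc_b)) > threshold
```

In a staged pipeline of this kind, only pairs for which `is_suspicious` returns `True` would move on to the costlier later steps, such as summary generation.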
Research on MLChecker Plagiarism Detection System
3
Authors: Haihao Yu, Chengzhe Huang, Leilei Kong, Xu Sun, Haoliang Qi, Zhongyuan Han. Proceedings of the International Computer Frontier Conference (《国际计算机前沿大会会议论文集》), 2020, No. 2, pp. 176-181 (6 pages)
Plagiarism detection systems play an essential role in improving education quality by helping teachers detect plagiarism. The most common approach in plagiarism detection tools is to use a number of measures customized to determine occurrences of plagiarism. This approach is simple and effective, but it lacks flexibility when applied in more complicated situations. This paper proposes MLChecker, a smart plagiarism detection system, to provide more flexible detection tactics. MLChecker uses an automatic plagiarism dataset construction method to dynamically update the plagiarism detection algorithms according to different detection tasks. It also provides full-process quality management functions. The results show that detection accuracy is raised by 56%. Compared with traditional plagiarism detection tools, MLChecker achieves higher accuracy and efficiency.
Keywords: plagiarism; plagiarism detection system; plagiarism dataset; MLChecker
A machine learning approach to query generation in plagiarism source retrieval
4
Authors: Lei-lei KONG, Zhi-mao LU, Hao-liang QI, Zhong-yuan HAN. Frontiers of Information Technology & Electronic Engineering (SCIE, EI, CSCD), 2017, No. 10, pp. 1556-1572 (17 pages)
Plagiarism source retrieval is the core task of plagiarism detection. It has become the standard for plagiarism detection to use the queries extracted from suspicious documents to retrieve the plagiarism sources. Generating queries from a suspicious document is one of the most important steps in plagiarism source retrieval. Heuristic-based query generation methods are widely used in the current research. Each heuristic-based method has its own advantages, and no one statistically outperforms the others on all suspicious document segments when generating queries for source retrieval. Further improvements on heuristic methods for source retrieval rely mainly on the experience of experts. This leads to difficulties in putting forward new heuristic methods that can overcome the shortcomings of the existing ones. This paper paves the way for a new statistical machine learning approach to select the best queries from the candidates. The statistical machine learning approach to query generation for source retrieval is formulated as a ranking framework. Specifically, it aims to achieve the optimal source retrieval performance for each suspicious document segment. The proposed method exploits learning to rank to generate queries from the candidates. To our knowledge, our work is the first research to apply machine learning methods to resolve the problem of query generation for source retrieval. To solve the essential problem of an absence of training data for learning to rank, the building of training samples for source retrieval is also conducted. We rigorously evaluate various aspects of the proposed method on the publicly available PAN source retrieval corpus. With respect to the established baselines, the experimental results show that applying our proposed query generation method based on machine learning yields statistically significant improvements over baselines in source retrieval effectiveness.
Keywords: plagiarism detection; Source retrieval; Query generation; Machine learning; Learning to rank
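Casting query selection as a ranking problem, as this abstract does, can be sketched with a minimal pairwise learning-to-rank example. The abstract does not name the ranking algorithm, so the pairwise perceptron below is a generic stand-in rather than the authors' method. The two-dimensional feature vectors, the gain labels, and the query names are hypothetical.

```python
def score(w, features):
    """Linear ranking score of one candidate query."""
    return sum(wk * f for wk, f in zip(w, features))


def train_pairwise(samples, epochs=10, lr=0.1):
    """Pairwise perceptron over (features, gain) candidates of one segment.

    For every pair where candidate i has a higher gain than candidate j,
    the weights are nudged until i also scores higher than j.
    """
    w = [0.0] * len(samples[0][0])
    for _ in range(epochs):
        for xi, yi in samples:
            for xj, yj in samples:
                if yi <= yj:
                    continue  # only pairs where i should outrank j
                diff = [a - b for a, b in zip(xi, xj)]
                if score(w, diff) <= 0:  # pair is mis-ordered (or tied)
                    w = [wk + lr * d for wk, d in zip(w, diff)]
    return w


def best_query(w, candidates):
    """Pick the (name, features) candidate with the highest learned score."""
    return max(candidates, key=lambda c: score(w, c[1]))
```

A usage sketch with made-up training data: `w = train_pairwise([([1.0, 0.2], 3), ([0.5, 0.9], 1), ([0.1, 0.4], 0)])`, followed by `best_query(w, [("q_a", [0.9, 0.1]), ("q_b", [0.2, 0.8])])`, selects the candidate whose features most resemble the high-gain training examples.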