Prior studies have demonstrated that deep learning-based approaches can improve source code vulnerability detection by training neural networks to learn vulnerability patterns in code representations. However, due to limitations in code representation and neural network design, the validity and practicality of these models still need to be improved. Additionally, because programming languages differ, most methods lack cross-language detection generality. To address these issues, in this paper we analyze the shortcomings of previous code representations and neural networks. We propose a novel hierarchical code representation that combines Concrete Syntax Trees (CST) with Program Dependence Graphs (PDG). Furthermore, we introduce a Tree-Graph-Gated-Attention (TGGA) network based on gated recurrent units and attention mechanisms to build a Hierarchical Code Representation learning-based Vulnerability Detection (HCRVD) system. This system enables cross-language vulnerability detection at the function level. The experiments show that HCRVD surpasses many competitors in vulnerability detection capability. It benefits from the hierarchical code representation learning method and outperforms the baseline in cross-language vulnerability detection by 9.772% and 11.819% on the C/C++ and Java datasets, respectively. Moreover, HCRVD has some ability to detect vulnerabilities in unknown programming languages and is useful in real open-source projects. HCRVD shows good validity, generality, and practicality.
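The two-level idea in the HCRVD abstract (a syntax tree per statement, with dependence edges between statements) can be pictured with a small data structure. This is a minimal sketch under stated assumptions, not the paper's implementation: the class names `SyntaxNode` and `FunctionGraph`, the hand-built trees, and the example function are all hypothetical, and a real system would obtain the trees and dependence edges from a parser and a program analysis tool.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class SyntaxNode:
    """One node of a per-statement syntax tree (a stand-in for a CST node)."""
    label: str
    children: List["SyntaxNode"] = field(default_factory=list)

    def size(self) -> int:
        """Count this node plus all of its descendants."""
        return 1 + sum(child.size() for child in self.children)


@dataclass
class FunctionGraph:
    """Upper level: statements as graph nodes, dependences as typed edges."""
    trees: Dict[int, SyntaxNode] = field(default_factory=dict)
    edges: List[Tuple[int, int, str]] = field(default_factory=list)

    def add_statement(self, stmt_id: int, tree: SyntaxNode) -> None:
        self.trees[stmt_id] = tree

    def add_dependence(self, src: int, dst: int, kind: str) -> None:
        # Only the two classic PDG edge kinds are modelled in this sketch.
        if kind not in ("data", "control"):
            raise ValueError(f"unknown dependence kind: {kind}")
        if src not in self.trees or dst not in self.trees:
            raise KeyError("both endpoints must be registered statements")
        self.edges.append((src, dst, kind))

    def successors(self, stmt_id: int, kind: str) -> List[int]:
        """Statements that depend on stmt_id through an edge of this kind."""
        return sorted(d for s, d, k in self.edges if s == stmt_id and k == kind)


# Hypothetical function body, one statement per graph node:
#   1: int x = n;   2: if (x > 0)   3: x = x - 1;   4: return x;
N = SyntaxNode
graph = FunctionGraph()
graph.add_statement(1, N("declaration", [N("x"), N("="), N("n")]))
graph.add_statement(2, N("if_condition", [N("x"), N(">"), N("0")]))
graph.add_statement(
    3, N("assignment", [N("x"), N("="), N("binary", [N("x"), N("-"), N("1")])])
)
graph.add_statement(4, N("return", [N("x")]))

graph.add_dependence(1, 2, "data")     # statement 2 reads x defined in 1
graph.add_dependence(1, 3, "data")     # statement 3 reads x defined in 1
graph.add_dependence(2, 3, "control")  # statement 3 runs only if 2 holds
graph.add_dependence(1, 4, "data")     # statement 4 may return x from 1
graph.add_dependence(3, 4, "data")     # statement 4 may return x from 3
```

A learning model in this style would first encode each statement's tree and then propagate information along the dependence edges; the sketch only shows the representation, not the TGGA network.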
High-obfuscation plagiarism, such as paraphrasing and cross-language plagiarism, is often difficult for anti-plagiarism systems to detect in big data environments because plagiarism techniques are becoming more and more complex. This paper proposes HawkEyes, a plagiarism detection system built on the source retrieval and text alignment algorithms that were developed for the international competition on plagiarism detection organized by CLEF. The text alignment algorithm in HawkEyes won first place in PAN@CLEF2012. In the demonstration, we will present our system running on the PAN@CLEF2014 training data corpus.
The problem of high similarity in homework has long troubled teachers. Previous plagiarism detection systems are mainly realized by string matching, which has a limitation: image homework cannot be detected. To address this issue, we propose a new method of plagiarism detection in homework. First, we obtain fingerprint features of image homework by converting text homework into images. Then, we use an image hashing algorithm and the Hamming distance to calculate the similarity of these features. Finally, we perform an empirical study on the Computer Network Experiment course. The test shows that our method not only keeps detection fast, but also gives consistent precision and false positive rate.
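The hashing and Hamming distance step can be illustrated with a short sketch. The abstract does not say which image hashing algorithm is used, so the average hash below is an assumption chosen for simplicity. The function names, the 4x4 grayscale grids, and the similarity formula (one minus the fraction of differing bits) are also hypothetical. Rendering homework text into an image and downscaling it are omitted; the sketch starts from an already small grayscale grid.

```python
def average_hash(pixels):
    """Return a list of bits: 1 where a pixel is brighter than the mean."""
    flat = [value for row in pixels for value in row]
    mean = sum(flat) / len(flat)
    return [1 if value > mean else 0 for value in flat]


def hamming_distance(bits_a, bits_b):
    """Count the positions at which two equal-length fingerprints differ."""
    if len(bits_a) != len(bits_b):
        raise ValueError("fingerprints must have the same length")
    return sum(a != b for a, b in zip(bits_a, bits_b))


def similarity(bits_a, bits_b):
    """Map the Hamming distance to a score in [0, 1]; 1.0 means identical."""
    return 1 - hamming_distance(bits_a, bits_b) / len(bits_a)


# Two toy "images": image_b is image_a with one pixel changed.
image_a = [
    [10, 200, 10, 200],
    [200, 10, 200, 10],
    [10, 200, 10, 200],
    [200, 10, 200, 10],
]
image_b = [row[:] for row in image_a]
image_b[0][0] = 220

hash_a = average_hash(image_a)
hash_b = average_hash(image_b)
print(hamming_distance(hash_a, hash_b))  # 1
print(similarity(hash_a, hash_b))        # 0.9375
```

A pair of submissions would be flagged when the similarity exceeds a chosen cutoff; the abstract does not give the cutoff, so none is fixed here.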
Low-resource text plagiarism detection faces a significant challenge due to the limited availability of labeled data for training. This task requires the development of sophisticated algorithms capable of identifying similarities and differences in texts, particularly in the realm of semantic rewriting and translation-based plagiarism detection. In this paper, we present an enhanced attentive Siamese Long Short-Term Memory (LSTM) network designed for Tibetan-Chinese plagiarism detection. Our approach begins with the introduction of translation-based data augmentation, aimed at expanding the bilingual training dataset. Subsequently, we propose a pre-detection method leveraging abstract document vectors to enhance detection efficiency. Finally, we introduce an improved attentive Siamese LSTM network tailored for Tibetan-Chinese plagiarism detection. We conduct comprehensive experiments to showcase the effectiveness of our proposed plagiarism detection framework.
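The pre-detection step can be read as a cheap filter that decides which candidate documents are worth passing to the more expensive Siamese network. The sketch below shows that idea under loud assumptions: the abstract does not describe how its document vectors are built, so plain term-frequency vectors and cosine similarity are swapped in, and the example is monolingual English rather than Tibetan-Chinese. The names `doc_vector`, `cosine`, `pre_detect`, and the threshold of 0.3 are hypothetical.

```python
import math
from collections import Counter


def doc_vector(text):
    """Toy document vector: lowercase term frequencies."""
    return Counter(text.lower().split())


def cosine(u, v):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(u[term] * v[term] for term in u.keys() & v.keys())
    norm_u = math.sqrt(sum(count * count for count in u.values()))
    norm_v = math.sqrt(sum(count * count for count in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def pre_detect(suspicious, candidates, threshold=0.3):
    """Return ids of candidates similar enough to deserve full comparison."""
    query = doc_vector(suspicious)
    return sorted(
        doc_id
        for doc_id, text in candidates.items()
        if cosine(query, doc_vector(text)) > threshold
    )


candidates = {
    "a": "a yak grazes on the plateau in summer",
    "b": "stock prices fell sharply today",
}
shortlist = pre_detect("the yak grazes on the high plateau", candidates)
print(shortlist)  # ['a']
```

Only the shortlisted pairs would then go through the slower pairwise model, which is where the efficiency gain described in the abstract would come from.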
Purpose: Natural languages have a fundamental quality of suppleness that makes it possible to present a single idea in many different ways. This feature is often exploited in the academic world, leading to the theft of work referred to as plagiarism. Many approaches have been put forward to detect such cases based on various text features and grammatical structures of languages. However, there is huge scope for improvement in detecting intelligent plagiarism. Design/methodology/approach: To realize this, the paper introduces a hybrid model that detects intelligent plagiarism by breaking the entire process into three stages: (1) clustering, (2) vector formulation in each cluster based on semantic roles, normalization, and similarity index calculation, and (3) summary generation using an encoder-decoder. An effective weighting scheme has been introduced to select the terms used to build vectors based on K-means; the weight is calculated on the synonym set of the term in question. The next semantic argument is analyzed only if the value calculated in the previous stage lies above a predefined threshold. When the similarity score for two documents is beyond the threshold, a short summary of the plagiarized documents is created. Findings: Experimental results show that this method is able to detect the connotation and concealment used in idea plagiarism, in addition to detecting literal plagiarism. Originality/value: The proposed model can help academics stay updated by providing summaries of relevant articles. It would eliminate the practice of plagiarism that is infesting the academic community at an unprecedented pace. The model will also accelerate the process of reviewing academic documents, aiding the speedy publishing of research articles.
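The threshold gate in the second stage (analyze the next semantic argument only when the previous score is high enough) amounts to an early-exit loop. The following is a minimal sketch of that control flow only. The role names in `ROLE_ORDER`, the Jaccard overlap used as the per-role score, the threshold of 0.5, and the function names are all assumptions made for illustration; the abstract does not specify them, and the paper's K-means and synonym-based weighting are not reproduced.

```python
ROLE_ORDER = ("agent", "action", "patient")  # hypothetical semantic roles


def jaccard(terms_a, terms_b):
    """Overlap between two term collections, in [0, 1]."""
    set_a, set_b = set(terms_a), set(terms_b)
    if not set_a and not set_b:
        return 0.0
    return len(set_a & set_b) / len(set_a | set_b)


def gated_compare(sentence_a, sentence_b, threshold=0.5):
    """Compare role by role, stopping as soon as one score fails the gate.

    Returns (matched, scores): matched is True only if every role was
    analyzed and every score was above the threshold.
    """
    scores = []
    for role in ROLE_ORDER:
        score = jaccard(sentence_a.get(role, ()), sentence_b.get(role, ()))
        scores.append(score)
        if score <= threshold:
            return False, scores  # later roles are never analyzed
    return True, scores


original = {
    "agent": ["researchers"],
    "action": ["propose"],
    "patient": ["hybrid", "model"],
}
same_idea = {
    "agent": ["researchers"],
    "action": ["propose"],
    "patient": ["hybrid", "model"],
}
unrelated = {
    "agent": ["farmers"],
    "action": ["propose"],
    "patient": ["hybrid", "model"],
}

print(gated_compare(original, same_idea))  # (True, [1.0, 1.0, 1.0])
print(gated_compare(original, unrelated))  # (False, [0.0])
```

In the second call only one score is computed, which shows the saving: a mismatch on the first argument stops further analysis of that pair.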
Plagiarism detection systems play an essential role in improving education quality by helping teachers detect plagiarism. The most common approach in plagiarism detection tools is to use a number of measures customized to determine occurrences of plagiarism. This approach is simple and effective, but it lacks flexibility when applied to more complicated situations. This paper proposes MLChecker, a smart plagiarism detection system, to provide more flexible detection tactics. MLChecker uses an automatic plagiarism dataset construction method to dynamically update its plagiarism detection algorithms according to different detection tasks. MLChecker also provides full-process quality management functions. The results show that detection accuracy is raised by 56%. Compared with traditional plagiarism detection tools, MLChecker offers higher accuracy and efficiency.
Funding (HCRVD vulnerability detection study): funded by the Major Science and Technology Projects in Henan Province, China, Grant No. 221100210600.
Funding (Tibetan-Chinese plagiarism detection study): supported by the National Natural Science Foundation of China (No. 62271456) and the Open Projects Program of State Key Laboratory of Multimodal Artificial Intelligence Systems.
Funding (hybrid intelligent plagiarism detection study): This work is supported by Technical Education Quality Program-TEQIP III. The project is implemented by NPIU, a unit of MHRD, Govt. of India, for the implementation of World Bank-assisted projects in technical education.
Funding (MLChecker study): the Social Science Fund of Heilongjiang Province (No. 18TQB103), the National Natural Science Foundation of China (No. 61806075, No. 61772177), and the Natural Science Foundation of Heilongjiang Province (No. F2018029).