Abstract: In recent software development, software reuse is one of the most widely used ways to save time. Code is often reused by copying and pasting it, a practice known as code cloning. Cloning causes problems such as making software harder to debug and increasing the time needed to debug and maintain code. Various clone detection algorithms have been developed in the literature, but they take considerable time and space to find clones, and most of them do not scale. This paper targets that problem. The authors propose a new clone identification method that finds clones in less time than many popular code clone detection algorithms. The framework also addresses one of the key issues in code clone detection, namely detecting near-miss (Type-3) and semantic (Type-4) clones, achieving accuracies of 95.52% and 92.80%, respectively. The study has two phases. The first phase converts any code into an intermediate representation, hash-inspired abstract syntax trees. In the second phase, these abstract syntax trees are passed to a novel "Similarity-based self-adjusting hash inspired abstract syntax tree" algorithm that determines the level of similarity between code fragments. The proposed method shows substantial improvement over existing code clone identification methods.
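The abstract does not spell out the hash-inspired AST algorithm, so the following is only a minimal Python sketch of the general idea, under stated assumptions: every subtree of a Python `ast` tree is hashed by node type and shape alone (identifier names and literal values are ignored), and two fragments are compared by the Dice coefficient of their hash multisets. The functions `subtree_hashes` and `similarity` are hypothetical names, and Dice scoring is a stand-in for, not a reproduction of, the authors' self-adjusting similarity algorithm.

```python
import ast
import hashlib
from collections import Counter


def subtree_hashes(source: str) -> Counter:
    """Multiset of hashes, one per AST subtree of `source`.

    Only node types and tree shape feed the hash; identifier names and
    literal values are plain fields rather than AST children, so they
    are ignored (assumption: this is the normalization we want).
    """
    counts: Counter = Counter()

    def visit(node: ast.AST) -> str:
        child_hashes = [visit(child) for child in ast.iter_child_nodes(node)]
        signature = type(node).__name__ + "(" + ",".join(child_hashes) + ")"
        digest = hashlib.sha1(signature.encode()).hexdigest()[:16]
        counts[digest] += 1
        return digest

    visit(ast.parse(source))
    return counts


def similarity(a: str, b: str) -> float:
    """Dice coefficient of the two subtree-hash multisets (1.0 = same shape)."""
    ha, hb = subtree_hashes(a), subtree_hashes(b)
    shared = sum((ha & hb).values())  # multiset intersection keeps min counts
    total = sum(ha.values()) + sum(hb.values())
    return 2 * shared / total if total else 1.0
```

Because names and constants are dropped before hashing, a copy that only renames identifiers or changes literals scores 1.0, while structural edits (Type-3 style) lower the score gradually instead of breaking the match outright. Type-4 clones, which share behavior but not structure, would need more than this purely syntactic comparison.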
Abstract: Android is currently one of the most popular smartphone operating systems. However, Android also accounts for the largest share of global mobile malware, and its security issues have drawn significant public attention. In this paper, we investigate using a clone detector to identify known Android malware. We collect a set of Android applications known to contain malware and a set of benign applications. We extract Java source code from the applications' binaries and use NiCad, a near-miss clone detector, to find clone classes in a small subset of the malicious applications. We then use these clone classes as signatures to find similar source files in the rest of the malicious applications. The benign collection serves as a control group. In our evaluation, we successfully decompile more than 1,000 malicious apps from 19 malware families. Our results show that using a small portion of the malicious applications as a training set detects 95% of previously known malware with very few false positives and an accuracy of 96.88%. Our method can effectively and reliably pinpoint malicious applications that belong to certain malware families.
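NiCad is a standalone tool, so the sketch below does not call it. It only approximates, in Python, the kind of line-based near-miss comparison the abstract relies on, assuming code fragments have already been extracted from the decompiled source. Lines are whitespace-normalized (a crude stand-in for NiCad's pretty-printing), the longest common subsequence (LCS) of lines is computed, and two fragments count as clones when the fraction of lines unique to each side stays at or below a threshold. The 0.3 threshold and all function names are assumptions for illustration; a source file is flagged when any of its fragments matches any signature fragment.

```python
def normalize_lines(fragment: str) -> list[str]:
    """Collapse whitespace and drop blank lines (crude stand-in for pretty-printing)."""
    return [" ".join(line.split()) for line in fragment.splitlines() if line.strip()]


def lcs_length(a: list[str], b: list[str]) -> int:
    """Length of the longest common subsequence of two line lists (rolling DP row)."""
    prev = [0] * (len(b) + 1)
    for x in a:
        cur = [0]
        for j, y in enumerate(b, 1):
            cur.append(prev[j - 1] + 1 if x == y else max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]


def is_near_miss_clone(f1: str, f2: str, threshold: float = 0.3) -> bool:
    """True if the share of lines unique to each fragment is at most `threshold`."""
    a, b = normalize_lines(f1), normalize_lines(f2)
    if not a or not b:
        return False
    common = lcs_length(a, b)
    unique_a = (len(a) - common) / len(a)
    unique_b = (len(b) - common) / len(b)
    return max(unique_a, unique_b) <= threshold


def matches_signature(file_fragments: list[str], signatures: list[str],
                      threshold: float = 0.3) -> bool:
    """Flag a source file if any of its fragments is a near-miss clone of any signature."""
    return any(is_near_miss_clone(f, s, threshold)
               for f in file_fragments for s in signatures)
```

In this sketch, a four-line signature tolerates one edited line (25% unique) but not two (50% unique), which is the near-miss behavior that lets a signature drawn from a few samples generalize to other members of the same malware family.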
Funding: The IIE authors are supported in part by the National Key R&D Program of China (2020AAA0140001), NSFC U1836211, the Beijing Natural Science Foundation (No. M22004), the Anhui Department of Science and Technology under Grant 202103a05020009, the Youth Innovation Promotion Association CAS, the Beijing Academy of Artificial Intelligence (BAAI), and a research grant from Huawei.
Abstract: Clone detection has received much attention in fields such as malicious code detection, vulnerability hunting, and code copyright infringement detection. However, cyber criminals may obfuscate code to impede the detection of such violations. To date, few studies have investigated the robustness of clone detectors, especially the currently popular deep learning-based ones, against obfuscation. Moreover, most of these studies only measure the difference between a code snippet and its obfuscated version. In reality, however, attackers may modify the original code before obfuscating it, so what should be evaluated is whether obfuscated code can be detected as a clone of the modified (cloned) code, not of the original code. To this end, we conduct a comprehensive study evaluating 3 popular deep learning-based clone detectors and 6 commonly used traditional ones. For the data, we collect 6512 clone pairs of five types from the BigCloneBench dataset and obfuscate one program of each pair using 64 strategies from 6 state-of-the-art commercial obfuscators. We also collect 1424 non-clone pairs to evaluate false positives. In total, a benchmark of 524,148 code pairs (clone or non-clone) is generated and passed to the clone detectors for evaluation. To automate the evaluation, we develop a uniform evaluation framework that integrates the clone detectors and obfuscators. The results reveal how obfuscation affects clone detection performance and how traditional and deep learning-based clone detectors differ. In addition, we conduct manual code reviews to uncover the root causes of these phenomena and give suggestions to users from different perspectives.
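The abstract does not list the metrics used, so the sketch below assumes the usual precision, recall, and F1 computed over labeled code pairs, with the non-clone control pairs supplying the false positives. Results are also grouped per obfuscation strategy, which is one plausible way to see how each strategy affects a detector. `PairResult`, the strategy labels used in any example, and both functions are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class PairResult:
    strategy: str    # obfuscation strategy applied to one side of the pair
    is_clone: bool   # ground truth: clone pair or non-clone control pair
    predicted: bool  # detector's verdict for this pair


def metrics(results: list[PairResult]) -> dict[str, float]:
    """Precision, recall and F1 over a list of pair results."""
    tp = sum(r.is_clone and r.predicted for r in results)
    fp = sum((not r.is_clone) and r.predicted for r in results)  # from non-clone pairs
    fn = sum(r.is_clone and not r.predicted for r in results)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


def metrics_by_strategy(results: list[PairResult]) -> dict[str, dict[str, float]]:
    """The same metrics, computed separately for each obfuscation strategy."""
    groups: dict[str, list[PairResult]] = defaultdict(list)
    for r in results:
        groups[r.strategy].append(r)
    return {strategy: metrics(group) for strategy, group in groups.items()}
```

Keeping non-clone pairs in every strategy group means a detector that simply answers "clone" for everything is penalized through precision rather than looking perfect on recall.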
Funding: Our research was supported by NSFC (Grant Nos. U1501252, U1711262, 61702423 and U1811264).
Abstract: Rampant cloned-vehicle offenses have caused great damage to transportation management, public safety, and even the world economy. This calls for an efficient detection mechanism that accurately identifies vehicles with fake license plates and further explores the offenders' motives by discerning the behaviors of cloned vehicles. Inspection spots deployed throughout cities continuously collect movement information about passing vehicles, which opens up a new opportunity for cloned vehicle detection. Existing detection methods cannot detect cloned vehicles effectively because they use a fixed speed threshold. In this paper, we propose a two-phase framework, called CVDF, to detect cloned vehicles and discriminate the behavior patterns of vehicles that use the same plate number. In the detection phase, cloned vehicles are identified based on speed thresholds extracted from historical trajectories and on behavior abnormality analysis within the local neighborhood. In the behavior analysis phase, since the traces of vehicles that use the same license plate are mixed together, we differentiate the trajectories through matching degree-based clustering and then extract frequent temporal behavior patterns. Experimental results on real-world data show that the CVDF framework achieves high detection precision and effectively reveals the behavior of cloned vehicles. Our proposal provides a scientific basis for traffic management authorities to combat cloned-vehicle crime.
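CVDF's exact threshold-extraction rule is not given in the abstract. As an assumption-laden sketch of the speed-based check only (not the neighborhood abnormality analysis or the clustering phase), the code below learns a per-spot-pair speed limit from historical sightings as the fastest observed speed times a margin, then flags any plate whose consecutive sightings would require exceeding that limit. Sightings are assumed to be `(plate, spot_id, unix_seconds)` tuples, road distances between spots are assumed known and keyed by an unordered spot pair, and the `margin` and `default_kmh` values are illustrative.

```python
from collections import defaultdict

INF = float("inf")


def _pair_speeds(sightings, distance_km):
    """Yield (spot_pair, plate, speed_kmh) for each plate's consecutive sightings.

    sightings: iterable of (plate, spot_id, unix_seconds) tuples (assumed format).
    distance_km: dict mapping frozenset({spot_a, spot_b}) to road distance in km.
    """
    by_plate = defaultdict(list)
    for plate, spot, t in sightings:
        by_plate[plate].append((t, spot))
    for plate, seq in by_plate.items():
        seq.sort()  # chronological order for this plate
        for (t1, s1), (t2, s2) in zip(seq, seq[1:]):
            if s1 == s2:
                continue  # same spot twice carries no speed information
            key = frozenset((s1, s2))
            hours = (t2 - t1) / 3600
            speed = INF if hours <= 0 else distance_km[key] / hours
            yield key, plate, speed


def learn_thresholds(history, distance_km, margin=1.2):
    """Per spot pair: fastest historically observed speed, scaled by a margin."""
    thresholds = {}
    for key, _plate, speed in _pair_speeds(history, distance_km):
        if speed != INF:
            thresholds[key] = max(thresholds.get(key, 0.0), speed * margin)
    return thresholds


def suspected_clones(sightings, distance_km, thresholds, default_kmh=150.0):
    """Plates whose consecutive sightings imply an implausibly high speed."""
    flagged = set()
    for key, plate, speed in _pair_speeds(sightings, distance_km):
        if speed > thresholds.get(key, default_kmh):
            flagged.add(plate)
    return flagged
```

Learning a limit per spot pair is what distinguishes this from a single fixed threshold: a pair of spots joined by a highway gets a higher limit than two spots in dense traffic. Taking the historical maximum assumes the history is free of cloned plates; a high percentile would be more robust when that cannot be guaranteed.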
Funding: Supported by the National Natural Science Foundation of China (No. 61802435).
Abstract: Information technology greatly facilitates people's lives, but it also brings many security issues, such as code plagiarism, software infringement, and malicious code. To address these problems, reverse engineering is applied to analyze large amounts of binary code manually, which is very time-consuming. Moreover, as obfuscation techniques have matured, the disassembly generated from the same function can differ greatly in its opcodes and control flow graph under different obfuscation options. This paper proposes a method, inspired by natural language processing, that performs semantic similarity matching of binary code at both basic-block and function granularity. On the task of matching binary code produced with different LLVM obfuscation options, the evaluation indicator reaches 99%, which is better than existing techniques.
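The paper's NLP-inspired model is not described in the abstract, so the sketch below illustrates only the general idea of treating instructions as words, using a plainly simpler technique: each x86-style instruction is normalized (registers to `REG`, immediates to `IMM`, memory operands to `MEM`), a basic block becomes a bag of normalized instructions plus instruction bigrams, and blocks are compared by cosine similarity. The normalization rules and function names are assumptions; any operand that is neither a memory reference nor a numeric literal is treated as a register.

```python
import math
import re
from collections import Counter

_IMMEDIATE = re.compile(r"^-?(0x[0-9a-fA-F]+|\d+)$")


def normalize_instruction(instruction: str) -> str:
    """Map an instruction like 'mov eax, 1' to the token 'mov REG IMM'."""
    mnemonic, _, operands = instruction.strip().partition(" ")
    tokens = [mnemonic.lower()]
    for op in operands.split(","):
        op = op.strip()
        if not op:
            continue
        if "[" in op:
            tokens.append("MEM")
        elif _IMMEDIATE.match(op):
            tokens.append("IMM")
        else:
            tokens.append("REG")  # assumption: everything else is a register
    return " ".join(tokens)


def block_vector(block: list[str]) -> Counter:
    """Bag of normalized instructions plus adjacent-instruction bigrams."""
    words = [normalize_instruction(ins) for ins in block]
    vec = Counter(words)
    vec.update(f"{a} | {b}" for a, b in zip(words, words[1:]))
    return vec


def cosine(u: Counter, v: Counter) -> float:
    """Cosine similarity of two sparse count vectors."""
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0
```

This comparison tolerates register renaming and changed constants, but instruction substitution or control-flow flattening changes the tokens themselves and will lower the score. Function-level matching could, for example, aggregate block scores, though how the paper does this is not stated in the abstract.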
Abstract: For this special section on software systems, six research leaders in software systems, serving as guest editors for the section, discuss important issues that will shape the field's future research directions. The essays in this roundtable article cover research opportunities and challenges for large-scale software systems, such as querying organization-wide software behaviors (Xusheng Xiao), logging and log analysis (Jian-Guang Lou), engineering reliable cloud distributed systems (Shan Lu), usage data (David C. Shepherd), clone detection and management (Xin Peng), and code search and beyond (Qian-Xiang Wang). - Tao Xie, Leading Editor of Software Systems.