Funding: Supported by the National Key R&D Program of China (2018YFB1702700).
Abstract: Existing software bug localization models treat source files as natural language, which loses the syntactic and structural information of the source code. This paper proposes a bug localization model based on the syntactic and semantic information of source code. First, the abstract syntax tree (AST) is divided by node category to obtain a sequence of statements, and each statement tree is encoded as a vector that captures lexical and syntactic knowledge at the statement level. Second, the source code is transformed into a vector representation based on the naturalness of the statement sequence. This avoids the vanishing and exploding gradient problems that a large AST would otherwise cause when the AST is used to represent source code. Finally, the correlation between bug reports and source files is analyzed from three aspects, namely syntax, semantics, and text, to locate the buggy code. Experiments show that, compared with other standard models, the proposed model improves bug localization performance and performs well in mean reciprocal rank (MRR), mean average precision (MAP), and Top N Rank.
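The abstract describes the pipeline only at a high level. The following is a minimal Python sketch of two pieces of it, assuming Python source and the standard ast module as the parser (the paper's target language and tooling are not stated): splitting a program's AST into a pre-order sequence of statement nodes, and computing the ranking metrics used in the evaluation. All function names are hypothetical, the learned encoding of each statement tree is omitted, and average precision is normalized by the number of buggy files, which is one common convention.

```python
import ast


def statement_sequence(source: str) -> list[str]:
    """Return the node types of all statements in pre-order.

    Each statement node (ast.stmt) is treated as the root of one statement
    tree; nested statements (e.g. the body of an `if`) follow their parent.
    """
    sequence: list[str] = []

    def visit(node: ast.AST) -> None:
        for child in ast.iter_child_nodes(node):
            if isinstance(child, ast.stmt):
                sequence.append(type(child).__name__)
            visit(child)

    visit(ast.parse(source))
    return sequence


def reciprocal_rank(ranked: list[str], buggy: set[str]) -> float:
    """1 / rank of the first buggy file in the ranking (0.0 if none appears)."""
    for rank, name in enumerate(ranked, start=1):
        if name in buggy:
            return 1.0 / rank
    return 0.0


def average_precision(ranked: list[str], buggy: set[str]) -> float:
    """Mean of precision@k over every rank k that holds a buggy file."""
    hits, total = 0, 0.0
    for rank, name in enumerate(ranked, start=1):
        if name in buggy:
            hits += 1
            total += hits / rank
    return total / len(buggy) if buggy else 0.0


def hit_at_n(ranked: list[str], buggy: set[str], n: int) -> bool:
    """True if at least one buggy file appears in the top n results."""
    return any(name in buggy for name in ranked[:n])
```

MRR and MAP are the means of reciprocal_rank and average_precision over all bug reports, and Top N Rank counts the reports for which hit_at_n is true.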
Abstract: Complex conditional statements are a bad code smell that degrades the quality of both the code and the software design. The proposed approach selects two commonly used design patterns for handling complex conditional statements: the factory method pattern and the strategy pattern. Two pattern-directed refactoring approaches based on these patterns are proposed, each consisting of an algorithm that identifies refactoring opportunities and an algorithm that performs the refactoring automatically. Refactoring opportunities are identified effectively and automatically by parsing the abstract syntax tree generated from the source code. The refactoring algorithms are then applied automatically to the candidate code to simplify or remove the complex conditional statements. Empirical analysis and quality assessment show that the refactored code has better maintainability and extensibility, and that the automated pattern-directed refactoring reduces both code size and class complexity.
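To make the two refactoring steps concrete, here is a hedged Python sketch, not the paper's algorithms. Identification is approximated by flagging if/elif chains that have at least a hypothetical minimum number of branches in Python's ast. The refactoring result is shown by hand on an invented shipping-cost example, where each branch becomes a strategy class and a dictionary lookup stands in for a factory. The threshold, names, and example are assumptions.

```python
import ast
from abc import ABC, abstractmethod


def find_conditional_chains(source: str, min_branches: int = 3) -> list[int]:
    """Return line numbers of if/elif chains with at least min_branches branches.

    Python's AST represents both `elif` and an `else:` block whose only
    statement is an `if` as a nested If in `orelse`; both are counted here.
    """
    ifs = [n for n in ast.walk(ast.parse(source)) if isinstance(n, ast.If)]
    # If nodes that continue an outer chain are counted with that chain.
    continuations = {
        id(n.orelse[0]) for n in ifs
        if len(n.orelse) == 1 and isinstance(n.orelse[0], ast.If)
    }
    found = []
    for node in ifs:
        if id(node) in continuations:
            continue
        branches, current = 1, node
        while len(current.orelse) == 1 and isinstance(current.orelse[0], ast.If):
            current = current.orelse[0]
            branches += 1
        if branches >= min_branches:
            found.append(node.lineno)
    return sorted(found)


# Hypothetical "before" code: one branch per shipping method.
BEFORE = '''
def shipping_cost(method, weight):
    if method == "standard":
        return 5.0 + 0.5 * weight
    elif method == "express":
        return 10.0 + 1.0 * weight
    elif method == "overnight":
        return 20.0 + 2.0 * weight
    else:
        raise ValueError(method)
'''


# "After": each branch becomes a strategy class.
class ShippingStrategy(ABC):
    @abstractmethod
    def cost(self, weight: float) -> float: ...


class Standard(ShippingStrategy):
    def cost(self, weight: float) -> float:
        return 5.0 + 0.5 * weight


class Express(ShippingStrategy):
    def cost(self, weight: float) -> float:
        return 10.0 + 1.0 * weight


class Overnight(ShippingStrategy):
    def cost(self, weight: float) -> float:
        return 20.0 + 2.0 * weight


# A registry lookup plays the role of a simple factory for strategies.
STRATEGIES: dict[str, ShippingStrategy] = {
    "standard": Standard(), "express": Express(), "overnight": Overnight(),
}


def shipping_cost(method: str, weight: float) -> float:
    try:
        strategy = STRATEGIES[method]
    except KeyError:
        raise ValueError(method) from None
    return strategy.cost(weight)
```

In the refactored version, adding a shipping method means adding one class and one registry entry instead of editing the conditional. This is the kind of extensibility gain the abstract reports.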
Abstract: To deal with complex association relationships between classes in object-oriented software systems, a novel approach for identifying refactoring opportunities is proposed. The approach detects complex and duplicated many-to-many association relationships in source code and provides guidance for further refactoring. Source code is first transformed into an abstract syntax tree, from which the data members of each class are extracted; each class is then characterized by its set of association classes, that is, the classes held by its data members. Next, classes that share common associations are found by comparing the association class sets of different classes. Finally, subject to predefined thresholds, all candidate class sets for refactoring and their common association classes are saved and exported. The approach was tested on four projects. The results show a precision above 96% with a threshold of 3 and 100% with a threshold of 4. The approach is also efficient: it takes less than 4 s on a project with more than 500 classes, which indicates that it can effectively identify refactoring opportunities in projects of different scales.
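A minimal Python sketch of the identification step follows. It relies on several assumptions the abstract does not confirm:
- data members are modeled as annotated class attributes;
- a class's association set is the set of project classes used as member types;
- candidates are reported pairwise rather than as larger class sets;
- the threshold is the minimum number of shared association classes.

The function names and example classes are hypothetical.

```python
import ast
from itertools import combinations


def association_sets(source: str) -> dict[str, set[str]]:
    """Map each class to the set of project classes used as its member types.

    Data members are modeled as annotated class attributes (`x: Other`),
    a Python stand-in for typed fields in languages such as Java.
    """
    tree = ast.parse(source)
    classes = {n.name: n for n in tree.body if isinstance(n, ast.ClassDef)}
    result: dict[str, set[str]] = {}
    for name, node in classes.items():
        result[name] = {
            stmt.annotation.id
            for stmt in node.body
            if isinstance(stmt, ast.AnnAssign)
            and isinstance(stmt.annotation, ast.Name)
            and stmt.annotation.id in classes
        }
    return result


def shared_associations(assoc: dict[str, set[str]], threshold: int):
    """Return ((class_a, class_b), common) for pairs sharing >= threshold classes."""
    candidates = []
    for a, b in combinations(sorted(assoc), 2):
        common = assoc[a] & assoc[b]
        if len(common) >= threshold:
            candidates.append(((a, b), common))
    return candidates


# Hypothetical input: Order and Invoice duplicate the same three associations.
SOURCE = '''
class Customer: pass
class Product: pass
class Warehouse: pass

class Order:
    customer: Customer
    product: Product
    warehouse: Warehouse

class Invoice:
    customer: Customer
    product: Product
    warehouse: Warehouse

class Note:
    customer: Customer
'''
```

With a threshold of 3, only the Invoice/Order pair is reported, together with the three classes they both reference.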
Abstract: In modern software development, reusing software is one of the most common ways to save time. Software is often reused by copying and pasting code, a practice known as code cloning. Cloning makes debugging harder and increases the time needed to debug and maintain the code. Various algorithms for finding clones have been developed in the literature, but they take considerable time and space to find clones, and most of them do not scale. This paper targets that problem. The authors propose a new clone identification method that finds clones in less time than many popular code clone detection algorithms. The framework also addresses a key issue in code clone detection, namely the detection of near-miss (Type-3) and semantic (Type-4) clones, achieving accuracies of 95.52% and 92.80%, respectively. The method has two phases. The first phase converts the code into an intermediate representation, hash-inspired abstract syntax trees. In the second phase, these abstract syntax trees are passed to a novel algorithm, the “Similarity-based self-adjusting hash inspired abstract syntax tree” algorithm, which determines the degree of similarity between code fragments. The proposed method shows considerable improvement over existing code clone identification methods.
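The abstract does not specify the paper's self-adjusting algorithm. As a generic illustration of hashing AST subtrees, the Python sketch below gives each subtree a structural hash built from node types only, then compares two fragments by the Jaccard similarity of their hash multisets. This is a simple stand-in, not the authors' method. Copies that differ only in identifiers or literal values score 1.0, and near-miss copies score lower in proportion to the subtrees they share. Function names are hypothetical.

```python
import ast
import hashlib
from collections import Counter


def subtree_hashes(source: str) -> Counter:
    """Count a structural hash for every AST subtree.

    Only node types are hashed, so identifier names and literal values are
    ignored and consistently renamed copies produce identical hashes.
    """
    counts: Counter = Counter()

    def digest(node: ast.AST) -> str:
        children = ",".join(digest(c) for c in ast.iter_child_nodes(node))
        h = hashlib.sha1(f"{type(node).__name__}({children})".encode()).hexdigest()
        counts[h] += 1
        return h

    digest(ast.parse(source))
    return counts


def clone_similarity(a: str, b: str) -> float:
    """Jaccard similarity of the two multisets of subtree hashes (0.0 to 1.0)."""
    ha, hb = subtree_hashes(a), subtree_hashes(b)
    union = sum((ha | hb).values())
    return sum((ha & hb).values()) / union if union else 1.0
```

For example, "def f(x): return x + 1" and "def g(y): return y + 2" score 1.0, while replacing + with * lowers the score because every subtree that contains the operator changes.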
Funding: Supported by the 2018-2020 Higher Education Talent Training Quality and Teaching Reform Project of Sichuan Province (Grant No. JG2018-46), the Science and Technology Planning Program of Sichuan University and Luzhou (Grant No. 2017CDLZG30), and the Postdoctoral Science Fund of Sichuan University (Grant No. 2019SCU12058).
Abstract: Evaluation is an essential part of teaching, especially in programming courses. Both students and teachers can benefit significantly from automatic program evaluation. It shortens the time required for assessment, so students get immediate feedback, and it significantly reduces teachers' workload. Current automated program assessment systems mainly combine static and dynamic analysis methods. Such systems face two crucial problems: evaluating unfinished code and constructing template code. This paper proposes a method that combines deep learning with static analysis. Syntax tree repair addresses the problem that code with compilation errors cannot produce a correct syntax tree. In addition, syntax tree standardization converts the target code into a subset of the solution space, which reduces the number of template programs needed. Token embeddings learned by deep learning preserve the code's context throughout, so that the lexical semantics remain as unchanged as possible after the syntax tree is modified. Finally, a recursive neural network represents the standardized tree as a vector, and the cosine similarity between the target and template code vectors is used as the evaluation score. Experiments show that the similarity scores obtained by this method are consistent with expert scores. The method can also support future research, such as feedback on difficulties.
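The final scoring step can be illustrated with a toy Python sketch. It assumes Python programs parsed with the standard ast module. It also replaces the trained recursive neural network with an untrained recursive encoder over random node-type embeddings, so it only shows the data flow: tree, then vector, then cosine score. Syntax tree repair, standardization, and learned token embeddings are omitted. Class and function names are hypothetical.

```python
import ast
import math
import random


class ToyTreeEncoder:
    """Untrained stand-in for a recursive neural network over syntax trees.

    vec(node) = tanh(embed(node type) + mean of the children's vectors).
    Embeddings are random here; in a real system they would be learned.
    """

    def __init__(self, dim: int = 16, seed: int = 0) -> None:
        self.dim = dim
        self.rng = random.Random(seed)
        self.table: dict[str, list[float]] = {}

    def embed(self, label: str) -> list[float]:
        # Lazily assign a fixed random vector to each node type.
        if label not in self.table:
            self.table[label] = [self.rng.uniform(-1.0, 1.0) for _ in range(self.dim)]
        return self.table[label]

    def encode(self, node: ast.AST) -> list[float]:
        kids = [self.encode(c) for c in ast.iter_child_nodes(node)]
        if kids:
            mean = [sum(col) / len(kids) for col in zip(*kids)]
        else:
            mean = [0.0] * self.dim
        base = self.embed(type(node).__name__)
        return [math.tanh(e + m) for e, m in zip(base, mean)]


def cosine(u: list[float], v: list[float]) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def evaluation_score(target: str, template: str, encoder: ToyTreeEncoder) -> float:
    """Cosine similarity between the encoded target and template programs."""
    return cosine(encoder.encode(ast.parse(target)), encoder.encode(ast.parse(template)))
```

The same encoder instance must be used for the target and the template, so that both trees are embedded in the same vector space.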
Funding: Supported by the National Natural Science Foundation of China (No. U1936215).
Abstract: In recent years, PowerShell has increasingly been reported in a variety of cyber attacks. However, because the PowerShell language is dynamic by design and can construct script fragments at different levels, state-of-the-art PowerShell attack detection approaches based on static analysis are inherently vulnerable to obfuscation. In this paper, we design the first generic, effective, and lightweight deobfuscation approach for PowerShell scripts. To identify obfuscated script fragments precisely, we define obfuscation in terms of its differing impacts on the abstract syntax trees of PowerShell scripts and propose a novel emulation-based recovery technique. Furthermore, we design the first semantic-aware PowerShell attack detection system, which leverages the classic objective-oriented association mining algorithm and identifies 31 new semantic signatures. Experimental results on 2342 benign samples and 4141 malicious samples show that our deobfuscation method takes less than 0.5 s on average and increases the similarity between the obfuscated and original scripts from 0.5% to 93.2%. With our deobfuscation method deployed, the attack detection rates of Windows Defender and VirusTotal increase substantially, from 0.33% and 2.65% to 78.9% and 94.0%, respectively. Moreover, our detection system outperforms both existing tools, with an average true positive rate of 96.7% and an average false positive rate of 0%.
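PowerShell's own AST is not available from Python, so the sketch below only illustrates the recovery idea by analogy on Python source. It replaces subtrees whose string value can be computed without running the script with the recovered literal, working bottom-up with ast.NodeTransformer. Only two patterns are handled: string concatenation, and chr() applied to an integer literal. It is not the paper's emulation engine, it assumes chr is the builtin, and the class and function names are hypothetical.

```python
import ast


class StringRecovery(ast.NodeTransformer):
    """Fold obfuscated string expressions into plain literals, bottom-up."""

    @staticmethod
    def _is_str(node: ast.AST) -> bool:
        return isinstance(node, ast.Constant) and isinstance(node.value, str)

    def visit_BinOp(self, node: ast.BinOp) -> ast.AST:
        self.generic_visit(node)  # recover the operands first
        if isinstance(node.op, ast.Add) and self._is_str(node.left) and self._is_str(node.right):
            folded = ast.Constant(value=node.left.value + node.right.value, kind=None)
            return ast.copy_location(folded, node)
        return node

    def visit_Call(self, node: ast.Call) -> ast.AST:
        self.generic_visit(node)
        # chr(<int literal>) -> one-character string (assumes chr is the builtin).
        if (isinstance(node.func, ast.Name) and node.func.id == "chr"
                and len(node.args) == 1 and not node.keywords
                and isinstance(node.args[0], ast.Constant)
                and type(node.args[0].value) is int
                and 0 <= node.args[0].value <= 0x10FFFF):
            folded = ast.Constant(value=chr(node.args[0].value), kind=None)
            return ast.copy_location(folded, node)
        return node


def recover(source: str) -> str:
    """Return the source with recoverable string expressions folded."""
    tree = StringRecovery().visit(ast.parse(source))
    return ast.unparse(ast.fix_missing_locations(tree))
```

For example, 'cmd = "Inv" + "oke" + chr(45) + "Expression"' is recovered to an assignment of the literal Invoke-Expression. Because the transformer only folds literals and never calls eval, recovery does not execute any part of the analyzed script.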
Abstract: The Complex System Modeling, Simulation and Optimization Language (CoSMSOL) is a problem-oriented language designed to run on multi-core computers. This paper presents the system environment of CoSMSOL and proposes its modeling methods for complex systems, language text specification, function library, algorithm library, parallel simulation algorithms, and intelligent optimization algorithms, which support continuous, discrete, and agent systems. We also developed a simulation language compiler for CoSMSOL and employed it in two case studies that generate a multi-entity war gaming system and an aerodynamic spacecraft model. The two cases illustrate the main functions and implementation processes of CoSMSOL. The results show that CoSMSOL is useful for modeling agent-based and aerospace systems.