Understanding the content of source code and its regular expressions is very difficult when they are written in an unfamiliar language. Pseudo-code explains and describes the content of the code without relying on the syntax or technologies of a particular programming language. However, writing pseudo-code for each code instruction is laborious. Recently, neural machine translation has been used to generate textual descriptions of source code. In this paper, a novel deep learning-based transformer (DLBT) model is proposed for automatic pseudo-code generation from source code. The proposed model uses deep learning based on Neural Machine Translation (NMT) to work as a language translator. The DLBT is based on the transformer, which is an encoder-decoder structure. It has three major components: tokenizer and embeddings, transformer, and post-processing. Each line of code is tokenized and mapped to a dense vector. The transformer then captures the relatedness between the source code and the matching pseudo-code without the need for a Recurrent Neural Network (RNN). In the post-processing step, the generated pseudo-code is optimized. The proposed model is assessed on a real Python dataset containing more than 18,800 lines of Python source code. The experiments show promising performance compared with other machine translation methods such as the Recurrent Neural Network (RNN). The proposed DLBT records an accuracy of 47.32 and a BLEU score of 68.49.
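The tokenizer step described above can be sketched as follows. This is only an illustration: the abstract does not say which tokenizer DLBT uses, so Python's standard `tokenize` module stands in for it, and the helper names `tokenize_code_line`, `build_vocab`, and `encode` are hypothetical. The resulting integer ids are what an embedding layer would turn into dense vectors.

```python
import io
import tokenize

# Token types that carry no lexical content for this sketch.
_SKIP = {
    tokenize.NEWLINE, tokenize.NL, tokenize.ENDMARKER,
    tokenize.INDENT, tokenize.DEDENT, tokenize.COMMENT,
}

def tokenize_code_line(line):
    """Split one line of Python source into lexical tokens.

    Assumption: the line is lexically complete; a line that leaves a
    bracket open would make the stdlib tokenizer raise an error.
    """
    reader = io.StringIO(line).readline
    return [tok.string
            for tok in tokenize.generate_tokens(reader)
            if tok.type not in _SKIP]

def build_vocab(lines):
    """Assign an integer id to every token seen; id 0 is reserved for unknowns."""
    vocab = {"<unk>": 0}
    for line in lines:
        for tok in tokenize_code_line(line):
            vocab.setdefault(tok, len(vocab))
    return vocab

def encode(line, vocab):
    """Map a code line to token ids, ready for an embedding lookup."""
    return [vocab.get(tok, 0) for tok in tokenize_code_line(line)]
```

For example, `tokenize_code_line("x = foo(1, 2)")` yields the tokens `x`, `=`, `foo`, `(`, `1`, `,`, `2`, `)`, and tokens missing from the vocabulary are encoded as id 0.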
Code similarity analysis has become more popular due to its significant applications, including vulnerability detection, malware detection, and patch analysis. Since the source code of software is difficult to obtain in most circumstances, binary-level code similarity analysis (BCSA) has received much attention. In recent years, many BCSA studies incorporating AI techniques have focused on deriving semantic information from binary functions, using code representations such as assembly code, intermediate representations, and control flow graphs to measure similarity. However, because of the impact of different compilers, architectures, and obfuscations, binaries compiled from the same source code may vary considerably, which is the major obstacle preventing these works from obtaining robust features. In this paper, we propose a solution named UPPC (Unleashing the Power of Pseudo-code), which takes the pseudo-code of a binary function as input to address the binary code similarity analysis challenge, since pseudo-code has a higher level of abstraction than binary instructions and is platform-independent. UPPC selectively inlines functions to capture the full function semantics across different compiler optimization levels and uses a deep pyramidal convolutional neural network to obtain the semantic embedding of the function. We evaluated UPPC on a data set containing vulnerabilities and on a data set covering different architectures (X86, ARM), different optimization options (O0-O3), different compilers (GCC, Clang), and four obfuscation strategies. The experimental results show that the accuracy of UPPC in function search is 33.2% higher than that of existing methods.
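Once each function has a semantic embedding, function search reduces to ranking candidates by vector similarity. The sketch below illustrates that idea only: the abstract does not state which similarity measure UPPC uses, so cosine similarity is an assumption, and the function names and embedding values are made up for the example.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def search_functions(query, pool, top_k=3):
    """Rank the functions in `pool` (name -> embedding) by similarity to `query`."""
    scored = [(name, cosine_similarity(query, emb)) for name, emb in pool.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]

# Hypothetical embeddings of the same routine compiled for different targets.
pool = {
    "memcpy_arm": [1.0, 0.0],
    "strlen_x86": [0.0, 1.0],
    "memcpy_x86": [0.9, 0.1],
}
best = search_functions([1.0, 0.0], pool, top_k=2)
```

In this toy pool the two `memcpy` variants rank ahead of `strlen_x86`, which is the behaviour a robust embedding should show across architectures.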
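To further improve the bit error rate (BER) performance of conventional code index modulation (CIM) and to reduce the consumption of pseudo-noise (PN) code index resources, a nonorthogonal grouping-code index modulation (NG-CIM) scheme is proposed. At the transmitter, NG-CIM divides each transmission time slot in parallel into a modulation block and a mapping block. The modulation block groups the bits and maps them to multiple groups of modulation symbols; the mapping block indexes the same PN code for the quadrature and in-phase components of each group of modulation symbols, which are then transmitted using direct-sequence spread spectrum. Simulation results show that, over an additive white Gaussian noise channel, as spectral efficiency increases, the BER performance of NG-CIM improves by about 2 to 4 dB over CIM, while the consumption of PN code resources is also greatly reduced.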
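The basic mechanism, in which one code index spreads both the in-phase and quadrature components of a symbol, can be sketched for a single symbol group as follows. This is a noise-free toy link, not the NG-CIM scheme itself: the bit grouping across multiple symbol groups is not modelled, rows of a 4x4 Walsh-Hadamard matrix stand in for the PN codes, and the names `transmit` and `receive` are hypothetical.

```python
# Rows of a 4x4 Walsh-Hadamard matrix, standing in for PN codes (assumption).
CODES = [
    [1, 1, 1, 1],
    [1, -1, 1, -1],
    [1, 1, -1, -1],
    [1, -1, -1, 1],
]

def transmit(bits):
    """Map 4 bits to spread I/Q chips.

    bits[0:2] select the spreading code; bits[2] and bits[3] set the sign
    of the in-phase and quadrature components of one QPSK symbol.
    """
    index = bits[0] * 2 + bits[1]
    i_sym = 1 if bits[2] == 0 else -1
    q_sym = 1 if bits[3] == 0 else -1
    code = CODES[index]  # the same code spreads both components
    return [i_sym * c for c in code], [q_sym * c for c in code]

def correlate(chips, code):
    """Despread a chip sequence against one candidate code."""
    return sum(x * c for x, c in zip(chips, code))

def receive(i_chips, q_chips):
    """Detect the code index by correlation energy, then read the symbol signs."""
    energies = [correlate(i_chips, code) ** 2 + correlate(q_chips, code) ** 2
                for code in CODES]
    index = max(range(len(CODES)), key=energies.__getitem__)
    i_corr = correlate(i_chips, CODES[index])
    q_corr = correlate(q_chips, CODES[index])
    return [index >> 1, index & 1,
            0 if i_corr > 0 else 1,
            0 if q_corr > 0 else 1]
```

Because the index bits are carried by the choice of code rather than by the symbol itself, each slot conveys four bits here while transmitting only one QPSK symbol.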