Fast reused code tracing method based on simhash and inverted index (cited by: 9)
Abstract: A novel method for fast and accurate tracing of reused code is proposed. The method works at function granularity and, based on simhash and inverted indexing, can quickly trace similar functions in massive code collections. First, a code database with a three-level inverted index structure is built from a large number of samples using simhash. For a function to be traced, similar code blocks are found quickly from the simhash values of the function's code blocks. Potentially similar functions are then retrieved through the inverted index. Finally, the jump relationships between the similar code blocks are compared to decide precisely whether the functions are similar, and matches are traced back to the malware samples that contain them. Experimental results show that, while maintaining high accuracy and recall, the method can quickly identify compiler-inserted functions and reused functions in samples using the code database.
Authors: QIAO Yan-chen (乔延臣), YUN Xiao-chun (云晓春), TUO Yu-peng (庹宇鹏), ZHANG Yong-zheng (张永铮). Affiliations: Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100080, China; Graduate School, Chinese Academy of Sciences, Beijing 100039, China; Institute of Information Engineering, Chinese Academy of Sciences, Beijing 100093, China.
Source: Journal on Communications (《通信学报》), indexed in EI, CSCD, and the Peking University Core Journals list; 2016, No. 11, pp. 104-113 (10 pages).
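Funding: National Natural Science Foundation of China (No. 61303261); National High Technology Research and Development Program of China ("863" Program) (No. 2013AA014703, No. 2012AA012803); National 242 Information Security Program (No. 2014A094); Strategic Priority Research Program of the Chinese Academy of Sciences (No. XDA06030200).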
Keywords: network security, reused code, retrieval method, homology identification, malware
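
The pipeline summarized in the abstract (simhash per code block, inverted-index lookup of candidate functions, then a check of block jump relationships) can be illustrated with a minimal Python sketch. It is not the authors' implementation. The record does not spell out the three index levels, so the layout used here (block simhash to functions, function to its blocks and edges, function to samples) is an assumption. So are the names `CodeIndex`, `same_structure`, the `max_dist` threshold, the MD5-based token hash, and the toy instruction tokens. Candidate lookup uses a linear scan over stored hashes for brevity, which a real system would replace with a faster near-duplicate lookup.

```python
import hashlib
from collections import defaultdict


def simhash(tokens, bits=64):
    """Simhash of a token list: each token hash votes +1/-1 on every bit."""
    votes = [0] * bits
    for tok in tokens:
        digest = hashlib.md5(tok.encode("utf-8")).digest()
        h = int.from_bytes(digest[:8], "big")
        for i in range(bits):
            votes[i] += 1 if (h >> i) & 1 else -1
    return sum(1 << i for i, v in enumerate(votes) if v > 0)


def hamming(a, b):
    """Number of differing bits between two integers."""
    return bin(a ^ b).count("1")


def same_structure(q_hashes, q_edges, c_hashes, c_edges, max_dist):
    """Greedily pair similar blocks one-to-one, then compare jump edges."""
    if len(q_hashes) != len(c_hashes):
        return False
    mapping, used = {}, set()
    for i, qh in enumerate(q_hashes):
        j = next((j for j, ch in enumerate(c_hashes)
                  if j not in used and hamming(qh, ch) <= max_dist), None)
        if j is None:
            return False
        mapping[i] = j
        used.add(j)
    return {(mapping[a], mapping[b]) for a, b in q_edges} == set(c_edges)


class CodeIndex:
    """Hypothetical three-level index (an assumed layout, see lead-in)."""

    def __init__(self):
        self.block_to_funcs = defaultdict(set)   # block simhash -> function ids
        self.funcs = {}                          # function id -> (hashes, edges)
        self.func_to_samples = defaultdict(set)  # function id -> sample names

    def add(self, sample, func_id, blocks, edges):
        hashes = [simhash(b) for b in blocks]
        self.funcs[func_id] = (hashes, frozenset(edges))
        self.func_to_samples[func_id].add(sample)
        for h in hashes:
            self.block_to_funcs[h].add(func_id)

    def trace(self, blocks, edges, max_dist=3):
        q_hashes = [simhash(b) for b in blocks]
        # Step 1: candidate functions share at least one similar block.
        candidates = set()
        for qh in q_hashes:
            for h, fids in self.block_to_funcs.items():
                if hamming(qh, h) <= max_dist:
                    candidates |= fids
        # Step 2: confirm candidates by jump relationships, then map to samples.
        hits = {}
        for fid in candidates:
            c_hashes, c_edges = self.funcs[fid]
            if same_structure(q_hashes, set(edges), c_hashes, c_edges, max_dist):
                hits[fid] = sorted(self.func_to_samples[fid])
        return hits


# Toy data: each block is a list of normalized instruction mnemonics.
copy_blocks = [["push", "mov", "cmp", "jz"], ["mov", "inc", "dec", "jnz"], ["pop", "ret"]]
copy_edges = {(0, 1), (0, 2), (1, 1), (1, 2)}

index = CodeIndex()
index.add("sample_a.exe", "copy_loop", copy_blocks, copy_edges)
index.add("sample_b.exe", "copy_loop", copy_blocks, copy_edges)
index.add("sample_b.exe", "checksum",
          [["xor", "mov"], ["add", "shl", "loop"], ["ret"]],
          {(0, 1), (1, 1), (1, 2)})

result = index.trace(copy_blocks, copy_edges)
print(result["copy_loop"])  # ['sample_a.exe', 'sample_b.exe']
```

The greedy block pairing in `same_structure` is a simplification: when several blocks in one function have near-identical hashes it may pick a pairing that fails the edge check even though another pairing would pass, so a fuller version would need to try alternative pairings.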