Ransomware Early Detection Method Based on API Latent Semantics

Abstract Crypto-ransomware extorts a ransom by encrypting user files. Existing early detection methods that trigger on the first encryption-related application programming interface (API) call cannot detect ransomware before it begins encrypting. Because different ransomware families begin encrypting at different times, existing early detection methods based on a fixed time threshold detect only a small fraction of ransomware before encryption starts. To further improve the timeliness of ransomware detection, this paper analyzes the dynamic-link library (DLL) and API call behavior of several ransomware samples in their early stage of execution and, on that basis, proposes the concept of the initial phase of operation (IPO): the period from the start of a program's execution to its first call to an encryption-related DLL. It then proposes an early detection method that takes the API sequence generated within the IPO as the detection object, namely the Ransomware Early Detection Method based on API Latent Semantics (REDMALS). After collecting the API sequences within the IPO, REDMALS applies the term frequency-inverse document frequency (TF-IDF) algorithm to generate feature vectors and the latent semantic analysis (LSA) algorithm to extract latent semantic structure, and then uses a machine learning algorithm to build a detection model. Experimental results show that REDMALS with the random forest algorithm achieves accuracies of 97.7% and 96.0% on the constructed variant test set and unknown test set, respectively, and that 83% and 76% of the ransomware samples in the two test sets, respectively, are detected before they perform any encryption.
Authors LUO Bin (罗斌); GUO Chun (郭春); SHEN Guo-wei (申国伟); CUI Yun-he (崔允贺); CHEN Yi (陈意); PING Yuan (平源) (State Key Laboratory of Public Big Data, College of Computer Science and Technology, Guizhou University, Guiyang, Guizhou 550025, China; School of Information Engineering, Xuchang University, Xuchang, Henan 461000, China)
Source Acta Electronica Sinica (《电子学报》), 2024, No. 4, pp. 1288-1295 (8 pages). Indexed in EI, CAS, CSCD, and the Peking University Core Journals list.
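Funding National Natural Science Foundation of China (No. 62162009); Guizhou Provincial Science and Technology Support Program (No. [2022]071); Guizhou Provincial Higher Education Innovation Team for Big Data and Network Security (No. [2023]052); Henan Provincial Science and Technology Research Program (No. 222102210048).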
Keywords ransomware; early detection; API; TF-IDF; latent semantic analysis; random forest
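
Illustrative sketch. The abstract defines the IPO as the period before the first call to an encryption-related DLL and says REDMALS builds TF-IDF feature vectors from the API sequence collected in that period. The Python below is a minimal sketch of those two steps only, not the authors' implementation. The trace format (a list of `("load", dll)` and `("api", name)` events), the `CRYPTO_DLLS` set, and both function names are assumptions made for illustration, and the TF-IDF formula is the textbook variant, which may differ from the one used in the paper. The LSA projection and the classifier are not shown.

```python
import math
from collections import Counter

# Assumed list of encryption-related DLLs; the abstract does not enumerate them.
CRYPTO_DLLS = {"bcrypt.dll", "crypt32.dll", "cryptsp.dll", "ncrypt.dll", "rsaenh.dll"}


def ipo_api_sequence(trace, crypto_dlls=CRYPTO_DLLS):
    """Return the API names seen before the first encryption-related DLL event."""
    apis = []
    for kind, name in trace:
        if kind == "load" and name.lower() in crypto_dlls:
            break  # the IPO ends at the first encryption-related DLL
        if kind == "api":
            apis.append(name)
    return apis


def tfidf_vectors(sequences):
    """Textbook TF-IDF: each API sequence is a document, each API name a term."""
    vocab = sorted({api for seq in sequences for api in seq})
    n_docs = len(sequences)
    # Document frequency: number of sequences in which each API appears.
    doc_freq = Counter(api for seq in sequences for api in set(seq))
    idf = {api: math.log(n_docs / doc_freq[api]) for api in vocab}
    vectors = []
    for seq in sequences:
        counts = Counter(seq)
        total = len(seq) or 1  # avoid division by zero for an empty IPO
        vectors.append([counts[api] / total * idf[api] for api in vocab])
    return vocab, vectors


# Two made-up traces: the first touches bcrypt.dll, the second never does.
traces = [
    [("api", "NtCreateFile"), ("api", "RegOpenKeyExW"),
     ("load", "bcrypt.dll"), ("api", "BCryptEncrypt")],
    [("api", "NtCreateFile"), ("api", "NtCreateFile"), ("api", "GetSystemTime")],
]
sequences = [ipo_api_sequence(t) for t in traces]
vocab, vectors = tfidf_vectors(sequences)
```

In this sketch `BCryptEncrypt` never reaches the feature vector of the first trace, because it occurs after the IPO has ended. `NtCreateFile` appears in both sequences, so its weight is zero, which is how TF-IDF discounts APIs common to every sample.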