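Abstract
Advanced Persistent Threat (APT) is among the most serious network security threats today. Because the DNS Covert Channel (DCC) is ubiquitous and stealthy, it has become an ideal channel for attackers to transmit secret information, and it is favored by many APT groups. AI-powered DCC detection methods are becoming popular, but malicious samples related to APT attacks are hard to obtain and have low activity, which causes a pronounced imbalance in training data and seriously degrades the detection performance of models. Moreover, existing detection work evaluates systems using DCC tool traffic and a small number of malicious samples; such test sets have limited coverage and cannot evaluate a system comprehensively and effectively. To address these problems, this paper designs a DCC traffic generation framework based on attack Tactics, Techniques, and Procedures (TTPs) and generates a large malicious traffic dataset with controllable completeness that covers a large sample space. Using the generated dataset, we train machine learning models with strong interpretability and propose DCCHunter, a DCC detection system based on self-generated attack traffic. We collected 8 DCC malware traffic samples and reproduced the traffic of 3 DCC tools that have been maliciously used by APT groups, and we use these real malicious samples to evaluate the system's ability to detect real DCC attacks unknown to it. The results show that the system achieves a recall of up to 99.80% on DCC traffic, with a false positive rate of 0.29% over traffic numbering in the hundreds of millions.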
Advanced Persistent Threat (APT) is currently one of the most serious network security threats. Owing to its ubiquity and stealth, the DNS Covert Channel (DCC) has become an ideal secret channel for attackers and remains in active use today. DCC attacks have evolved from exploratory attempts into organized campaigns, and this trend is becoming increasingly clear. In response, researchers have studied DCC detection in depth. AI-powered DCC detection has made some progress, but it still faces problems, chief among them the lack of real malicious sample sets for training. Samples of DNS-based APT attacks are difficult to obtain, have low activity, and are few in number. These problems cause a significant imbalance in training data, which severely limits the detection performance of the model. Previous studies trained on traffic generated by DCC tools in place of real malware traffic, but such traffic cannot describe the complete sample space of existing attacks. Previous work also generally tested with traffic generated by DCC tools and a few malware samples, which makes it difficult to evaluate a system's detection capability completely. In this paper, we propose a new detection method that trains the model on self-generated malicious traffic and evaluates it on real malware samples. We summarize Tactics, Techniques, and Procedures (TTPs) from a large number of APT reports and design a traffic generation framework based on them. With this framework we generate malicious data sets for training that have controllable completeness, a large sample space, and a large quantity. The framework can generate realistic attack traffic in a principled way and can also anticipate possible future attack traffic. The generated traffic can be customized and used to train machine learning and deep learning models. We extract 19 features for training. Eight of them are newly proposed and fall into three categories: domain readability, domain structure, and IP discreteness. Some of the remaining features are optimized versions of features from previous work. Experiments demonstrate the superiority of the newly proposed and optimized features. We then experiment with machine learning algorithms that have strong interpretability and select the best model according to five-fold cross-validation results. Based on a thorough reading and summary of the main APT reports, we divide real attack traffic into DCC malware traffic and traffic generated by DCC tools that have been maliciously used by APT organizations. We collect eight real malware traffic samples from various platforms, which cover all threat scenarios and common record types. We also reproduce the traffic of three DCC tools that have been maliciously used by APT groups. With these data sets, the detection capability of the system against unknown DCC attacks can be evaluated. The results show that the recall on DCC traffic reaches 99.80%, and the false positive rate over traffic numbering in the hundreds of millions is 0.29%.
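For readers unfamiliar with the two reported metrics, the following is a minimal sketch of how recall and false positive rate are conventionally computed from binary labels. The function name `recall_and_fpr` and the toy labels are illustrative assumptions; they are not taken from the paper or from DCCHunter.

```python
def recall_and_fpr(y_true, y_pred):
    """Return (recall, false_positive_rate) for binary labels.

    Assumed label convention (hypothetical): 1 = DCC traffic, 0 = benign traffic.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # DCC flagged as DCC
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # DCC missed
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # benign flagged as DCC
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # benign passed
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    fpr = fp / (fp + tn) if (fp + tn) else 0.0
    return recall, fpr


# Toy labels for illustration only; these are not data from the paper.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 1, 0]
recall, fpr = recall_and_fpr(y_true, y_pred)
print(f"recall={recall:.2f}, fpr={fpr:.4f}")
```

Recall is computed only over malicious samples and the false positive rate only over benign ones, so the two figures can be measured on different data sets, as the abstract does.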
Authors
刁嘉文
方滨兴
田志宏
王忠儒
宋首友
王田
崔翔
DIAO Jia-Wen; FANG Bin-Xing; TIAN Zhi-Hong; WANG Zhong-Ru; SONG Shou-You; WANG Tian; CUI Xiang (Key Laboratory of Trustworthy Distributed Computing and Service (Beijing University of Posts and Telecommunications), Ministry of Education, Beijing 100876; Cyberspace Institute of Advanced Technology, Guangzhou University, Guangzhou 510006; Chinese Academy of Cyberspace Studies, Beijing 100010; Beijing DigApis Technology Co., Ltd, Beijing 100081; DigApis Information Security Technology (Jiangsu) Co., Ltd, Nantong, Jiangsu 226014)
Source
《计算机学报》
EI
CAS
CSCD
Peking University Core Journals (北大核心)
2022, Issue 10, pp. 2190-2206 (17 pages)
Chinese Journal of Computers
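Funding
Key-Area Research and Development Program of Guangdong Province (2019B010136003, 2019B010137004)
National Natural Science Foundation of China (U20B2046)
National Key Research and Development Program of China (2019YFA0706404)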