Command and control(C2)servers are used by attackers to operate communications.To perform attacks,attackers usually employee the Domain Generation Algorithm(DGA),with which to confirm rendezvous points to their C2 ser...Command and control(C2)servers are used by attackers to operate communications.To perform attacks,attackers usually employee the Domain Generation Algorithm(DGA),with which to confirm rendezvous points to their C2 servers by generating various network locations.The detection of DGA domain names is one of the important technologies for command and control communication detection.Considering the randomness of the DGA domain names,recent research in DGA detection applyed machine learning methods based on features extracting and deep learning architectures to classify domain names.However,these methods are insufficient to handle wordlist-based DGA threats,which generate domain names by randomly concatenating dictionary words according to a special set of rules.In this paper,we proposed a a deep learning framework ATT-CNN-BiLSTMfor identifying and detecting DGA domains to alleviate the threat.Firstly,the Convolutional Neural Network(CNN)and bidirectional Long Short-Term Memory(BiLSTM)neural network layer was used to extract the features of the domain sequences information;secondly,the attention layer was used to allocate the corresponding weight of the extracted deep information from the domain names.Finally,the different weights of features in domain names were put into the output layer to complete the tasks of detection and classification.Our extensive experimental results demonstrate the effectiveness of the proposed model,both on regular DGA domains and DGA that hard to detect such as wordlist-based and part-wordlist-based ones.To be precise,we got a F1 score of 98.79%for the detection and macro average precision and recall of 83%for the classification task of DGA domain names.展开更多
TTPs (Tactics, Techniques, and Procedures), which represent an attacker’s goals and methods, are the long period and essential feature of the attacker. Defenders can use TTP intelligence to perform the penetration te...TTPs (Tactics, Techniques, and Procedures), which represent an attacker’s goals and methods, are the long period and essential feature of the attacker. Defenders can use TTP intelligence to perform the penetration test and compensate for defense deficiency. However, most TTP intelligence is described in unstructured threat data, such as APT analysis reports. Manually converting natural language TTPs descriptions to standard TTP names, such as ATT&CK TTP names and IDs, is time-consuming and requires deep expertise. In this paper, we define the TTP classification task as a sentence classification task. We annotate a new sentence-level TTP dataset with 6 categories and 6061 TTP descriptions from 10761 security analysis reports. We construct a threat context-enhanced TTP intelligence mining (TIM) framework to mine TTP intelligence from unstructured threat data. The TIM framework uses TCENet (Threat Context Enhanced Network) to find and classify TTP descriptions, which we define as three continuous sentences, from textual data. Meanwhile, we use the element features of TTP in the descriptions to enhance the TTPs classification accuracy of TCENet. The evaluation result shows that the average classification accuracy of our proposed method on the 6 TTP categories reaches 0.941. The evaluation results also show that adding TTP element features can improve our classification accuracy compared to using only text features. TCENet also achieved the best results compared to the previous document-level TTP classification works and other popular text classification methods, even in the case of few-shot training samples. Finally, the TIM framework organizes TTP descriptions and TTP elements into STIX 2.1 format as final TTP intelligence for sharing the long-period and essential attack behavior characteristics of attackers. In addition, we transform TTP intelligence into sigma detection rules for attack behavior detection. Such TTP intelligence and rules can help defenders deploy long-term effective threat detection and perform more realistic attack simulations to strengthen defense.展开更多
The cybersecurity report provides unstructured actionable cyber threat intelligence(CTI)with detailed threat attack procedures and indicators of compromise(IOCs),e.g.,malware hash or URL(uniform resource locator)of co...The cybersecurity report provides unstructured actionable cyber threat intelligence(CTI)with detailed threat attack procedures and indicators of compromise(IOCs),e.g.,malware hash or URL(uniform resource locator)of command and control server.The actionable CTI,integrated into intrusion detection systems,can not only prioritize the most urgent threats based on the campaign stages of attack vectors(i.e.,IOCs)but also take appropriate mitigation measures based on contextual information of the alerts.However,the dramatic growth in the number of cybersecurity reports makes it nearly impossible for security professionals to find an efficient way to use these massive amounts of threat intelligence.In this paper,we propose a trigger-enhanced actionable CTI discovery system(TriCTI)to portray a relationship between IOCs and campaign stages and generate actionable CTI from cybersecurity reports through natural language processing(NLP)technology.Specifically,we introduce the“campaign trigger”for an effective explanation of the campaign stages to improve the performance of the classification model.The campaign trigger phrases are the keywords in the sentence that imply the campaign stage.The trained final trigger vectors have similar space representations with the keywords in the unseen sentence and will help correct classification by increasing the weight of the keywords.We also meticulously devise a data augmentation specifically for cybersecurity training sets to cope with the challenge of the scarcity of annotation data sets.Compared with state-of-the-art text classification models,such as BERT,the trigger-enhanced classification model has better performance with accuracy(86.99%)and F1 score(87.02%).We run TriCTI on more than 29k cybersecurity reports,from which we automatically and efficiently collect 113,543 actionable CTI.In particular,we verify the actionability of discovered CTI by using large-scale field data from VirusTotal(VT).The results demonstrate that the threat intelligence provided by VT lacks a part of the threat context for IOCs,such as the Actions on Objectives campaign stage.As a comparison,our proposed method can completely identify the actionable CTI in all campaign stages.Accordingly,cyber threats can be identified and resisted at any campaign stage with the discovered actionable CTI.展开更多
Command and control(C2)servers are used by attackers to operate communications.To perform attacks,attackers usually employee the Domain Generation Algorithm(DGA),with which to confirm rendezvous points to their C2 ser...Command and control(C2)servers are used by attackers to operate communications.To perform attacks,attackers usually employee the Domain Generation Algorithm(DGA),with which to confirm rendezvous points to their C2 servers by generating various network locations.The detection of DGA domain names is one of the important technologies for command and control communication detection.Considering the randomness of the DGA domain names,recent research in DGA detection applyed machine learning methods based on features extracting and deep learning architectures to classify domain names.However,these methods are insufficient to handle wordlist-based DGA threats,which generate domain names by randomly concatenating dictionary words according to a special set of rules.In this paper,we proposed a a deep learning framework ATT-CNN-BiLSTMfor identifying and detecting DGA domains to alleviate the threat.Firstly,the Convolutional Neural Network(CNN)and bidirectional Long Short-Term Memory(BiLSTM)neural network layer was used to extract the features of the domain sequences information;secondly,the attention layer was used to allocate the corresponding weight of the extracted deep information from the domain names.Finally,the different weights of features in domain names were put into the output layer to complete the tasks of detection and classification.Our extensive experimental results demonstrate the effectiveness of the proposed model,both on regular DGA domains and DGA that hard to detect such as wordlist-based and part-wordlist-based ones.To be precise,we got a F1 score of 98.79% for the detection and macro average precision and recall of 83% for the classification task of DGA domain names.展开更多
基金Our research was supported by the National Key Research and Development Program of China(Grant No.2016YFB0801004)the Strategic Priority Research Program of Chinese Academy of Sciences(Grant No.XDC02030200)the National Key Research and Development Program of China(Grant No.2018YFC0824801).
文摘Command and control(C2)servers are used by attackers to operate communications.To perform attacks,attackers usually employee the Domain Generation Algorithm(DGA),with which to confirm rendezvous points to their C2 servers by generating various network locations.The detection of DGA domain names is one of the important technologies for command and control communication detection.Considering the randomness of the DGA domain names,recent research in DGA detection applyed machine learning methods based on features extracting and deep learning architectures to classify domain names.However,these methods are insufficient to handle wordlist-based DGA threats,which generate domain names by randomly concatenating dictionary words according to a special set of rules.In this paper,we proposed a a deep learning framework ATT-CNN-BiLSTMfor identifying and detecting DGA domains to alleviate the threat.Firstly,the Convolutional Neural Network(CNN)and bidirectional Long Short-Term Memory(BiLSTM)neural network layer was used to extract the features of the domain sequences information;secondly,the attention layer was used to allocate the corresponding weight of the extracted deep information from the domain names.Finally,the different weights of features in domain names were put into the output layer to complete the tasks of detection and classification.Our extensive experimental results demonstrate the effectiveness of the proposed model,both on regular DGA domains and DGA that hard to detect such as wordlist-based and part-wordlist-based ones.To be precise,we got a F1 score of 98.79%for the detection and macro average precision and recall of 83%for the classification task of DGA domain names.
基金Our research was supported by the National Key Research and Development Program of China(Grant No.2018YFC0824801,No.2019QY1302)the National Natural Science Foundation of China(No.61802404).
文摘TTPs (Tactics, Techniques, and Procedures), which represent an attacker’s goals and methods, are the long period and essential feature of the attacker. Defenders can use TTP intelligence to perform the penetration test and compensate for defense deficiency. However, most TTP intelligence is described in unstructured threat data, such as APT analysis reports. Manually converting natural language TTPs descriptions to standard TTP names, such as ATT&CK TTP names and IDs, is time-consuming and requires deep expertise. In this paper, we define the TTP classification task as a sentence classification task. We annotate a new sentence-level TTP dataset with 6 categories and 6061 TTP descriptions from 10761 security analysis reports. We construct a threat context-enhanced TTP intelligence mining (TIM) framework to mine TTP intelligence from unstructured threat data. The TIM framework uses TCENet (Threat Context Enhanced Network) to find and classify TTP descriptions, which we define as three continuous sentences, from textual data. Meanwhile, we use the element features of TTP in the descriptions to enhance the TTPs classification accuracy of TCENet. The evaluation result shows that the average classification accuracy of our proposed method on the 6 TTP categories reaches 0.941. The evaluation results also show that adding TTP element features can improve our classification accuracy compared to using only text features. TCENet also achieved the best results compared to the previous document-level TTP classification works and other popular text classification methods, even in the case of few-shot training samples. Finally, the TIM framework organizes TTP descriptions and TTP elements into STIX 2.1 format as final TTP intelligence for sharing the long-period and essential attack behavior characteristics of attackers. In addition, we transform TTP intelligence into sigma detection rules for attack behavior detection. Such TTP intelligence and rules can help defenders deploy long-term effective threat detection and perform more realistic attack simulations to strengthen defense.
基金Our research was supported by the National Key Research and Development Program of China(Nos.2019QY1301,2018YFB0805005,2018YFC0824801).
文摘The cybersecurity report provides unstructured actionable cyber threat intelligence(CTI)with detailed threat attack procedures and indicators of compromise(IOCs),e.g.,malware hash or URL(uniform resource locator)of command and control server.The actionable CTI,integrated into intrusion detection systems,can not only prioritize the most urgent threats based on the campaign stages of attack vectors(i.e.,IOCs)but also take appropriate mitigation measures based on contextual information of the alerts.However,the dramatic growth in the number of cybersecurity reports makes it nearly impossible for security professionals to find an efficient way to use these massive amounts of threat intelligence.In this paper,we propose a trigger-enhanced actionable CTI discovery system(TriCTI)to portray a relationship between IOCs and campaign stages and generate actionable CTI from cybersecurity reports through natural language processing(NLP)technology.Specifically,we introduce the“campaign trigger”for an effective explanation of the campaign stages to improve the performance of the classification model.The campaign trigger phrases are the keywords in the sentence that imply the campaign stage.The trained final trigger vectors have similar space representations with the keywords in the unseen sentence and will help correct classification by increasing the weight of the keywords.We also meticulously devise a data augmentation specifically for cybersecurity training sets to cope with the challenge of the scarcity of annotation data sets.Compared with state-of-the-art text classification models,such as BERT,the trigger-enhanced classification model has better performance with accuracy(86.99%)and F1 score(87.02%).We run TriCTI on more than 29k cybersecurity reports,from which we automatically and efficiently collect 113,543 actionable CTI.In particular,we verify the actionability of discovered CTI by using large-scale field data from VirusTotal(VT).The results demonstrate that the threat intelligence provided by VT lacks a part of the threat context for IOCs,such as the Actions on Objectives campaign stage.As a comparison,our proposed method can completely identify the actionable CTI in all campaign stages.Accordingly,cyber threats can be identified and resisted at any campaign stage with the discovered actionable CTI.
基金supported by the National Key Research and Development Program of China(Grant No.2016YFB0801004)the Strategic Priority Research Program of Chinese Academy of Sciences(Grant No.XDC02030200)the National Key Research and Development Program of China(Grant No.2018YFC0824801).
文摘Command and control(C2)servers are used by attackers to operate communications.To perform attacks,attackers usually employee the Domain Generation Algorithm(DGA),with which to confirm rendezvous points to their C2 servers by generating various network locations.The detection of DGA domain names is one of the important technologies for command and control communication detection.Considering the randomness of the DGA domain names,recent research in DGA detection applyed machine learning methods based on features extracting and deep learning architectures to classify domain names.However,these methods are insufficient to handle wordlist-based DGA threats,which generate domain names by randomly concatenating dictionary words according to a special set of rules.In this paper,we proposed a a deep learning framework ATT-CNN-BiLSTMfor identifying and detecting DGA domains to alleviate the threat.Firstly,the Convolutional Neural Network(CNN)and bidirectional Long Short-Term Memory(BiLSTM)neural network layer was used to extract the features of the domain sequences information;secondly,the attention layer was used to allocate the corresponding weight of the extracted deep information from the domain names.Finally,the different weights of features in domain names were put into the output layer to complete the tasks of detection and classification.Our extensive experimental results demonstrate the effectiveness of the proposed model,both on regular DGA domains and DGA that hard to detect such as wordlist-based and part-wordlist-based ones.To be precise,we got a F1 score of 98.79% for the detection and macro average precision and recall of 83% for the classification task of DGA domain names.