Funding: This work was financially supported by the National Natural Science Foundation of China (31730106).
Abstract: Lignin is a natural polymer second only to cellulose in natural abundance, and degradation is one route to its high-value conversion. Thermal degradation of lignin in a deep eutectic solvent (DES) can serve as an excellent green degradation method. This paper describes the mechanism and effect of the lactic acid-choline chloride DES system in dissolving and degrading alkaline lignin, as well as the final recovery of the solvent. Scanning electron microscope (SEM) images show that the surface of the degraded solid product changes from smooth to disordered. Fourier transform infrared (FTIR) and 1H-NMR spectroscopy were used to characterize changes in the lignin functional groups during DES treatment. The content of phenolic hydroxyl groups increased after degradation, indicating cleavage of β-O-4 ether bonds. Gel permeation chromatography (GPC) showed that the resulting lignin residue had a low molecular weight and a narrow polydispersity index, with the lowest weight-average molecular weight (Mw) reaching 2512 g/mol. X-ray photoelectron spectroscopy (XPS) showed that the oxygen-to-carbon atomic ratio of lignin increased substantially during degradation, probably because DES treatment is accompanied by numerous oxidation reactions, which cause significant structural changes in lignin and extensive ether bond cleavage. The main final degradation products are aromatic monomers such as vanillin and butyrovanillone.
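The GPC results above are expressed through the standard number-average molecular weight (Mn), weight-average molecular weight (Mw), and polydispersity index (PDI = Mw/Mn). As a minimal sketch, assuming a discrete distribution given as (chain count, molar mass) pairs, these averages can be computed as follows. The function name and the two-population sample are hypothetical illustrations, not data from this study.

```python
def molecular_weight_averages(chains):
    """Return (Mn, Mw, PDI) for a list of (N_i, M_i) pairs.

    N_i is the number (or moles) of chains with molar mass M_i in g/mol.
    """
    total_n = sum(n for n, _ in chains)
    total_nm = sum(n * m for n, m in chains)
    total_nm2 = sum(n * m * m for n, m in chains)
    mn = total_nm / total_n    # number-average: sum(N_i M_i) / sum(N_i)
    mw = total_nm2 / total_nm  # weight-average: sum(N_i M_i^2) / sum(N_i M_i)
    return mn, mw, mw / mn     # PDI = Mw / Mn (1.0 for a uniform sample)


# Hypothetical two-population sample, not measured data.
mn, mw, pdi = molecular_weight_averages([(1, 1000.0), (1, 3000.0)])
print(f"Mn = {mn:.0f} g/mol, Mw = {mw:.0f} g/mol, PDI = {pdi:.2f}")
# prints: Mn = 2000 g/mol, Mw = 2500 g/mol, PDI = 1.25
```

A PDI close to 1 corresponds to what the abstract calls a narrow polydispersity index; Mw is always at least Mn because it weights heavier chains more.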
Funding: the Natural Science Foundation of China (Nos. 61802404, 61602470); the Strategic Priority Research Program (C) of the Chinese Academy of Sciences (No. XDC02040100); the Fundamental Research Funds for the Central Universities of the China University of Labor Relations (Nos. 20ZYJS017, 20XYJS003); the Key Research Program of the Beijing Municipal Science & Technology Commission (No. D181100000618003); and, partially, the Key Laboratory of Network Assessment Technology, Chinese Academy of Sciences, and the Beijing Key Laboratory of Network Security and Protection Technology.
Abstract: Attacks such as APTs usually hide their communication data in massive volumes of legitimate network traffic, so mining the structurally complex and latent relationships within flow-based network traffic to detect attacks has become the focus of many initiatives. Effectively analyzing massive, high-dimensional network security data to diagnose suspicious flows is a major challenge. In addition, the uneven distribution of network traffic means that the differences between class sample features are not fully reflected, which lowers attack detection accuracy. To solve these problems, a novel approach called the fuzzy entropy weighted natural nearest neighbor (FEW-NNN) method is proposed to improve the accuracy and efficiency of flow-based network traffic attack detection. First, the FEW-NNN method uses the Fisher score and a deep graph feature learning algorithm to remove unimportant features and reduce the data dimension. Then, the proposed natural nearest neighbor searching algorithm (NNN_Searching) is used to determine the density of data points, each class center, and the smallest enclosing sphere radius. Finally, an affinity-based fuzzy entropy weighted KNN classification method is proposed, which mainly consists of three steps: (1) the feature weights of samples are calculated from fuzzy entropy values; (2) the fuzzy memberships of samples are determined from the affinity among samples; and (3) the k nearest neighbors are selected according to the class-conditional weighted Euclidean distance, the fuzzy membership value of each testing sample is calculated from the memberships of its k neighbors, and all testing samples are then classified according to their fuzzy membership values for each class, which determines the attack type. The method has been applied to attack detection and validated on the well-known KDD99 and CICIDS-2017 datasets. The experimental results show that the FEW-NNN method improves the accuracy and efficiency of flow-based network traffic attack detection.
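Two of the building blocks named in this abstract can be sketched in a few lines. This is a minimal illustration, not the authors' FEW-NNN implementation. It computes the classic per-feature Fisher score (between-class scatter divided by within-class scatter) and classifies with a Keller-style fuzzy KNN that uses a feature-weighted Euclidean distance. It assumes the feature weights and per-sample class memberships are already given; in FEW-NNN these would come from fuzzy entropy and sample affinity, which are not reproduced here. It also uses a single weight vector instead of the class-conditional weights the method describes. The function names, the fuzzifier m = 2, and the toy data are hypothetical.

```python
import math
from collections import defaultdict


def fisher_score(X, y):
    """Per-feature Fisher score: sum_c n_c (mu_c - mu)^2 / sum_c n_c var_c."""
    classes = sorted(set(y))
    scores = []
    for j in range(len(X[0])):
        column = [row[j] for row in X]
        mu = sum(column) / len(column)
        between = within = 0.0
        for c in classes:
            vals = [row[j] for row, label in zip(X, y) if label == c]
            mu_c = sum(vals) / len(vals)
            var_c = sum((v - mu_c) ** 2 for v in vals) / len(vals)
            between += len(vals) * (mu_c - mu) ** 2
            within += len(vals) * var_c
        # Zero within-class spread with nonzero between-class spread: perfect separator.
        scores.append(between / within if within > 0 else (math.inf if between > 0 else 0.0))
    return scores


def weighted_distance(a, b, weights):
    """Euclidean distance with a per-feature weight on each squared difference."""
    return math.sqrt(sum(w * (ai - bi) ** 2 for ai, bi, w in zip(a, b, weights)))


def fuzzy_knn_predict(x, X_train, memberships, weights, k=3, m=2.0):
    """Keller-style fuzzy KNN.

    memberships[i] maps class label -> membership of training sample i.
    Returns (predicted label, {label: membership of x}).
    """
    nearest = sorted(
        (weighted_distance(x, xi, weights), i) for i, xi in enumerate(X_train)
    )[:k]
    scores = defaultdict(float)
    total = 0.0
    for dist, i in nearest:
        # Inverse-distance factor 1 / d^(2/(m-1)); clamp d to avoid division by zero.
        inv = 1.0 / max(dist, 1e-12) ** (2.0 / (m - 1.0))
        total += inv
        for label, u in memberships[i].items():
            scores[label] += u * inv
    result = {label: s / total for label, s in scores.items()}
    return max(result, key=result.get), result


X = [[0.0, 0.0], [0.0, 1.0], [5.0, 5.0], [5.0, 6.0]]
y = ["normal", "normal", "attack", "attack"]
crisp = [{label: 1.0} for label in y]  # hypothetical crisp memberships
label, membership = fuzzy_knn_predict([0.0, 0.5], X, crisp, weights=[1.0, 1.0])
print(fisher_score(X, y), label, membership)
```

Feature weights enter only through `weighted_distance`. In this sketch, a feature selected away by the Fisher score step would simply receive weight 0.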
Funding: the National Natural Science Foundation of China (No. 31730106).
Abstract: Lignocellulose is the main component of plants and is widely available. High-value utilization of lignocellulose depends on the biorefining of lignin, cellulose, and hemicellulose. The advantages and disadvantages of traditional lignocellulose pretreatment methods are summarized, and effective pretreatment parameters are listed. As a high-performance green solvent system, deep eutectic solvents (DES) are considered the most promising biomass pretreatment system. On this basis, new trends and progress in DES-based lignocellulose pretreatment are reviewed. The review focuses on how pretreatment is affected by different lignocellulosic raw materials, different DES components, different reaction conditions, and microwave and ultrasound assistance, and it also discusses the recyclability of the DES solution system. Finally, the application and future development of DES in lignocellulose pretreatment are discussed and prospects are outlined.
Funding: supported by the National Natural Science Foundation of China (Nos. 61702508, 61802404, and U1836209), the National Key Research and Development Program of China (Nos. 2018YFB0803602 and 2016QY06X1204), and the National Social Science Foundation of China (No. 19BSH022); also supported by the Key Laboratory of Network Assessment Technology, Chinese Academy of Sciences, and the Beijing Key Laboratory of Network Security and Protection Technology.
Abstract: Network texts have become important carriers of cybersecurity information on the Internet. These texts cover the latest security events, such as vulnerability exploitations, attack discoveries, and advanced persistent threats. Extracting cybersecurity entities from these unstructured texts is a critical and fundamental task in many cybersecurity applications. However, most Named Entity Recognition (NER) models are suitable only for general domains, and little research has focused on entity extraction in the cybersecurity domain. To this end, we propose a novel cybersecurity entity identification model based on Bidirectional Long Short-Term Memory with Conditional Random Fields (Bi-LSTM with CRF) to extract security-related concepts and entities from unstructured text. This model, which we name XBiLSTM-CRF, consists of a word-embedding layer, a bidirectional LSTM layer, and a CRF layer, and it concatenates the input X with the bidirectional LSTM output. Through extensive experiments on an open-source dataset containing an office security bulletin, security blogs, and the Common Vulnerabilities and Exposures list, we demonstrate that XBiLSTM-CRF extracts cybersecurity entities better than state-of-the-art models.
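The CRF layer at the top of a BiLSTM-CRF tagger typically selects the highest-scoring tag sequence with Viterbi decoding. Decoding combines per-token emission scores, which in XBiLSTM-CRF would be derived from the concatenated X and BiLSTM features, with learned tag-transition scores. The sketch below shows only that decoding step, in plain Python. The BIO tag set, emission scores, and transition scores are made-up illustrative values, and the embedding and BiLSTM layers are not reproduced.

```python
def viterbi_decode(emissions, transitions, tags):
    """Return the highest-scoring tag path of a linear-chain CRF and its score.

    emissions[t][j]: score of tag j at token t.
    transitions[i][j]: score of moving from tag i to tag j.
    """
    n_tags = len(tags)
    score = list(emissions[0])  # best score of any path ending in each tag so far
    backpointers = []
    for t in range(1, len(emissions)):
        new_score, pointers = [], []
        for j in range(n_tags):
            # Best previous tag i for ending in tag j at position t.
            best_i = max(range(n_tags), key=lambda i: score[i] + transitions[i][j])
            pointers.append(best_i)
            new_score.append(score[best_i] + transitions[best_i][j] + emissions[t][j])
        score = new_score
        backpointers.append(pointers)
    best_last = max(range(n_tags), key=lambda j: score[j])
    path = [best_last]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    path.reverse()
    return [tags[j] for j in path], score[best_last]


tags = ["O", "B-MAL", "I-MAL"]          # hypothetical BIO tags for a malware entity
transitions = [[0, 0, -10],             # O -> I-MAL is strongly penalized
               [0, 0, 1],
               [0, 0, 1]]
emissions = [[2, 1, 0],                 # greedy choice for token 1: O
             [0, 0, 1]]                 # greedy choice for token 2: I-MAL
path, best = viterbi_decode(emissions, transitions, tags)
print(path, best)  # ['B-MAL', 'I-MAL'] 3
```

The transition penalty on O to I-MAL overrides the greedy per-token choice (O, I-MAL) and yields the well-formed sequence B-MAL, I-MAL. Enforcing this kind of label consistency is the usual reason for placing a CRF on top of the BiLSTM.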
Funding: Our research was supported by the National Key Research and Development Program of China (Grant No. 2016YFB0801004), the Strategic Priority Research Program of the Chinese Academy of Sciences (Grant No. XDC02030200), and the National Key Research and Development Program of China (Grant No. 2018YFC0824801).
Abstract: Command and control (C2) servers are used by attackers to manage communications. To carry out attacks, attackers commonly employ a Domain Generation Algorithm (DGA), which generates many candidate network locations so that rendezvous points with their C2 servers can be established. Detecting DGA domain names is one of the key technologies for detecting command and control communication. Given the randomness of DGA domain names, recent research on DGA detection has applied machine learning methods based on feature extraction, as well as deep learning architectures, to classify domain names. However, these methods are insufficient against wordlist-based DGA threats, which generate domain names by randomly concatenating dictionary words according to a special set of rules. In this paper, we propose a deep learning framework, ATT-CNN-BiLSTM, for identifying and detecting DGA domains to mitigate this threat. First, Convolutional Neural Network (CNN) and bidirectional Long Short-Term Memory (BiLSTM) layers extract features from the domain name sequences. Second, an attention layer assigns weights to the extracted deep features of the domain names. Finally, the weighted features are fed into the output layer to complete the detection and classification tasks. Extensive experimental results demonstrate the effectiveness of the proposed model on both regular DGA domains and hard-to-detect ones, such as wordlist-based and part-wordlist-based DGA domains. Specifically, we obtain an F1 score of 98.79% for detection and a macro-average precision and recall of 83% for classifying DGA domain names.
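The abstract does not give the exact form of the attention layer. As a hedged illustration, assuming a simple dot-product attention pooling, the step "assigns weights to the extracted deep features" can be sketched as follows. Each per-position feature vector (for example, a CNN/BiLSTM output for one character position) is scored against a learned context vector, the scores are normalized with softmax, and a weighted sum is returned. The context vector and feature values below are hypothetical, and the paper's actual formulation may differ (for example, additive attention with a tanh projection).

```python
import math


def softmax(scores):
    """Numerically stable softmax."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(hidden_states, context):
    """Dot-product attention pooling over per-position feature vectors.

    hidden_states: one feature vector per character position.
    context: a learned scoring vector (a hypothetical fixed value here).
    Returns the attention-weighted sum of hidden_states and the weights.
    """
    scores = [sum(h_d * c_d for h_d, c_d in zip(h, context)) for h in hidden_states]
    weights = softmax(scores)
    dim = len(hidden_states[0])
    pooled = [sum(w * h[d] for w, h in zip(weights, hidden_states)) for d in range(dim)]
    return pooled, weights


# Three hypothetical 2-d feature vectors for a short domain name.
states = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
pooled, weights = attention_pool(states, context=[2.0, 0.0])
print([round(w, 3) for w in weights], [round(p, 3) for p in pooled])
```

Positions whose features align with the context vector dominate the pooled vector. This is the sense in which attention lets informative substrings, such as embedded dictionary words, carry more weight in the final classification.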
Funding: Our research was supported by the National Key Research and Development Program of China (Grant Nos. 2018YFC0824801 and 2019QY1302) and the National Natural Science Foundation of China (No. 61802404).
Abstract: TTPs (Tactics, Techniques, and Procedures), which represent an attacker's goals and methods, are long-lasting and essential characteristics of an attacker. Defenders can use TTP intelligence to perform penetration tests and remedy defensive deficiencies. However, most TTP intelligence is described in unstructured threat data, such as APT analysis reports. Manually mapping natural-language TTP descriptions to standard TTP names, such as ATT&CK TTP names and IDs, is time-consuming and requires deep expertise. In this paper, we frame TTP classification as a sentence classification task. We annotate a new sentence-level TTP dataset with 6 categories and 6061 TTP descriptions drawn from 10761 security analysis reports. We construct a threat context-enhanced TTP intelligence mining (TIM) framework to mine TTP intelligence from unstructured threat data. The TIM framework uses TCENet (Threat Context Enhanced Network) to find and classify TTP descriptions in textual data, where each description is defined as three consecutive sentences. In addition, we use the TTP element features in the descriptions to improve the TTP classification accuracy of TCENet. The evaluation shows that the average classification accuracy of our method across the 6 TTP categories reaches 0.941, and that adding TTP element features improves classification accuracy compared with using text features alone. TCENet also achieves the best results compared with previous document-level TTP classification work and other popular text classification methods, even with few-shot training samples. Finally, the TIM framework organizes TTP descriptions and TTP elements into STIX 2.1 format as the final TTP intelligence, for sharing the long-lasting and essential attack behavior characteristics of attackers. We also transform the TTP intelligence into Sigma detection rules for attack behavior detection. Such TTP intelligence and rules can help defenders deploy long-term, effective threat detection and perform more realistic attack simulations to strengthen their defenses.
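To show what "organizing TTP descriptions into STIX 2.1 format" can look like, here is a minimal sketch that builds a STIX 2.1 `attack-pattern` object from a mined description and an ATT&CK technique ID. It assumes the common convention used in MITRE's ATT&CK STIX data, where the technique ID goes in `external_references` with `source_name` "mitre-attack" and the tactic goes in `kill_chain_phases`. It is not the TIM framework's actual output schema, and the helper names and sample description are hypothetical.

```python
import json
import uuid
from datetime import datetime, timezone


def stix_timestamp():
    """Current UTC time as a STIX 2.1 timestamp with millisecond precision."""
    return datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%S.%f")[:-3] + "Z"


def make_attack_pattern(name, description, attack_id, tactic):
    """Build a STIX 2.1 attack-pattern object linked to an ATT&CK technique ID."""
    now = stix_timestamp()
    return {
        "type": "attack-pattern",
        "spec_version": "2.1",
        "id": f"attack-pattern--{uuid.uuid4()}",  # STIX ids are <type>--<UUIDv4>
        "created": now,
        "modified": now,
        "name": name,
        "description": description,
        "kill_chain_phases": [
            {"kill_chain_name": "mitre-attack", "phase_name": tactic}
        ],
        "external_references": [
            {"source_name": "mitre-attack", "external_id": attack_id}
        ],
    }


# Hypothetical mined description; T1566 is ATT&CK's "Phishing" technique.
pattern = make_attack_pattern(
    name="Phishing",
    description="The actor sent spearphishing emails carrying a malicious attachment.",
    attack_id="T1566",
    tactic="initial-access",
)
print(json.dumps(pattern, indent=2))
```

In a real pipeline, these objects would normally be wrapped in a STIX bundle and extended with the extracted TTP elements (for example, tools or file names) as related objects. That step is omitted here.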
Funding: Our research was supported by the National Key Research and Development Program of China (Nos. 2019QY1301, 2018YFB0805005, 2018YFC0824801).
Abstract: Cybersecurity reports provide unstructured, actionable cyber threat intelligence (CTI) with detailed threat attack procedures and indicators of compromise (IOCs), e.g., malware hashes or URLs (uniform resource locators) of command and control servers. When integrated into intrusion detection systems, actionable CTI can not only prioritize the most urgent threats based on the campaign stages of attack vectors (i.e., IOCs) but also support appropriate mitigation measures based on the contextual information of the alerts. However, the dramatic growth in the number of cybersecurity reports makes it nearly impossible for security professionals to use this massive amount of threat intelligence efficiently. In this paper, we propose a trigger-enhanced actionable CTI discovery system (TriCTI) that characterizes the relationship between IOCs and campaign stages and generates actionable CTI from cybersecurity reports using natural language processing (NLP). Specifically, we introduce the “campaign trigger” to explain campaign stages effectively and improve the performance of the classification model. Campaign trigger phrases are the keywords in a sentence that imply its campaign stage. The trained trigger vectors have representations similar to those of the keywords in unseen sentences and aid correct classification by increasing the weight of those keywords. We also carefully design a data augmentation method specifically for cybersecurity training sets to address the scarcity of annotated datasets. Compared with state-of-the-art text classification models such as BERT, the trigger-enhanced classification model performs better, with an accuracy of 86.99% and an F1 score of 87.02%. We run TriCTI on more than 29k cybersecurity reports, from which we automatically and efficiently collect 113,543 actionable CTI items. In particular, we verify the actionability of the discovered CTI using large-scale field data from VirusTotal (VT). The results show that the threat intelligence provided by VT lacks part of the threat context for IOCs, such as the Actions on Objectives campaign stage, whereas our method identifies actionable CTI in all campaign stages. Accordingly, cyber threats can be identified and countered at any campaign stage with the discovered actionable CTI.
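The abstract does not specify how trigger vectors enter the classifier. As a hedged illustration only, assuming trigger vectors and token embeddings share one vector space, one simple way to "increase the weight of the keywords" is to upweight each token by its cosine similarity to the closest trigger vector before pooling the sentence. The function names, the 1 + similarity weighting rule, and the toy vectors are all hypothetical and are not TriCTI's actual model.

```python
import math


def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def trigger_weighted_pool(token_vecs, trigger_vecs):
    """Pool token vectors, upweighting tokens that resemble any trigger vector.

    Each token gets weight 1 + max(0, best cosine similarity to a trigger),
    normalized to sum to 1, so trigger-like keywords dominate the sentence vector.
    """
    weights = [
        1.0 + max(0.0, max(cosine(t, g) for g in trigger_vecs)) for t in token_vecs
    ]
    total = sum(weights)
    weights = [w / total for w in weights]
    dim = len(token_vecs[0])
    pooled = [sum(w * v[d] for w, v in zip(weights, token_vecs)) for d in range(dim)]
    return pooled, weights


# Hypothetical 2-d embeddings: token 0 resembles the trigger, token 1 does not.
tokens = [[1.0, 0.0], [0.0, 1.0]]
triggers = [[0.9, 0.1]]
pooled, weights = trigger_weighted_pool(tokens, triggers)
print([round(w, 3) for w in weights])
```

Keeping a baseline weight of 1 for every token means sentences without any trigger-like keyword still produce a plain average, rather than collapsing to zero.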