Funding: the National Natural Science Foundation of China under Grant Nos. 61862063, 61502413, and 61262025; the National Social Science Foundation of China under Grant No. 18BJL104; the Natural Science Foundation of Key Laboratory of Software Engineering of Yunnan Province under Grant No. 2020SE301; the Yunnan Science and Technology Major Project under Grant Nos. 202002AE090010 and 202002AD080002-5; and the Data Driven Software Engineering Innovative Research Team Funding of Yunnan Province under Grant No. 2017HC012.
Abstract: Named Entity Recognition (NER) for cyber security aims to identify and classify cyber security terms in large volumes of heterogeneous, multisource cyber security texts. In machine learning, deep neural networks automatically learn text features from large datasets, but this data-driven approach usually handles rare entities poorly. Gasmi et al. proposed a deep learning method for named entity recognition in cyber security and achieved good results, reaching an F1 value of 82.8%, but the method has difficulty accurately identifying rare entities and complex words in the text. To address this challenge, this paper proposes a new model that combines data-driven deep learning methods with knowledge-driven dictionary methods, building dictionary features to assist in rare entity recognition. In addition, on top of the data-driven deep learning model, an attention mechanism is adopted to enrich the local features of the text, model the context better, and improve the recognition of complex entities. Experimental results show that our method outperforms the baseline model and is more effective at identifying cyber security entities: Precision, Recall, and F1 reach 90.19%, 86.60%, and 88.36%, respectively.
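The dictionary features mentioned in this abstract can be illustrated with a small sketch. It is not the paper's implementation: the gazetteer entries, the entity type names, the example sentence, and the function `dictionary_features` are all hypothetical, and greedy longest-match lookup is only one plausible way to turn a dictionary into per-token features that a neural tagger could take alongside its learned embeddings.

```python
from typing import Dict, List, Tuple

# Hypothetical gazetteer: lowercased token tuple -> entity type (illustrative only).
GAZETTEER: Dict[Tuple[str, ...], str] = {
    ("wannacry",): "MALWARE",
    ("sql", "injection"): "ATTACK",
    ("windows", "server", "2008"): "PRODUCT",
}


def dictionary_features(
    tokens: List[str],
    gazetteer: Dict[Tuple[str, ...], str] = GAZETTEER,
) -> List[str]:
    """Tag each token with a B-/I-/O dictionary feature using greedy longest match."""
    max_len = max(len(key) for key in gazetteer)  # assumes a non-empty gazetteer
    lowered = [t.lower() for t in tokens]
    feats = ["O"] * len(tokens)
    i = 0
    while i < len(tokens):
        matched = False
        # Try the longest possible phrase starting at position i first.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            key = tuple(lowered[i:i + n])
            if key in gazetteer:
                label = gazetteer[key]
                feats[i] = "B-" + label
                for j in range(i + 1, i + n):
                    feats[j] = "I-" + label
                i += n
                matched = True
                break
        if not matched:
            i += 1
    return feats


tokens = "WannaCry spread to Windows Server 2008 hosts".split()
for token, feat in zip(tokens, dictionary_features(tokens)):
    print(f"{token}\t{feat}")
```

A rare entity that appears only once in the training data can still receive a non-`O` feature here, which is the kind of signal a purely data-driven model lacks.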
Funding: the National Natural Science Foundation of China (Nos. 61862063, 61502413, and 61262025); the National Social Science Foundation of China (No. 18BJL104); the Natural Science Foundation of Key Laboratory of Software Engineering of Yunnan Province, China (No. 2020SE301); the Yunnan Science and Technology Major Project (Nos. 202002AE090010 and 202002AD080002-5); and the Data Driven Software Engineering Innovative Research Team Funding of Yunnan Province, China (No. 2017HC012).
Abstract: With the rapid development of Internet technology and the advent of the big data era, more and more cyber security texts are available on the Internet. These texts cover not only security concepts, incidents, tools, guidelines, and policies, but also risk management approaches, best practices, assurances, technologies, and more. By integrating large-scale, heterogeneous, unstructured cyber security information, the identification and classification of cyber security entities can help handle cyber security issues. Because texts in the cyber security domain are complex and diverse, security entities are difficult to identify with traditional named entity recognition (NER) methods. This paper describes the approaches and techniques used for NER in this domain, including rule-based, dictionary-based, and machine learning based approaches, and discusses the problems faced by NER research in this domain, such as conjunction and disjunction, non-standardized naming conventions, abbreviations, and massive nesting. Three future directions for NER in cyber security are proposed: (1) application of unsupervised or semi-supervised technology; (2) development of a more comprehensive cyber security ontology; (3) development of a more comprehensive deep learning model.
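As a rough illustration of the rule-based approach named in this abstract, the sketch below extracts two entity types with regular expressions. The rules, labels, and the function `rule_based_entities` are assumptions made for this example and are not taken from the survey; real systems use far larger rule sets.

```python
import re
from typing import List, Tuple

# CVE identifiers follow the pattern CVE-<4-digit year>-<4 or more digits>.
CVE_RULE = re.compile(r"\bCVE-\d{4}-\d{4,}\b")
# Loose IPv4 rule: four dot-separated groups of 1-3 digits; the range is checked below.
IPV4_RULE = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")


def rule_based_entities(text: str) -> List[Tuple[str, str]]:
    """Return (surface form, label) pairs in order of appearance."""
    found = [(m.start(), m.group(), "CVE") for m in CVE_RULE.finditer(text)]
    for m in IPV4_RULE.finditer(text):
        # Reject digit groups above 255, which the regex alone cannot exclude.
        if all(int(part) <= 255 for part in m.group().split(".")):
            found.append((m.start(), m.group(), "IP"))
    return [(surface, label) for _, surface, label in sorted(found)]


sample = "Host 192.0.2.10 was scanned for CVE-2017-0144 and CVE-2014-0160."
print(rule_based_entities(sample))
```

Rules like these are precise for well-standardized identifiers but do not cover the non-standardized names, abbreviations, and nested entities that the abstract lists as open problems.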
Funding: Supported by the Key R&D Projects in Hubei Province (2022BAA041 and 2021BCA124) and the Open Foundation of Engineering Research Center of Cyberspace (KJAQ202112002).
Abstract: As deep learning models have made remarkable strides in numerous fields, a variety of adversarial attack methods have emerged to interfere with them. Adversarial examples apply a minute perturbation to the original image that is imperceptible to humans but produces a massive error in the deep learning model. Existing attack methods achieve good results when the network structure is known. However, when the network structure is unknown, the effectiveness of the attacks still needs to be improved. Transfer-based attacks are therefore very popular because of their convenience and practicality: adversarial samples generated on known models can be used to attack unknown models. In this paper, we extract sensitive features with Grad-CAM and propose two single-step attack methods and one multi-step attack method to corrupt these sensitive features. Of the two single-step attacks, one corrupts the features extracted from a single model and the other corrupts the features extracted from multiple models. In the multi-step attack, our method improves on an existing attack method, enhancing the transferability of adversarial samples to achieve better results on unknown models. Our method is validated on CIFAR-10 and MNIST and achieves a 1%-3% improvement in transferability.
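To make the idea of a single-step adversarial perturbation concrete, here is a minimal sketch on a toy logistic model. It is not the Grad-CAM feature attack proposed in the paper; it swaps in the generic fast gradient sign step, and the weights, input, and `eps` budget are made-up values chosen for illustration.

```python
import math
from typing import List


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def predict(w: List[float], b: float, x: List[float]) -> float:
    """Probability of class 1 under a logistic model."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


def sign(v: float) -> int:
    return (v > 0) - (v < 0)


def single_step_attack(
    w: List[float], b: float, x: List[float], y: int, eps: float
) -> List[float]:
    """Move x by eps per coordinate in the direction that increases the loss."""
    p = predict(w, b, x)
    # Gradient of the cross-entropy loss with respect to the input x.
    grad = [(p - y) * wi for wi in w]
    return [xi + eps * sign(g) for xi, g in zip(x, grad)]


# Toy model and input (hypothetical values).
w, b = [2.0, -1.0, 0.5], 0.0
x, y = [0.3, 0.1, 0.2], 1

x_adv = single_step_attack(w, b, x, y, eps=0.2)
print("clean prediction:      ", round(predict(w, b, x), 3))
print("adversarial prediction:", round(predict(w, b, x_adv), 3))
```

Each coordinate changes by at most `eps`, yet the predicted probability for the true class drops below 0.5. A transfer-based attack would craft `x_adv` on a known model and then evaluate it on a different, unknown one.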
Funding: Supported by the Key Research & Development Projects in Hubei Province (2022BAA041 and 2021BCA124) and the Open Foundation of Engineering Research Center of Cyberspace (KJAQ202112002).
Abstract: A news recommendation system is designed to handle massive volumes of news and provide personalized recommendations for users. Accurately capturing user preferences and modeling both news and users are the keys to news recommendation. In this paper, we propose a new framework, a news recommendation system based on topic embedding and knowledge embedding (NRTK). NRTK processes the titles of news that users have clicked on from two perspectives to obtain news and user representation embeddings: 1) extracting explicit and latent topic features from news and mining users' preferences for them from historical behaviors; 2) extracting entities and propagating users' potential preferences in the knowledge graph. Experiments on a real-world dataset validate the effectiveness and efficiency of our approach.
Funding: Supported by the National Natural Science Foundation of China under Grant No. 62162066; the Open Funding of Engineering Research Center of Cyberspace of Ministry of Education of China under Grant No. WLKJAQ202011010; the Education Department Funding of Yunnan Province of China under Grant No. 2021J0006; and the Spanish AEI project PID2019-111544GB-C2.
Abstract: Given an undirected graph, the Maximum Clique Problem (MCP) is to find a largest complete subgraph of the graph. MCP is NP-hard and has many practical applications. In this paper, we propose a parallel Branch-and-Bound (BnB) algorithm for this problem that carries out multiple bounded searches in parallel. Each search has its own upper bound and shares a lower bound with the other searches. The potential benefit of this approach is that an active search terminates as soon as the best lower bound found so far reaches or exceeds its upper bound. We describe the implementation of our highly scalable and efficient parallel MCP algorithm, called PBS, which is based on a state-of-the-art sequential MCP algorithm. PBS is evaluated on hard DIMACS and BHOSLIB instances. The results show that PBS achieves a near-linear speedup on most DIMACS instances and a superlinear speedup on most BHOSLIB instances. Finally, we give a detailed analysis that explains the good speedups achieved on the tested instances.
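The interplay between per-search upper bounds and a shared lower bound can be sketched as follows. This is not PBS: the decomposition into one search per vertex, the simple bound of clique size plus remaining candidates, and the names `SharedBound`, `expand`, and `max_clique` are assumptions for illustration. Python threads are used only to show the sharing pattern; under CPython's global interpreter lock they will not deliver the speedups the abstract reports.

```python
import threading
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List, Set


class SharedBound:
    """Best clique found so far; its size is the lower bound shared by all searches."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self.best: List[int] = []

    def size(self) -> int:
        return len(self.best)

    def offer(self, clique: List[int]) -> None:
        with self._lock:
            if len(clique) > len(self.best):
                self.best = list(clique)


def expand(adj: Dict[int, Set[int]], clique: List[int],
           cands: Set[int], shared: SharedBound) -> None:
    """Grow `clique` using vertices from `cands`, pruning against the shared bound."""
    if len(clique) + len(cands) <= shared.size():
        return  # even taking every candidate cannot beat the shared lower bound
    if not cands:
        shared.offer(clique)
        return
    for v in sorted(cands):
        if len(clique) + len(cands) <= shared.size():
            return  # another search may have raised the bound in the meantime
        expand(adj, clique + [v], cands & adj[v], shared)
        cands = cands - {v}


def max_clique(adj: Dict[int, Set[int]], workers: int = 4) -> List[int]:
    order = sorted(adj, key=lambda v: len(adj[v]))
    rank = {v: i for i, v in enumerate(order)}
    shared = SharedBound()

    def search(v: int) -> None:
        later = {u for u in adj[v] if rank[u] > rank[v]}
        upper = 1 + len(later)  # upper bound of this search
        if upper <= shared.size():
            return  # lower bound reaches or exceeds the upper bound: stop
        expand(adj, [v], later, shared)

    with ThreadPoolExecutor(max_workers=workers) as pool:
        list(pool.map(search, order))
    return shared.best


# Small hypothetical graph: vertices 0-3 form a clique, vertex 4 hangs off vertex 3.
graph = {0: {1, 2, 3}, 1: {0, 2, 3}, 2: {0, 1, 3}, 3: {0, 1, 2, 4}, 4: {3}}
print(sorted(max_clique(graph)))
```

Because every search reads the same `SharedBound`, a good clique found early by one search lets the others terminate without exploring their subtrees, which is the effect the abstract credits for the observed speedups.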