Funding: This work was supported by the Fundamental Research Funds for the Central Universities, People’s Public Security University of China (2021JKF215); the Key Projects of the Technology Research Program of the Ministry of Public Security (2021JSZ09); and the Fund for the training of top innovative talents to support master’s degree program, People’s Public Security University of China (2021yjsky018).
Abstract: With the rapid development of Internet technology, the information on the Internet has become extremely complex, and a large amount of riot-related content containing bloody, violent, and riotous elements has appeared. Such content poses a great threat to the network ecology and to national security, so the importance of monitoring riot-related Internet activity cannot be overstated. Convolutional neural network (CNN)-based object detection algorithms have great potential for identifying rioters. This paper therefore improves the backbone and the optimization function of You Only Look Once v5 (YOLOv5) and further optimizes its hyperparameters with a genetic algorithm to achieve fine-grained recognition of riot image content. First, the fine-grained features of riot-related images were identified, and a dataset was constructed by manual annotation. Second, the model was trained and tested on this dedicated dataset through supervised deep learning. The results show that, compared with the original YOLOv5 network structure, the improved YOLOv5 network significantly improves fine-grained feature extraction from riot-related images, raising the mean average precision (mAP) to 0.6128. The work thus provides strong support for combating riot-related organizations and maintaining a healthy online environment.
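The abstract says only that a genetic algorithm was used to tune the YOLOv5 hyperparameters. It does not give the encoding, the genetic operators, or the search space. The following is a minimal, hypothetical sketch of a steady-state genetic algorithm (uniform crossover plus bounded Gaussian mutation) over a few YOLOv5-style hyperparameters. These pieces are illustrative assumptions, not the authors' settings:

- the names and ranges in `BOUNDS`;
- the mutation scale;
- `toy_fitness`, which stands in for the expensive "train the detector and measure validation mAP" step.

```python
import random

# Hypothetical search space: (lower, upper) bounds for a few YOLOv5-style
# hyperparameters. Names and ranges are illustrative assumptions.
BOUNDS = {
    "lr0": (1e-4, 1e-1),          # initial learning rate
    "momentum": (0.6, 0.98),      # SGD momentum
    "weight_decay": (0.0, 1e-3),  # L2 penalty
    "mosaic": (0.0, 1.0),         # probability of mosaic augmentation
}


def toy_fitness(hyp):
    """Stand-in for 'train the detector, return validation mAP' (assumption).

    A real run would train YOLOv5 with `hyp`; this smooth toy function simply
    peaks at a point chosen for illustration so the search has something to climb.
    """
    target = {"lr0": 0.01, "momentum": 0.937, "weight_decay": 5e-4, "mosaic": 1.0}
    return -sum(((hyp[k] - target[k]) / (hi - lo)) ** 2 for k, (lo, hi) in BOUNDS.items())


def crossover(a, b, rng):
    """Uniform crossover: each gene is copied from one parent at random."""
    return {k: (a[k] if rng.random() < 0.5 else b[k]) for k in a}


def mutate(hyp, rng, sigma=0.1):
    """Gaussian mutation scaled to each gene's range, clipped back into bounds."""
    out = {}
    for k, v in hyp.items():
        lo, hi = BOUNDS[k]
        out[k] = min(max(v + rng.gauss(0.0, sigma * (hi - lo)), lo), hi)
    return out


def evolve(generations=300, pop_size=8, seed=0):
    """Steady-state GA: breed from the top half, replace the worst if better."""
    rng = random.Random(seed)
    pop = [{k: rng.uniform(lo, hi) for k, (lo, hi) in BOUNDS.items()} for _ in range(pop_size)]
    scored = sorted(((toy_fitness(h), h) for h in pop), key=lambda s: s[0], reverse=True)
    for _ in range(generations):
        (_, a), (_, b) = rng.sample(scored[: pop_size // 2], 2)
        child = mutate(crossover(a, b, rng), rng)
        f = toy_fitness(child)
        if f > scored[-1][0]:  # elitist: the best individual is never discarded
            scored[-1] = (f, child)
            scored.sort(key=lambda s: s[0], reverse=True)
    return scored[0]  # (fitness, hyperparameters)


best_fitness, best_hyp = evolve()
```

In practice each fitness evaluation is a full training run, so the population size and the number of generations are kept small. YOLOv5's `train.py` exposes an `--evolve` option built on a similar mutate-and-evaluate loop.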
Funding: Funded by the Double Top-Class Innovation Research Project in Cyberspace Security Enforcement Technology of People’s Public Security University of China (No. 2023SYL07).
Abstract: In recent years, cyber attacks have been intensifying and causing great harm to individuals, companies, and countries. Mining cyber threat intelligence (CTI) can facilitate intelligence integration and help combat cyber attacks. Named Entity Recognition (NER), a crucial component of text mining, can structure complex CTI text and help cybersecurity professionals counter threats effectively. However, current CTI NER research has focused mainly on English CTI, and the few studies on Chinese text have produced models that perform poorly. To fully exploit Chinese pre-trained language models (PLMs) and overcome the problem of long, infrequent English words mixed into Chinese CTI text, we propose a residual dilated convolutional neural network (RDCNN) with a conditional random field (CRF). The model is built on a robustly optimized bidirectional encoder representations from transformers (BERT) pre-training approach with whole word masking (RoBERTa-wwm), and is abbreviated as RoBERTa-wwm-RDCNN-CRF. We are the first to experiment on the relevant open-source dataset, where the model achieves an F1-score of 82.35%. This exceeds the common baseline model in this field, BERT-bidirectional long short-term memory (BiLSTM)-CRF, by about 19.52%, and the current state-of-the-art model, BERT-RDCNN-CRF, by about 3.53%. In addition, we conducted an ablation study on the encoder part of the model and an in-depth investigation of the PLMs and the encoder to verify the effectiveness of the proposed model. The RoBERTa-wwm-RDCNN-CRF model and the shared pre-processing and augmentation methods can serve subsequent fundamental tasks such as cybersecurity information extraction and knowledge graph construction. These tasks in turn support important downstream applications such as intrusion detection and advanced persistent threat (APT) attack detection.
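The abstract names a residual dilated CNN encoder but gives no layer count, kernel size, or dilation rates. The single-channel sketch below assumes a kernel size of 3 and dilations 1, 2 and 4, both hypothetical choices. It shows the two ideas the name refers to:

- Each layer skips `dilation - 1` tokens between kernel taps, so the receptive field grows quickly with depth.
- A residual connection adds each token's input back to the convolution output.

A real encoder would apply multi-channel convolutions to the RoBERTa-wwm token embeddings before the CRF layer; that part is not shown.

```python
from typing import List, Sequence


def dilated_conv1d(x: Sequence[float], weights: Sequence[float], dilation: int) -> List[float]:
    """Single-channel 1-D convolution with 'same' zero padding and gaps of `dilation`."""
    half = len(weights) // 2  # assumes an odd kernel size so outputs stay aligned
    out = []
    for t in range(len(x)):
        acc = 0.0
        for j, w in enumerate(weights):
            idx = t + (j - half) * dilation  # taps at t - d, t, t + d for kernel size 3
            if 0 <= idx < len(x):
                acc += w * x[idx]
        out.append(acc)
    return out


def residual_dilated_block(x: Sequence[float], weights: Sequence[float], dilation: int) -> List[float]:
    """y = x + ReLU(conv_d(x)): the skip path carries each token's own input forward."""
    conv = dilated_conv1d(x, weights, dilation)
    return [xi + max(0.0, ci) for xi, ci in zip(x, conv)]


def receptive_field(dilations: Sequence[int], kernel_size: int) -> int:
    """Number of input tokens that can influence one output after stacking layers."""
    return 1 + (kernel_size - 1) * sum(dilations)


# Hypothetical stack: kernel size 3, dilations 1, 2, 4, all weights 0.5.
DILATIONS = (1, 2, 4)
tokens = [0.0] * 31
tokens[15] = 1.0  # impulse on one token to trace how far its influence spreads
hidden = tokens
for d in DILATIONS:
    hidden = residual_dilated_block(hidden, [0.5, 0.5, 0.5], d)
support = [t for t, v in enumerate(hidden) if v != 0.0]
```

Here the impulse at token 15 spreads to tokens 8 through 22, i.e. 15 positions, matching `receptive_field(DILATIONS, 3)`. One plausible benefit for mixed Chinese-English CTI text is that a long English term split into many subword tokens can still fall inside a single output's context. The wider context comes without adding parameters.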
Funding: This research was funded by the Double Top-Class Innovation Research Project in Cyberspace Security Enforcement Technology of People’s Public Security University of China (No. 2023SYL07).
Abstract: Telecommunication fraud has recently become rampant worldwide. However, previous studies depend heavily on expert-knowledge-based feature engineering to extract behavioral information, which cannot adapt to the fast-changing patterns of fraudulent subscribers. We therefore propose a new approach that needs no hand-designed features and instead takes raw Call Detail Records (CDR) directly as input to the classifier. Concretely, we propose a fraud detection method that uses a convolutional neural network (CNN), treating CDR data as images and applying computer vision techniques such as image augmentation. Comprehensive experiments on the real-world dataset from the 2020 Digital Sichuan Innovation Competition show that the proposed method outperforms classic methods on many metrics. It also remains highly stable when both the number of samples and the balance between classes change. Compared with the state-of-the-art method, the proposed method achieves an F1-score of about 89.98% and an AUC of 92.93%, improvements of 2.97% and 0.48%, respectively. With the augmentation technique, performance improves further to an F1-score of 91.09% and an AUC of 94.49%. Beyond telecommunication fraud detection, the method can also be extended to other text datasets, using powerful computer vision techniques to discover new features automatically.
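The abstract does not specify which CDR fields are used, how records are arranged into an image, or which augmentations are applied. The sketch below makes hypothetical choices for all three:

- Each call record becomes one row of numeric features.
- A subscriber's first `height` records are stacked into a fixed-size grid.
- Each column is min-max normalized to [0, 1].
- A random crop along the time axis plays the role of image augmentation.

The resulting grid would then be fed to a CNN classifier, which is not shown.

```python
import random
from typing import List, Sequence

# Hypothetical CDR row: (call_duration_s, hour_of_day, is_outgoing).
Record = Sequence[float]


def cdr_to_image(records: Sequence[Record], height: int) -> List[List[float]]:
    """Stack one subscriber's CDR rows into a height x width grid in [0, 1].

    Assumes at least one record and rows already sorted by call time.
    """
    width = len(records[0])
    rows = [[float(v) for v in r] for r in records[:height]]  # truncate long histories
    # Min-max normalise each column so every "pixel" lies in [0, 1].
    for c in range(width):
        col = [r[c] for r in rows]
        lo, hi = min(col), max(col)
        span = hi - lo
        for r in rows:
            r[c] = (r[c] - lo) / span if span else 0.0
    # Zero-pad short histories so every subscriber yields the same shape.
    rows += [[0.0] * width for _ in range(height - len(rows))]
    return rows


def random_row_crop(image: List[List[float]], keep: int, rng: random.Random) -> List[List[float]]:
    """Image-style random crop along the time axis, re-padded to full height."""
    height, width = len(image), len(image[0])
    start = rng.randrange(height - keep + 1)
    cropped = [row[:] for row in image[start:start + keep]]
    return cropped + [[0.0] * width for _ in range(height - keep)]


# Usage with three made-up call records and a 5-row grid.
calls = [(30, 9, 1), (600, 23, 1), (5, 2, 0)]
img = cdr_to_image(calls, height=5)
aug = random_row_crop(img, keep=3, rng=random.Random(0))
```

Zero padding gives every subscriber's grid the same shape, which a CNN requires. In this simple scheme, padding rows cannot be told apart from minimum-valued records, so a fuller pipeline might add a mask channel.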