Abstract
Cybersecurity entities change frequently and occur with strong randomness, so identifying them in large-scale, heterogeneous, and unstructured cyberspace security texts is prone to entity sparsity. To address this problem, a semantic-enhancement-based cybersecurity entity recognition model is proposed. A semantically enhanced input matrix is obtained through multidimensional linguistic feature enhancement and corpus enhancement. A bidirectional long short-term memory network (BiLSTM) extracts contextual features from the input matrix. An attention mechanism generates attention allocation coefficients for the output features, and a feed-forward neural network (FFNN) aggregates and encodes the features from different spaces. A conditional random field (CRF) then computes the optimal entity recognition sequence. Experimental results show that the model significantly outperforms general-domain entity recognition models on cybersecurity entities and achieves better results than other cybersecurity entity recognition models.
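The final CRF step mentioned in the abstract, which picks the best label sequence, is commonly implemented with Viterbi decoding. The following is a minimal sketch of that decoding step only, not the authors' implementation: the tag set `TAGS`, the emission scores (standing in for the encoder output), and the transition scores are all hypothetical values chosen for illustration.

```python
from typing import List, Sequence


def viterbi_decode(emissions: Sequence[Sequence[float]],
                   transitions: Sequence[Sequence[float]]) -> List[int]:
    """Return the highest-scoring tag index sequence for a linear-chain CRF.

    emissions[t][j]   : score of tag j at token position t
    transitions[i][j] : score of moving from tag i to tag j
    """
    if not emissions:
        return []
    num_tags = len(emissions[0])
    # score[j] = best score of any path that ends in tag j at the current token
    score = list(emissions[0])
    backpointers = []
    for t in range(1, len(emissions)):
        new_score, pointers = [], []
        for j in range(num_tags):
            # best previous tag for landing on tag j at position t
            best_prev = max(range(num_tags),
                            key=lambda i: score[i] + transitions[i][j])
            new_score.append(score[best_prev] + transitions[best_prev][j]
                             + emissions[t][j])
            pointers.append(best_prev)
        score = new_score
        backpointers.append(pointers)
    # Backtrack from the best final tag to recover the full path.
    path = [max(range(num_tags), key=lambda j: score[j])]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    path.reverse()
    return path


# Hypothetical tag set and scores for a three-token sentence.
TAGS = ["O", "B-MAL", "I-MAL"]
emissions = [
    [2.0, 0.5, 0.1],   # token 1
    [0.2, 1.5, 1.4],   # token 2
    [0.3, 0.4, 1.0],   # token 3
]
transitions = [
    [0.5, 0.5, -5.0],  # from O: the invalid move O -> I-MAL is penalized
    [0.0, -1.0, 1.0],  # from B-MAL
    [0.0, -1.0, 0.5],  # from I-MAL
]

best_path = viterbi_decode(emissions, transitions)
print([TAGS[i] for i in best_path])  # ['O', 'B-MAL', 'I-MAL']
```

In a full model the emission scores would come from the BiLSTM, attention, and FFNN layers, and the transition scores would be learned during training; here both are fixed by hand so that the decoding logic can be followed in isolation.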
Authors
LIN Hong-gang (林宏刚)
ZHAO Hang-yu (赵航宇)
CHEN Lin (陈麟)
Affiliations: School of Cybersecurity, Chengdu University of Information Technology, Chengdu 610225, China; Sichuan Provincial Key Laboratory of Advanced Cryptography and System Security, Chengdu University of Information Technology, Chengdu 610225, China; Anhui Province Key Lab of Cyberspace Security Situation Awareness and Evaluation, National University of Defense Technology, Hefei 230037, China
Source
Computer Engineering and Design (《计算机工程与设计》)
Peking University Core Journal (北大核心)
2024, No. 9, pp. 2584-2590 (7 pages)
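Funding
National 242 Information Security Program (2021-037)
Open Project Fund of the Anhui Province Key Lab of Cyberspace Security Situation Awareness and Evaluation (CSSAE-2021-002)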
Keywords
network security
cyber threat intelligence
entity recognition
natural language processing
pre-training
semantic enhancement
attention mechanism