Abstract
Advanced persistent threat (APT) analysis reports are not used effectively, and automated methods for generating structured knowledge and building feature portraits of hacker groups are lacking. To address these problems, an automatic knowledge extraction method for APT attacks is proposed that combines entity recognition with entity alignment. First, 12 entity categories are designed according to the characteristics of APT attacks. Second, an APT attack entity recognition model is built that combines BERT, a bidirectional long short-term memory (BiLSTM) network, and a conditional random field (CRF): BERT supplies pre-trained representations of the annotated corpus, the BiLSTM learns contextual semantic information, an attention mechanism highlights key features, and the CRF identifies the entities. Finally, entity alignment is applied to generate structured knowledge for different APT groups. Experimental results show that the proposed method identifies APT attack entities effectively, with a precision of 0.9296, a recall of 0.8733, and an F1-score of 0.9006, all better than those of existing models. In addition, the method can automatically extract APT knowledge from a small number of annotated samples and, through entity alignment, generate structured feature portraits of common APT groups, supporting subsequent APT attack knowledge graph construction and attack tracing.
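As a quick consistency check on the reported scores, the F1-score can be recomputed as the harmonic mean of precision and recall. The sketch below uses only the three figures quoted in the abstract; the function name `f1_score` is illustrative and does not come from the paper.

```python
def f1_score(precision: float, recall: float) -> float:
    """Return the harmonic mean of precision and recall (0.0 if both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Figures quoted in the abstract.
precision, recall, reported_f1 = 0.9296, 0.8733, 0.9006

computed_f1 = f1_score(precision, recall)
print(round(computed_f1, 4))  # prints 0.9006
```

The recomputed value agrees with the reported F1-score to four decimal places.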
Authors
YANG Xiuzhang (杨秀璋)
PENG Guojun (彭国军)
LI Zichuan (李子川)
LYU Yangqi (吕杨琦)
LIU Side (刘思德)
LI Chenguang (李晨光)
YANG Xiuzhang; PENG Guojun; LI Zichuan; LYU Yangqi; LIU Side; LI Chenguang (Key Laboratory of Aerospace Information Security and Trusted Computing of Ministry of Education, Wuhan University, Wuhan 430072, China; School of Cyber Science and Engineering, Wuhan University, Wuhan 430072, China)
Source
Journal on Communications (《通信学报》)
EI
CSCD
PKU Core (北大核心)
2022, No. 6, pp. 58-70 (13 pages)
Funding
Supported by the National Natural Science Foundation of China (No. 62172308, No. U1626107, No. 61972297, No. 62172144).
Keywords
advanced persistent threat
threat intelligence extraction
entity recognition
entity alignment
deep learning