Abstract
Log anomaly detection faces several difficulties: large data volumes, faults and attack threats that are highly concealed, and the complex feature engineering that traditional methods require. Rapidly developing deep learning techniques, such as Convolutional Neural Networks (CNN) and Recurrent Neural Networks, offer new ways to address these problems. This paper proposes CNN-BiLSTM, a deep learning model that combines the strengths of a CNN and a Bidirectional Long Short-Term Memory recurrent neural network (Bi-LSTM). The model captures the pronounced time-series characteristics of log keys and also accounts for the spatial-position characteristics of log parameters. It fuses the two feature sets by concatenation followed by a mapping, which keeps either set from drowning out the other as far as possible. On this basis, the model's complexity is analyzed, and experiments on the Hadoop HDFS log dataset compare CNN-BiLSTM with a Support Vector Machine (SVM), a CNN and a Bi-LSTM to verify its classification performance. The analysis and experimental results show that CNN-BiLSTM reaches an average log anomaly detection accuracy of 91%. It also reaches 94% detection accuracy on the WC98_day Web log dataset, which demonstrates good generalization ability, and it outperforms SVM, CNN and Bi-LSTM in detection. Ablation experiments further show that word embedding and the fully connected layer structure play an important role in improving the model's accuracy.
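The abstract's fusion idea can be illustrated with a minimal pure-Python sketch. It rests on assumptions of ours, not specifications from the paper. First, log keys and parameters are separated by simple regular-expression masking. Second, the Bi-LSTM branch yields one fixed-length vector for the log-key sequence and the CNN branch one for the parameters. Third, "concatenation and mapping" is read as joining the two vectors end to end and passing the result through a fully connected layer with softmax. The masking rules, the HDFS-style sample line, the feature values and the weights are all hypothetical. `split_log`, `concat_fuse`, `dense` and `softmax` are illustrative names, not the authors' code.

```python
import math
import re

# Step 1 (illustrative): split a raw log line into a log key and its parameters.
# These masking rules are hypothetical; a real pipeline would use a log parser.
PARAM_PATTERNS = [
    r"blk_-?\d+",                     # HDFS-style block identifiers
    r"\d+\.\d+\.\d+\.\d+(?::\d+)?",   # IPv4 address with optional port
    r"\b\d+\b",                       # any remaining standalone integers
]

def split_log(line):
    """Return (log_key, parameters): constant text stays in the key,
    variable fields are collected and replaced by the wildcard <*>."""
    key, params = line, []
    for pattern in PARAM_PATTERNS:
        params.extend(re.findall(pattern, key))
        key = re.sub(pattern, "<*>", key)
    return key, params

# Step 2 (illustrative): fuse the two branch outputs by concatenation.
def concat_fuse(temporal_feat, spatial_feat):
    """Place the Bi-LSTM (log-key sequence) features and the CNN (parameter)
    features side by side. Unlike an element-wise sum, neither vector is
    overwritten by the other, and the two may have different lengths."""
    return list(temporal_feat) + list(spatial_feat)

def dense(x, weights, bias):
    """Fully connected mapping: y[j] = sum_i weights[j][i] * x[i] + bias[j]."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]

def softmax(logits):
    """Numerically stable softmax over the class logits."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

if __name__ == "__main__":
    key, params = split_log(
        "Receiving block blk_-1608999687919862906 "
        "src: /10.250.19.102:54106 dest: /10.250.19.102:50010")
    print(key)      # Receiving block <*> src: /<*> dest: /<*>
    print(params)

    # Made-up branch outputs and weights; a trained model would learn the weights.
    temporal = [0.2, -0.1, 0.5]   # stand-in for a Bi-LSTM summary vector
    spatial = [0.7, 0.0]          # stand-in for a max-pooled CNN feature vector
    fused = concat_fuse(temporal, spatial)
    probs = softmax(dense(fused, [[1, 0, 0, 0, 0], [0, 0, 0, 1, 0]], [0.0, 0.0]))
    print(probs)    # [P(normal), P(anomalous)] under these toy weights
```

Concatenation keeps each branch's features in their own coordinates. A large-magnitude vector from one branch therefore cannot swamp the other, as it could under an element-wise sum. The fully connected layer that follows then learns how much weight each coordinate receives.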
Authors
SUN Jia; ZHANG Jianhui; BU Youjun; CHEN Bo; HU Nan; WANG Fangyu
(Zhong Yuan Network Security Research Institute, Zhengzhou University, Zhengzhou 450001, China; PLA Strategic Support Force Information Engineering University, Zhengzhou 450001, China)
Source
Computer Engineering
Indexed in: CAS, CSCD, Peking University Core Journals
2022, No. 7, pp. 151-158, 167 (9 pages)
Funding
National Natural Science Foundation of China (62176264)
Zhengzhou Major Collaborative Innovation Project (20XTZX-X010)
Keywords
log anomaly detection
deep learning
feature fusion
generalization ability
ablation experiment