Abstract
Logs record the specific state of a system at runtime, and automated log anomaly detection is critical to network security. To address the low anomaly detection accuracy caused by log statements evolving over time, an unsupervised log anomaly detection model, LogCL, is proposed. First, log parsing converts semi-structured log data into structured log templates. Second, sessions and fixed windows are used to divide log events into log sequences. Third, quantitative features of the log sequences are extracted, natural language processing techniques are used to extract semantic features from the log templates, and the Term Frequency-Inverse Word Frequency (TF-IWF) algorithm is used to generate weighted sentence embedding vectors. Finally, the feature vectors are fed into a model that combines a Convolutional Neural Network (CNN) and a Bi-directional Long Short-Term Memory (Bi-LSTM) network in parallel for detection. Experimental results on two public real-world datasets show that the proposed model improves the anomaly detection F1-score by 3.6 and 2.3 percentage points, respectively, over the baseline model LogAnomaly. LogCL can therefore detect anomalies in log data effectively.
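The TF-IWF weighting step mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact formulas, so this assumes one common definition: TF is a word's share of its template, and IWF is the logarithm of the total word count in the corpus divided by that word's corpus count. The function names `tf_iwf_weights` and `sentence_embedding`, and the idea of passing word vectors as a plain dictionary, are hypothetical choices for illustration and are not taken from LogCL.

```python
import math
from collections import Counter


def tf_iwf_weights(templates):
    """Return one {word: TF-IWF weight} dict per tokenized template.

    Assumed definition (not confirmed by the abstract):
      TF(w, t) = count of w in t / number of words in t
      IWF(w)   = log(total words in corpus / count of w in corpus)
    """
    corpus_counts = Counter(word for t in templates for word in t)
    total_words = sum(corpus_counts.values())
    weights = []
    for t in templates:
        counts = Counter(t)
        weights.append({
            word: (c / len(t)) * math.log(total_words / corpus_counts[word])
            for word, c in counts.items()
        })
    return weights


def sentence_embedding(template_weights, word_vectors, dim):
    """Weighted sum of word vectors for one template.

    word_vectors is a hypothetical {word: list of floats} lookup; in
    practice it would come from a pretrained word embedding model.
    """
    vec = [0.0] * dim
    for word, weight in template_weights.items():
        for k, x in enumerate(word_vectors[word]):
            vec[k] += weight * x
    return vec
```

Under this definition, a word that appears in many templates (such as "block" in storage logs) receives a lower weight than a rarer, more discriminative word, so the rarer word contributes more to the sentence vector.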
Authors
尹春勇
张杨春
YIN Chunyong; ZHANG Yangchun (School of Computer Science, Nanjing University of Information Science and Technology, Nanjing, Jiangsu 210044, China)
Source
Journal of Computer Applications (《计算机应用》)
CSCD
Peking University Core Journals (北大核心)
2023, No. 11, pp. 3510-3516 (7 pages)
Keywords
anomaly detection
deep learning
log analysis
word embedding
Convolutional Neural Network (CNN)
Bi-directional Long Short-Term Memory (Bi-LSTM) network