Abstract
Logs are an important data source in security analysis. However, unstructured raw logs cannot be used for security analysis directly, so parsing them into structured templates is a critical first step. Most existing log parsing methods assume that log messages belonging to the same template have the same length, but logs contain variable-length variables, so messages that belong to one template are wrongly extracted into different templates. This paper therefore proposes KeyParse, a method for automatic log template discovery. First, the similarity between a log and a template is computed with the longest common subsequence algorithm, which ignores the differences introduced by variable-length variables and thus allows the log to be matched to the template. Second, log templates are grouped by their most frequent item, which prevents log messages of the same event but different lengths from being assigned to different template groups, reducing template redundancy and improving template matching efficiency. Finally, the HeavyGuardian algorithm is used to track the most frequent items of streaming log messages, addressing the difficulty that traditional frequency-counting methods have in adapting to the dynamically changing word frequencies of streaming logs. Experimental results show that KeyParse achieves high accuracy on multiple types of log datasets, with an average parsing accuracy of 0.968, and performs better when parsing large log datasets.
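The first two steps described in the abstract can be illustrated with a small sketch. It is not the authors' implementation: the `<*>` wildcard, the whitespace tokenization, the similarity ratio (LCS length over the number of constant template tokens), and the function names `lcs_tokens`, `similarity`, and `group_key` are all assumptions made for illustration. A plain `collections.Counter` stands in for HeavyGuardian, which the paper uses for streaming frequency counts.

```python
from collections import Counter


def lcs_tokens(a, b):
    """Return one longest common subsequence of two token lists."""
    m, n = len(a), len(b)
    # dp[i][j] holds the LCS length of a[i:] and b[j:].
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m - 1, -1, -1):
        for j in range(n - 1, -1, -1):
            if a[i] == b[j]:
                dp[i][j] = dp[i + 1][j + 1] + 1
            else:
                dp[i][j] = max(dp[i + 1][j], dp[i][j + 1])
    # Walk the table from the top-left corner to recover the subsequence.
    out, i, j = [], 0, 0
    while i < m and j < n:
        if a[i] == b[j]:
            out.append(a[i])
            i += 1
            j += 1
        elif dp[i + 1][j] >= dp[i][j + 1]:
            i += 1
        else:
            j += 1
    return out


def similarity(log_tokens, template_tokens, wildcard="<*>"):
    """Assumed metric: LCS length divided by the number of constant template tokens."""
    constants = [t for t in template_tokens if t != wildcard]
    if not constants:
        return 0.0
    return len(lcs_tokens(log_tokens, constants)) / len(constants)


def group_key(tokens, freq):
    """Assumed grouping rule: the token with the highest global count (first wins ties)."""
    return max(tokens, key=lambda t: freq[t])


# Two messages of the same event whose variable part has different lengths.
logs = [
    "Accepted password for alice from 10.0.0.1".split(),
    "Accepted password for bob smith from 10.0.0.2".split(),
    "Connection closed by 10.0.0.3".split(),
]
template = "Accepted password for <*> from <*>".split()

# Counter is a simple exact stand-in for a streaming frequency sketch.
freq = Counter(token for log in logs for token in log)
```

With these inputs, the two `Accepted password` messages match the template with the same similarity despite having different token counts, and they receive the same group key, whereas the `Connection closed` message does not match. How the paper sets the match threshold, breaks frequency ties, or filters tokens is not stated in the abstract, so those choices are left open here.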
Authors
张书雅
陈良国
陈兴蜀
ZHANG Shuya; CHEN Liangguo; CHEN Xingshu (School of Cyber Science and Engineering, Sichuan University, Chengdu 610065, China; Key Laboratory of Data Protection and Intelligent Management, Ministry of Education, Chengdu 610065, China; Cyber Science Research Institute, Sichuan University, Chengdu 610065, China)
Source
《信息网络安全》
CSCD
Peking University Core Journal (北大核心)
2024, No. 5, pp. 767-777 (11 pages)
Netinfo Security
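Funding
National Natural Science Foundation of China [U19A2081]
Fundamental Research Funds for the Central Universities [SCU2023D008, 2022SCU12116, 2023SCU12129, 2023SCU12126]
Sichuan University Science and Engineering Development Program [2020SCUNG129]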
Keywords
log parsing
template grouping
template auto-discovery