Abstract
Log-based failure diagnosis refers to the intelligent analysis of logs generated at system runtime to automatically detect system anomalies and diagnose system failures. With the rapid development of artificial intelligence for IT operations (AIOps), this technology has become a research hotspot in both academia and industry. This paper first summarizes a research framework for log-based failure diagnosis of distributed software systems. It then analyzes in depth recent work in China and abroad on four key technologies: log processing and feature extraction, log-based anomaly detection, log-based failure prediction, and log-based root cause diagnosis of failures. Finally, it summarizes the related work under the proposed research framework and discusses the challenges that future research may face.
Authors
贾统
李影
吴中海
JIA Tong; LI Ying; WU Zhong-Hai (School of Electronics Engineering and Computer Science, Peking University, Beijing 100871, China; National Engineering Research Center for Software Engineering (Peking University), Beijing 100871, China)
Source
《软件学报》
EI
CSCD
PKU Core Journals (北大核心)
2020, No. 7, pp. 1997-2018 (22 pages)
Journal of Software
Funding
Key-Area Research and Development Program of Guangdong Province (2020B010164003).
Keywords
log data
log analysis
anomaly detection
failure prediction
failure root cause diagnosis
fault diagnosis