A Survey of Log-Based Failure Diagnosis for Distributed Software Systems (cited by: 30)

Survey of State-of-the-art Log-based Failure Diagnosis
Abstract: Log-based failure diagnosis applies intelligent analysis to the logs a system produces at runtime in order to detect anomalies and diagnose failures automatically. With the rapid development of artificial intelligence for IT operations (AIOps), it has become a research hotspot in both academia and industry. This survey first summarizes a research framework for log-based failure diagnosis in distributed software systems. It then reviews in depth recent work, in China and abroad, on four key technologies: log processing and feature extraction, log-based anomaly detection, log-based failure prediction, and log-based fault root-cause diagnosis. Finally, it uses the proposed framework to organize the reviewed work and discusses the challenges that future research is likely to face.
Authors: JIA Tong (贾统); LI Ying (李影); WU Zhong-Hai (吴中海) (School of Electronics Engineering and Computer Science, Peking University, Beijing 100871, China; National Engineering Research Center for Software Engineering (Peking University), Beijing 100871, China)
Source: Journal of Software (《软件学报》), indexed in EI, CSCD, and the Peking University Core Journals list; 2020, No. 7, pp. 1997-2018 (22 pages)
Funding: Key-Area Research and Development Program of Guangdong Province (2020B010164003).
Keywords: log data; log analysis; anomaly detection; failure prediction; fault root-cause diagnosis
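
To make the first two stages named in the abstract more concrete (log processing and feature extraction, then anomaly detection), the following is a minimal sketch and not a method taken from the survey. It assumes that masking numbers, hex values, and IP addresses is enough to turn raw messages into templates, and that a template is suspicious if it is absent from a baseline window or occurs far more often than in it. The masking rules, the `ratio` threshold, the function names, and the sample log lines are all hypothetical.

```python
import re
from collections import Counter

# Hypothetical masking rules: variable fields are replaced by placeholders
# so that messages differing only in parameters share one template.
_MASKS = [
    (re.compile(r"\b\d{1,3}(?:\.\d{1,3}){3}\b"), "<IP>"),
    (re.compile(r"\b0x[0-9a-fA-F]+\b"), "<HEX>"),
    (re.compile(r"\b\d+\b"), "<NUM>"),
]


def to_template(message: str) -> str:
    """Log processing: reduce a raw message to a template."""
    for pattern, placeholder in _MASKS:
        message = pattern.sub(placeholder, message)
    return message


def count_vector(messages):
    """Feature extraction: count occurrences of each template in a window."""
    return Counter(to_template(m) for m in messages)


def anomalous_templates(baseline, window, ratio=3.0):
    """Anomaly detection (assumed rule): flag templates that are unseen in
    the baseline or more than `ratio` times as frequent as in the baseline."""
    flagged = []
    for template, count in window.items():
        expected = baseline.get(template, 0)
        if expected == 0 or count > ratio * expected:
            flagged.append(template)
    return sorted(flagged)


# Made-up log lines for illustration only.
normal = [
    "Received block 1001 from 10.0.0.1",
    "Received block 1002 from 10.0.0.2",
    "Deleted block 1001",
]
current = [
    "Received block 1003 from 10.0.0.3",
    "Timeout waiting for block 1003",
]

baseline = count_vector(normal)
window = count_vector(current)
print(anomalous_templates(baseline, window))
```

Here the only flagged template is the timeout message, because it never appears in the baseline window. Practical systems typically use more robust template extraction and learned models in place of the fixed threshold assumed here.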