Abstract
LARGE is a log analysis framework deployed in the supercomputing environment of the Chinese Academy of Sciences. It monitors and analyzes the various log files in the environment through log collection, centralized analysis, and result feedback. When monitoring system logs, maintenance personnel rely on a log pattern extraction algorithm to reduce a large volume of past system log records to a small set of log patterns. However, as log volume grows, and given the peculiarities of messages log files, the original log pattern extraction algorithm can no longer process large-scale logs quickly enough. This paper presents an optimization of the log pattern extraction algorithm that introduces the MapReduce mechanism to accelerate log processing and pattern extraction when there are multiple input log files. Experiments show that, when the number of input files is large, the optimization significantly speeds up the vocabulary consistency rate algorithm and greatly reduces its running time. The running time and extraction quality of the algorithm when a vocabulary conversion function is used are also evaluated.
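The map-then-merge idea described in the abstract can be illustrated with a minimal sketch. This is not the paper's algorithm: the vocabulary consistency rate computation is not specified here, so the sketch simply groups lines by a masked template. The names `convert_word`, `map_file`, `reduce_counts`, and `extract_patterns` are hypothetical, and the digit-masking rule is an assumed stand-in for a vocabulary conversion function.

```python
from collections import Counter
from functools import reduce


def convert_word(word):
    """Hypothetical vocabulary conversion: mask any token containing a digit."""
    return "<*>" if any(ch.isdigit() for ch in word) else word


def map_file(lines):
    """Map step: reduce one file's lines to a Counter of candidate patterns."""
    counts = Counter()
    for line in lines:
        words = line.split()
        if words:  # skip blank lines
            counts[" ".join(convert_word(w) for w in words)] += 1
    return counts


def reduce_counts(left, right):
    """Reduce step: merge the partial pattern counts of two files."""
    return left + right


def extract_patterns(files):
    """Run the map step per file, then fold the partial results together.

    `files` is an iterable of line lists, one per input log file. The map
    step is independent per file, so the built-in `map` used here could be
    replaced by a parallel map (for example a process pool) without
    changing the reduce step.
    """
    partials = map(map_file, files)
    return reduce(reduce_counts, partials, Counter())
```

Because each file is mapped independently and the merge is associative, the work scales with the number of input files, which is the situation in which the abstract reports the largest speedup.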
Source
Computer Engineering & Science (《计算机工程与科学》)
Indexed in: CSCD; Peking University Core Journals (北大核心)
2017, No. 5, pp. 821-828 (8 pages)
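Funding
National Key Research and Development Program of China (2016YFB0201404)
National High Technology Research and Development Program of China (863 Program), major project of the 12th Five-Year Plan (2014AA01A302)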