Abstract
As high-performance computing (HPC) systems keep growing in scale and complexity, the causes of abnormal job performance in application software have become increasingly complex and diverse. Traditional approaches based on manual analysis of monitoring data are inefficient and depend heavily on the experience of the analyst. This paper proposes a performance anomaly detection method based on a long short-term memory (LSTM) network. Using the Weather Research and Forecasting (WRF) model as the case study, the method first learns how normal performance data vary from historical job data. It then applies the boxplot method to statistically analyze the residuals between the LSTM predictions and the actual observations, and flags residuals that exceed the quartile-based boxplot threshold as anomalous, thereby detecting performance anomalies in application software jobs. Experimental results show that the method detects performance anomalies well, alleviating the shortcomings of manual analysis, and is applicable to a variety of data sets.
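The residual-plus-boxplot detection step described in the abstract can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the authors' implementation: the LSTM forecaster is not reproduced, so `predicted` simply stands in for its output; residuals are taken as absolute differences; and the threshold is the conventional Tukey upper fence Q3 + 1.5 * IQR, because the abstract does not spell out the exact rule. The function names `boxplot_bounds` and `detect_anomalies` are hypothetical.

```python
import statistics
from typing import List, Sequence, Tuple


def boxplot_bounds(values: Sequence[float], k: float = 1.5) -> Tuple[float, float]:
    """Return Tukey-style boxplot fences (Q1 - k*IQR, Q3 + k*IQR).

    Assumption: k = 1.5 is the conventional whisker factor, not a value
    taken from the paper.
    """
    # With n=4, statistics.quantiles returns the three cut points [Q1, Q2, Q3].
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def detect_anomalies(observed: Sequence[float],
                     predicted: Sequence[float],
                     k: float = 1.5) -> List[int]:
    """Return indices whose absolute residual lies above the upper fence.

    `predicted` is a placeholder for the output of a model trained on
    normal runs (an LSTM in the paper); any forecaster can be plugged in.
    Using absolute residuals is an assumption of this sketch.
    """
    if len(observed) != len(predicted):
        raise ValueError("observed and predicted must have the same length")
    residuals = [abs(o - p) for o, p in zip(observed, predicted)]
    _, upper = boxplot_bounds(residuals, k)
    # Only the upper fence matters here, since absolute residuals are >= 0.
    return [i for i, r in enumerate(residuals) if r > upper]


if __name__ == "__main__":
    observed = [10.0, 10.2, 9.9, 10.1, 10.0, 15.0, 10.1, 9.8]
    predicted = [10.0] * len(observed)  # stand-in for LSTM forecasts
    print(detect_anomalies(observed, predicted))
```

With a flat forecast of 10.0, the script prints `[5]`: only the spike at index 5 lies above the upper fence. Comparing against the fence rather than a single quartile avoids flagging a fixed fraction of all points, since by construction roughly a quarter of residuals always exceed Q3.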
Authors
ZHU Lin-qing (朱林青)
ZHANG Tao (张涛)
LV Zhuo-heng (吕灼恒)
SUN Jian-peng (孙建鹏)
College of Computer and Artificial Intelligence, Zhengzhou University, Zhengzhou, Henan 450001, China
Source
Computer Simulation (《计算机仿真》), 2024, No. 5, pp. 536-542 (7 pages)
Funding
National Key Research and Development Program of China (2021YFB0300200).