Funding: Supported by the National Key Research and Development Program of China (2022YFB4500800).
Abstract: As cloud system architectures evolve, the interactions among distributed components in various roles become increasingly complex, which makes anomalies in cloud systems difficult to detect. The system status can no longer be determined from individual key performance indicators (KPIs); it requires joint judgments based on the synergistic relationships among distributed components. Furthermore, anomalies in modern cloud systems are usually not sudden crashes but gradual, chronic, localized failures or quality degradations in a weakly available state. Accurately modeling cloud systems and mining the hidden system state is therefore crucial. To address this challenge, we propose an anomaly detection method with dynamic spatiotemporal learning (AD-DSTL). AD-DSTL leverages the spatiotemporal dynamics of the system to train an end-to-end deep learning model, driven by system monitoring data, that detects underlying anomalous states in complex cloud systems. Unlike previous work that focuses on the KPIs of separate components, AD-DSTL builds a model of the entire system and characterizes its spatiotemporal dynamics using graph convolutional networks (GCN) and long short-term memory (LSTM). We validated AD-DSTL on four datasets from different backgrounds, where it demonstrated superior robustness compared with the baseline algorithms. Moreover, when the target exception level was raised, both the recall and the precision of AD-DSTL reached approximately 0.9. Our experimental results demonstrate that AD-DSTL can meet the requirements of anomaly detection for complex cloud systems.
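The combination of GCN and LSTM named in the abstract can be illustrated with a minimal sketch. It is not the AD-DSTL architecture, which the abstract does not specify. It assumes a fixed component graph, one KPI matrix per time step, the standard symmetric adjacency normalisation with self-loops, mean pooling over components, and random untrained weights. All function names (`normalize_adjacency`, `gcn_layer`, `lstm_step`, `encode_window`) are hypothetical.

```python
import math
import random


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def normalize_adjacency(adj):
    """Return D^{-1/2} (A + I) D^{-1/2}: add self-loops, then scale by node degree."""
    n = len(adj)
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a_hat]
    return [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]


def gcn_layer(a_norm, features, weight):
    """One graph convolution: ReLU(A_norm @ X @ W). This is the spatial part."""
    out = matmul(matmul(a_norm, features), weight)
    return [[max(0.0, v) for v in row] for row in out]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def lstm_step(x, h, c, params):
    """One LSTM cell step (the temporal part); params maps gate name -> (W_x, W_h, bias)."""
    def gate(name, act):
        w_x, w_h, b = params[name]
        return [
            act(sum(w * v for w, v in zip(w_x[k], x))
                + sum(w * v for w, v in zip(w_h[k], h)) + b[k])
            for k in range(len(h))
        ]

    i, f, o = gate("input", sigmoid), gate("forget", sigmoid), gate("output", sigmoid)
    g = gate("candidate", math.tanh)
    c_new = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c, i, g)]
    h_new = [ok * math.tanh(ck) for ok, ck in zip(o, c_new)]
    return h_new, c_new


def random_matrix(rng, rows, cols):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


def encode_window(adj, snapshots, gcn_dim=4, hidden=3, seed=0):
    """Encode a window of KPI snapshots (each: components x KPIs) into one state vector."""
    rng = random.Random(seed)  # untrained random weights, for illustration only
    weight = random_matrix(rng, len(snapshots[0][0]), gcn_dim)
    params = {
        name: (random_matrix(rng, hidden, gcn_dim), random_matrix(rng, hidden, hidden),
               [0.0] * hidden)
        for name in ("input", "forget", "output", "candidate")
    }
    a_norm = normalize_adjacency(adj)
    h, c = [0.0] * hidden, [0.0] * hidden
    for features in snapshots:
        node_emb = gcn_layer(a_norm, features, weight)
        pooled = [sum(col) / len(col) for col in zip(*node_emb)]  # mean over components
        h, c = lstm_step(pooled, h, c, params)
    return h
```

In a trained detector, the final hidden state would feed a classifier or reconstruction loss. That part is omitted here because the abstract gives no detail on it.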
Abstract: To obtain information or discover knowledge from system logs, the first step is log parsing, whereby unstructured raw logs are transformed into a sequence of structured events. Although log parsing has been studied comprehensively in recent years, most work assumes that one event object corresponds to a single-line message. However, in a growing number of scenarios, one event object spans multiple lines in the log, and parsing methods designed for single-line events are not applicable. To address this problem, this paper proposes an automated log parsing method for multiline events (LPME). LPME finds multiline event objects via iterative scanning, driven by a set of heuristic rules derived from practice. Its advantage lies in a cohesion-based evaluation method for multiline events and a bottom-up search approach that avoids enumerating all combinations. We analyze the algorithmic complexity of LPME and validate it on four datasets from different backgrounds. Evaluations show that the actual time complexity of LPME parsing for multiline events is close to constant time, which enables it to handle large-scale sample inputs. On the experimental datasets, LPME achieves a recall of 1.0 and a precision that is generally higher than 0.9, which demonstrates its effectiveness.
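The idea of a bottom-up search guided by a cohesion score can be sketched as follows. This is an illustration under stated assumptions, not the LPME algorithm: the abstract does not give its heuristic rules or its cohesion measure. Here, lines are reduced to templates by masking digits, and the cohesion of two adjacent units is taken to be the number of times they occur together divided by the count of the more frequent unit. The names `template`, `merge_once`, and `group_multiline_events` and the 0.9 threshold are hypothetical.

```python
import re
from collections import Counter


def template(line):
    """Mask digit runs so lines that differ only in numbers share one template."""
    return re.sub(r"\d+", "<*>", line.strip())


def merge_once(units, threshold):
    """Merge the most cohesive adjacent pair of units; return (units, merged_flag)."""
    unit_counts = Counter(units)
    pair_counts = Counter(zip(units, units[1:]))
    best, best_score = None, 0.0
    for (a, b), n in pair_counts.items():
        if a == b:
            continue  # skip self-pairs: repeated single-line events stay separate
        # Assumed cohesion: 1.0 when a and b only ever appear together, in this order.
        score = n / max(unit_counts[a], unit_counts[b])
        if score > best_score:
            best, best_score = (a, b), score
    if best is None or best_score < threshold:
        return units, False
    a, b = best
    merged, i = [], 0
    while i < len(units):
        if i + 1 < len(units) and units[i] == a and units[i + 1] == b:
            merged.append(a + b)  # units are tuples of templates, so + concatenates
            i += 2
        else:
            merged.append(units[i])
            i += 1
    return merged, True


def group_multiline_events(lines, threshold=0.9):
    """Grow events from single lines upward instead of enumerating all line groupings."""
    units = [(template(line),) for line in lines]
    changed = True
    while changed:
        units, changed = merge_once(units, threshold)
    return units


log = [
    "Exception in task 1",
    "  at frame A",
    "  at frame B",
    "heartbeat ok 10",
    "Exception in task 2",
    "  at frame A",
    "  at frame B",
    "heartbeat ok 11",
    "heartbeat ok 12",
]
events = group_multiline_events(log)
```

On this toy log, the three-line exception block is merged into a single event, while the heartbeat lines remain single-line events because they do not consistently follow the exception block.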