Abstract
In the current era of large-scale industrial digital transformation, the combination of cloud-native architecture and microservices has become a core competitive advantage. This development model improves the integrity and flexibility of software development, deployment, and testing. However, as the Internet has developed, the complexity and temporal characteristics of Trace data in microservice architectures have led to low anomaly detection accuracy and slow root cause localization. To address these challenges, this paper first proposes a time-series-based, multi-dimensional metric anomaly detection algorithm, which combines multi-dimensional metrics with time-series anomaly detection to significantly improve detection accuracy. By improving the service Trace metric vector, the algorithm addresses the low detection accuracy that occurs when physical resources are sufficient, and its time-series detection further overcomes the limitations of traditional anomaly detection methods. The paper also proposes a root cause localization algorithm based on a “link-operation” graph combined with context. By deeply analyzing the dependencies between services in historical Trace data, this algorithm effectively improves the accuracy of root cause localization. It also merges structurally similar Trace graphs, which saves considerable graph construction time and improves the efficiency and precision of root cause localization. Experimental results show that, compared with traditional methods, the proposed methods identify and localize the root causes of anomalies faster and more accurately.
Authors
周书丞
李杨
李传荣
郭璐璐
贾辛洪
杨兴华
ZHOU Shucheng; LI Yang; LI Chuanrong; GUO Lulu; JIA Xinhong; YANG Xinghua (Institute of Information Engineering, Chinese Academy of Sciences, Beijing 100085, China; School of Cyber Security, University of Chinese Academy of Sciences, Beijing 100049, China; Key Laboratory of Cyberspace Security Defense, Beijing 100085, China)
Source
《信息网络安全》
CSCD
PKU Core Journal (北大核心)
2024, No. 7, pp. 1062-1075 (14 pages)
Netinfo Security
Funding
National Natural Science Foundation of China [62372450].