
MicroAFL: Automatic Fault Location for Microservices on Cloud
Abstract: As microservice systems on the cloud grow in scale, the dependencies between microservices become tighter and more complex. A fault in one microservice can propagate to others through inter-service calls and eventually make the whole microservice system behave abnormally. To handle such complex dependencies and account for fault propagation, we design MicroAFL, an automatic fault location method for microservices on the cloud. First, MicroAFL monitors and collects the runtime metric data of the microservice system in real time and analyzes it with an autoencoder model to determine whether the system is abnormal. Once an anomaly is detected, MicroAFL parses the communication data between running microservice instances on the cloud to obtain the calling relationships between microservices, and builds a service call graph that describes the fault propagation paths. Next, it associates the running state of each microservice with system resource utilization to compute an anomaly weight for each node in the service call graph, and uses an improved weighted PageRank algorithm to infer and locate the faulty microservice that caused the anomaly. Finally, a microservice system named Sock-shop was deployed on Huawei Cloud to evaluate the fault location accuracy of MicroAFL. The experimental results show that MicroAFL achieves higher fault location accuracy than the compared methods.
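The ranking step described in the abstract can be illustrated with a small sketch. The abstract does not specify how MicroAFL's improved weighted PageRank is defined, so the code below is not the authors' algorithm. It is a generic personalized PageRank in which both the teleport vector and the edge shares are biased by per-node anomaly weights. The function name `rank_suspects`, the call graph, the weight values, and the choice to walk from caller to callee (assuming faults propagate from callee to caller) are all assumptions made for illustration.

```python
from typing import Dict, List


def rank_suspects(
    calls: Dict[str, List[str]],
    anomaly_weight: Dict[str, float],
    damping: float = 0.85,
    iterations: int = 100,
) -> Dict[str, float]:
    """Score services with a personalized PageRank (illustrative only).

    calls[a] lists the services that a invokes. The walk moves from caller
    to callee, so score accumulates on the services that anomalous callers
    depend on.
    """
    nodes = sorted(set(calls) | {c for cs in calls.values() for c in cs})
    total = sum(anomaly_weight.get(n, 0.0) for n in nodes)
    # Teleport vector: proportional to anomaly weight, uniform if all are zero.
    if total > 0:
        teleport = {n: anomaly_weight.get(n, 0.0) / total for n in nodes}
    else:
        teleport = {n: 1.0 / len(nodes) for n in nodes}

    score = dict(teleport)
    for _ in range(iterations):  # fixed iteration count, no convergence check
        nxt = {n: (1.0 - damping) * teleport[n] for n in nodes}
        for src in nodes:
            callees = calls.get(src, [])
            if not callees:
                # A service that calls nothing hands its score back
                # through the teleport vector.
                for n in nodes:
                    nxt[n] += damping * score[src] * teleport[n]
                continue
            weights = [anomaly_weight.get(c, 0.0) for c in callees]
            weight_sum = sum(weights)
            for callee, w in zip(callees, weights):
                # Split the caller's score by callee anomaly weight,
                # or evenly when every callee has weight zero.
                share = w / weight_sum if weight_sum > 0 else 1.0 / len(callees)
                nxt[callee] += damping * score[src] * share
        score = nxt
    return score


# Hypothetical call graph and anomaly weights; the service names are borrowed
# from Sock-shop, but the edges and numbers are invented for this example.
calls = {
    "front-end": ["orders", "carts", "catalogue"],
    "orders": ["carts", "payment", "user"],
}
anomaly_weight = {
    "front-end": 0.3, "orders": 0.5, "payment": 0.9,
    "carts": 0.1, "catalogue": 0.1, "user": 0.1,
}
scores = rank_suspects(calls, anomaly_weight)
ranking = sorted(scores, key=scores.get, reverse=True)
```

In this made-up scenario `payment` ranks first, because it has the largest anomaly weight and receives most of the score flowing out of `orders`. A real deployment would derive the graph from observed traffic and the weights from monitored metrics, as the abstract describes.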
Authors: YANG Lin-wei (羊麟威); LI Jing (李静); RAO Han-yu (饶涵宇); GAO Ying (高颖); MAO Dong (毛冬); QIAO Yu-jie (乔宇杰) (School of Computer Science and Technology, Nanjing University of Aeronautics and Astronautics, Nanjing 211106, China; Information and Communication Branch of State Grid Zhejiang Electric Power Company, Hangzhou 310016, China; Information and Communication Branch of State Grid Corporation, Beijing 100761, China)
Source: Computer Technology and Development (《计算机技术与发展》), 2023, No. 5, pp. 88-95 (8 pages)
Funding: Science and Technology Project of State Grid Corporation of China (5700-202152169A-0-0-00).
Keywords: autoencoder; microservice; cloud environment; automatic fault location; service call diagram; fault propagation