A Fault Detection Method for Cloud Platform Based on Machine Learning (基于机器学习的云平台故障排查方法) Cited by: 4
Abstract With the development of cloud computing, more and more enterprises deploy their systems in cloud environments, which greatly improves the flexibility, elasticity, scalability, and efficiency of enterprise application services. The container cloud platform of the Zhejiang power grid is a typical application of cloud computing in power systems. However, the elastic architecture of cloud computing also makes the operation and maintenance of enterprise applications more complex and harder to monitor. Most current operation and maintenance tools lack clear visibility into application access in the cloud, which makes troubleshooting in cloud environments difficult. To address this problem, the paper proposes a fault troubleshooting method based on machine learning. First, a hierarchical clustering method dynamically generates the network topology of the nodes, and the performance metrics of each node in the Zhejiang power grid container cloud platform are monitored in real time and used as feature vectors. Then, a support vector machine (SVM) combined with random search classifies the faults, enabling real-time troubleshooting. The method effectively improves the performance and reliability of the cloud platform and demonstrates the application prospects of machine learning methods in power systems.
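The classification step described in the abstract (an SVM whose hyperparameters are chosen by random search) can be illustrated with a minimal, dependency-free sketch. Everything below is an assumption made for illustration and not the authors' implementation: the two synthetic features (CPU and memory utilisation), the binary fault/normal labels, the search ranges, and the simplified linear SVM trained by stochastic gradient descent without a bias term (which is why the features are centred first). A production system would more likely rely on a library such as scikit-learn.

```python
import random

def make_samples(n_per_class, rng):
    """Synthetic (cpu, memory) utilisation per node; +1 = fault, -1 = normal."""
    X, y = [], []
    for _ in range(n_per_class):
        X.append([0.2 + rng.uniform(-0.1, 0.1), 0.2 + rng.uniform(-0.1, 0.1)])
        y.append(-1)
        X.append([0.9 + rng.uniform(-0.1, 0.1), 0.9 + rng.uniform(-0.1, 0.1)])
        y.append(1)
    return X, y

def center(X, mean):
    """Subtract the training mean so that a bias-free classifier is adequate."""
    return [[xj - mj for xj, mj in zip(x, mean)] for x in X]

def train_linear_svm(X, y, lam, eta, epochs, rng):
    """SGD on the L2-regularised hinge loss (linear SVM, no bias term)."""
    w = [0.0] * len(X[0])
    order = list(range(len(X)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            margin = y[i] * sum(wj * xj for wj, xj in zip(w, X[i]))
            w = [wj * (1.0 - eta * lam) for wj in w]  # step on the L2 penalty
            if margin < 1.0:  # hinge loss is active for this sample
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, X[i])]
    return w

def predict(w, x):
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) >= 0.0 else -1

def accuracy(w, X, y):
    return sum(predict(w, x) == t for x, t in zip(X, y)) / len(X)

def random_search(X_tr, y_tr, X_val, y_val, n_trials, rng):
    """Sample hyperparameters at random and keep the best validation score."""
    best = None
    for _ in range(n_trials):
        params = {
            "lam": 10 ** rng.uniform(-4, 0),   # log-uniform regularisation strength
            "eta": 10 ** rng.uniform(-3, -1),  # log-uniform learning rate
            "epochs": rng.randint(5, 30),
        }
        w = train_linear_svm(X_tr, y_tr, rng=rng, **params)
        score = accuracy(w, X_val, y_val)
        if best is None or score > best[0]:
            best = (score, params, w)
    return best

rng = random.Random(0)
X_train, y_train = make_samples(40, rng)
X_val, y_val = make_samples(20, rng)
mean = [sum(col) / len(col) for col in zip(*X_train)]
X_train_c, X_val_c = center(X_train, mean), center(X_val, mean)
best_score, best_params, best_w = random_search(
    X_train_c, y_train, X_val_c, y_val, n_trials=20, rng=rng
)
print(best_score, best_params)
```

Random search draws each hyperparameter from a range instead of walking a fixed grid, which tends to cover wide, log-scaled ranges with fewer trials. The synthetic classes here are well separated, so the sketch shows the mechanics rather than a realistic accuracy figure.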
Authors WANG Yanyan (王艳艳); ZHANG Wenzheng (张文正); SHEN Jiahui (沈佳辉); WANG Ting (王亭); LI Xiaozhen (李小真) (State Grid Zhejiang Electric Power Co., Ltd. Information & Telecommunication Branch, Hangzhou 310016, China; Zhejiang Huayun Information Technology Co., Ltd., Hangzhou 310012, China)
Source Zhejiang Electric Power (《浙江电力》), 2021, No. 12, pp. 124-130 (7 pages)
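Funding Implementation Project of the Integrated Monitoring Platform for Information and Telecommunication Services (信通业务综合监控平台实施项目, B311XT200048).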
Keywords machine learning; cloud computing; support vector machine; average link clustering; network topology identification; fault detection
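As a rough illustration of the average link clustering named in the keywords, the sketch below groups nodes bottom-up by the mean pairwise distance between clusters. The node names, the two-dimensional feature vectors, the Euclidean distance, and the stopping threshold are all hypothetical; the record does not say which node features or distance measure the paper uses to derive the topology.

```python
from itertools import combinations

def average_link_clustering(points, threshold):
    """Merge clusters bottom-up until the closest pair exceeds the threshold."""
    def dist(a, b):
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b)) ** 0.5

    def avg_link(c1, c2):
        # Average linkage: mean distance over all cross-cluster point pairs.
        total = sum(dist(points[i], points[j]) for i in c1 for j in c2)
        return total / (len(c1) * len(c2))

    clusters = [[i] for i in range(len(points))]
    merges = []  # merge history, i.e. a flat form of the dendrogram
    while len(clusters) > 1:
        (a, b), d = min(
            (((a, b), avg_link(clusters[a], clusters[b]))
             for a, b in combinations(range(len(clusters)), 2)),
            key=lambda item: item[1],
        )
        if d > threshold:
            break
        merges.append((list(clusters[a]), list(clusters[b]), d))
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]  # safe: combinations() guarantees a < b
    return clusters, merges

# Hypothetical nodes with made-up feature vectors (e.g. traffic profiles).
nodes = {
    "web-1": (0.0, 0.0), "web-2": (0.1, 0.0),
    "db-1": (5.0, 5.0), "db-2": (5.1, 5.0),
    "cache-1": (0.0, 5.0), "cache-2": (0.0, 5.2),
}
names = list(nodes)
clusters, merges = average_link_clustering(list(nodes.values()), threshold=1.0)
groups = [[names[i] for i in cluster] for cluster in clusters]
print(groups)
```

The merge history records which groups were joined and at what distance, so a hierarchy of node groups can be read from it. Re-running the clustering as nodes appear or disappear is one simple way to keep such a grouping current, though whether the paper does this is not stated in the record.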