Abstract
As business and technical complexity grows, systems face increasing levels of uncontrollable risk. The timing and scope of production failures cannot be predicted, and their impact on the system is difficult to assess. These factors severely limit the stability of online services and the availability of the business. Even when a system cannot be guaranteed to run in an error-free environment, it should still deliver a good user experience under abnormal conditions. Chaos engineering deliberately introduces instability in order to verify and improve a system's ability to recover from failures under uncontrolled conditions, with the ultimate goal of a resilient architecture [1].
Author
LU Haibo (卢海波), Product & Technology Center of Mango TV, Changsha, Hunan 410000
Source
China Science & Technology Overview (《中国科技纵横》)
2024, Issue 6, pp. 64-66 (3 pages)
Keywords
chaos engineering
resilient system architecture
failure recovery