Abstract
As cloud computing technology develops further, more and more application systems are hosted on cloud computing platforms, which places higher demands on the reliability of the many distributed systems that make up such platforms. Traditional analysis methods struggle to analyze the reliability of repairable distributed systems when the system is large and dynamic. To improve service quality and reduce the economic losses caused by violations of service level agreements, this paper proposes a Markov-model-based reliability analysis method for repairable distributed systems. The method simplifies the system's state space, samples the system's software and hardware states during operation, and analyzes the failure and repair processes of the distributed system. From the failure probability sequence and repair probability sequence over a given time period, it calculates the node state transition matrix of the distributed system and obtains the steady-state vector of that Markov matrix. The steady-state vector is then analyzed further, according to the characteristics of the particular distributed system, to obtain the system's final reliability metric. Finally, experiments verify the usability and effectiveness of the method.
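The abstract does not give the paper's exact construction, so the following is only a minimal sketch of the general idea under simplifying assumptions: each node has just two states (up, down), the sampled failure and repair probability sequences are averaged into single per-step probabilities, and the steady-state vector is found by power iteration. The function names and the sample values are hypothetical and are not taken from the paper.

```python
def transition_matrix(fail_probs, repair_probs):
    """Build a 2-state (up, down) Markov transition matrix.

    Assumption (not from the paper): the sampled sequences are
    averaged into one failure probability p and one repair
    probability q per time step.
    """
    p = sum(fail_probs) / len(fail_probs)      # P(up -> down)
    q = sum(repair_probs) / len(repair_probs)  # P(down -> up)
    return [[1.0 - p, p],
            [q, 1.0 - q]]


def steady_state(P, tol=1e-12, max_iter=100_000):
    """Approximate the vector pi with pi = pi * P by power iteration."""
    n = len(P)
    pi = [1.0 / n] * n  # start from the uniform distribution
    for _ in range(max_iter):
        nxt = [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]
        if max(abs(a - b) for a, b in zip(nxt, pi)) < tol:
            return nxt
        pi = nxt
    return pi


# Illustrative, made-up samples (not data from the paper).
fail_probs = [0.01, 0.02, 0.03]
repair_probs = [0.4, 0.5, 0.6]

P = transition_matrix(fail_probs, repair_probs)
pi = steady_state(P)
availability = pi[0]  # long-run fraction of time the node is up
```

For a two-state chain the result can be checked against the closed form q / (p + q). How the steady-state vector is turned into a system-level reliability metric depends on the particular distributed system, as the abstract notes, and is not modeled here.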
Authors
YANG Mu-chuan (杨牧川)
LYU Xiao-dan (吕晓丹)
JIANG Chao-hui (蒋朝惠)
College of Computer Science and Technology, Guizhou University, Guiyang 550025, China; Guizhou Provincial Key Laboratory of Public Big Data, Guiyang 550025, China
Source
Computer and Modernization (《计算机与现代化》)
2020, No. 6, pp. 28-33, 51 (7 pages in total)
Funding
Supported by the Guizhou Provincial Science and Technology Program (Grant No. 黔科合基础[2017]1051).