Abstract
Autonomic computing is an effective technique for automated resource management in distributed heterogeneous environments. Its aim is for a system to monitor itself, proactively detect hardware and software faults, and repair them using policy-based techniques, thereby achieving self-management. Fault monitoring is therefore an important research direction in autonomic computing, yet effective and practical methods for monitoring faults in autonomic systems are still lacking. This paper proposes an event-classification-based method for designing the fault monitoring mechanism of an autonomic computing system in a distributed heterogeneous environment. The mechanism monitors and manages faults across heterogeneous resources in a unified way, reports them to the autonomic system, and activates the corresponding policies to repair them, providing a basis for the system's self-recovery.
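The flow described in the abstract (classify a monitored event, report it, activate a matching repair policy) might be sketched roughly as follows. This is only an illustration under stated assumptions, not the paper's design: the names FaultClass, FaultEvent, classify, FaultMonitor, the keyword table, and the example policies are all hypothetical, and the keyword-based classifier merely stands in for whatever classification scheme the paper actually uses.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Dict


class FaultClass(Enum):
    """Two fault classes named in the abstract; a real system may use more."""
    HARDWARE = "hardware"
    SOFTWARE = "software"


@dataclass(frozen=True)
class FaultEvent:
    resource: str            # hypothetical resource identifier
    fault_class: FaultClass
    detail: str


# Assumption: a simple keyword table decides the class of a raw message.
HARDWARE_KEYWORDS = ("disk", "memory", "cpu", "power")


def classify(resource: str, message: str) -> FaultEvent:
    """Turn a raw monitoring message into a classified fault event."""
    lowered = message.lower()
    is_hardware = any(keyword in lowered for keyword in HARDWARE_KEYWORDS)
    fault_class = FaultClass.HARDWARE if is_hardware else FaultClass.SOFTWARE
    return FaultEvent(resource, fault_class, message)


class FaultMonitor:
    """Receives reports from heterogeneous resources and activates policies."""

    def __init__(self) -> None:
        self._policies: Dict[FaultClass, Callable[[FaultEvent], str]] = {}

    def register_policy(self, fault_class: FaultClass,
                        policy: Callable[[FaultEvent], str]) -> None:
        self._policies[fault_class] = policy

    def report(self, resource: str, message: str) -> str:
        """Classify the event, then run the policy registered for its class."""
        event = classify(resource, message)
        policy = self._policies.get(event.fault_class)
        if policy is None:
            return f"no policy registered for {event.fault_class.value} fault"
        return policy(event)


# Usage with two made-up repair policies.
monitor = FaultMonitor()
monitor.register_policy(FaultClass.HARDWARE,
                        lambda e: f"migrate workload off {e.resource}")
monitor.register_policy(FaultClass.SOFTWARE,
                        lambda e: f"restart service on {e.resource}")
```

Keying policies by fault class keeps the monitor independent of any particular resource type, which is one plausible way to handle heterogeneous resources through a single reporting interface.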
Source
《计算机科学》 (Computer Science)
CSCD
PKU Core Journals (北大核心)
2010, Issue 8, pp. 175-177 (3 pages)
Keywords
Autonomic computing, Fault monitoring, Distributed computing, Self-recovery, Event classification