Abstract
In recent years, distributed computing systems have grown ever larger, and their behavior has become more complex and harder to control. The faults that occur in these systems have grown exponentially, causing serious harm and losses, and when problems arise, troubleshooting and locating the fault has become even harder. The traditional approach judges whether a program runs correctly by tracing its execution path. That approach consumes too many resources when monitoring information is exchanged across a distributed system and is highly intrusive to the target program, so it can no longer meet the needs of software behavior analysis. In a distributed monitoring environment, events occur in large volumes, rapidly and continuously, which makes it especially urgent to detect and locate system faults promptly through complex event processing. Complex event processing uses meaningful state-change events to analyze system behavior, assess the system's operating condition, and detect and locate faults in time, keeping the system running healthily. Most existing complex event description languages describe complex events with SQL-based methods, and such data-stream query languages are complicated and hard to master for ordinary users. This paper constructs a set-based event stream model that formally defines events, represents them as sets, and defines corresponding operations on those sets, so that users need to master only a few simple set operations to define complex fault rules.
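The abstract states the idea only at a high level: events are treated as sets, and fault rules are composed from a few set operations. As a minimal sketch under that reading, the Python below models events as immutable records and builds a rule from selection, time windowing, projection onto nodes, and set difference. The `Event` fields, the helper names (`select`, `window`, `nodes`, `silent_nodes`) and the missing-heartbeat rule are hypothetical illustrations, not the operators actually defined in the paper.

```python
from dataclasses import dataclass
from typing import Callable, FrozenSet


@dataclass(frozen=True)
class Event:
    """Hypothetical event record; frozen so it is hashable and can live in a set."""
    kind: str   # event type, e.g. "heartbeat" or "error"
    node: str   # node that emitted the event
    ts: float   # timestamp in seconds


EventSet = FrozenSet[Event]


def select(events: EventSet, pred: Callable[[Event], bool]) -> EventSet:
    """Set-builder selection: the subset of events satisfying a predicate."""
    return frozenset(e for e in events if pred(e))


def window(events: EventSet, start: float, end: float) -> EventSet:
    """Events whose timestamps fall in the half-open interval [start, end)."""
    return select(events, lambda e: start <= e.ts < end)


def nodes(events: EventSet) -> FrozenSet[str]:
    """Project an event set onto the set of nodes that emitted it."""
    return frozenset(e.node for e in events)


def silent_nodes(events: EventSet, all_nodes: FrozenSet[str],
                 start: float, end: float) -> FrozenSet[str]:
    """Hypothetical fault rule: nodes that sent no heartbeat in [start, end)."""
    beats = select(window(events, start, end), lambda e: e.kind == "heartbeat")
    return all_nodes - nodes(beats)  # set difference


# Usage: a small hypothetical event stream from a three-node cluster.
stream = frozenset({
    Event("heartbeat", "n1", 1.0),
    Event("heartbeat", "n2", 2.0),
    Event("error", "n2", 3.0),
    Event("heartbeat", "n3", 12.0),  # falls outside the window used below
})
cluster = frozenset({"n1", "n2", "n3"})

silent = silent_nodes(stream, cluster, 0.0, 10.0)
errors = nodes(select(window(stream, 0.0, 10.0), lambda e: e.kind == "error"))
print(sorted(silent))           # ['n3']
print(sorted(silent | errors))  # ['n2', 'n3'], suspects via set union
```

Because every rule is built from selection, projection, and the ordinary union, intersection, and difference operators, a user composes new rules without writing a query language; a real system would additionally need to evaluate these windows incrementally over an unbounded stream.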
Source
Computer Science (《计算机科学》)
CSCD
Peking University Core Journal (北大核心)
2013, Issue 06A, pp. 302-306 (5 pages)
Funding
National "242" Information Security Program Fund (2010A029)
Strategic Priority Research Program of the Chinese Academy of Sciences (XDA06030200)
Keywords
Distributed network
Real-time monitoring system
Fault location