
Fault Location Technology for Distributed Systems Based on Event Processing    Cited by: 2

Fault Location Technology Based on the Distributed Event Processing System
Abstract: In recent years, distributed computing systems have grown ever larger, and their behavior has become more complex and harder to control. The number of faults occurring in these systems has grown exponentially, causing serious harm and losses, and troubleshooting and locating problems when they arise has become correspondingly harder. The traditional approach of judging whether a program runs correctly by tracing its execution path can no longer meet the needs of software behavior analysis: exchanging the resulting monitoring information across a distributed system is too costly, and the tracing is highly intrusive to the target program. In a distributed monitoring environment where events occur in large volume, at high rate, and without interruption, detecting and locating system faults promptly through complex event processing is especially urgent. Complex event processing uses meaningful state-change events to analyze system behavior and thereby judge the system's operating condition, so that faults are detected and located in time and the system keeps running healthily. Most existing complex event description languages describe complex events with SQL-based methods, and such data stream query languages are complicated and hard for ordinary users to master. This paper builds a set-based event stream model that formally defines events, represents them as sets, and defines corresponding operations on those sets, so that users need to master only a few simple set operations to define complex fault rules.
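The abstract does not give the model's concrete operators or rule syntax, so the following is only a minimal Python sketch of the general idea, not the paper's model. It assumes a hypothetical event schema (`ts`, `node`, `kind`), hypothetical operator names (`select`, `of_kind`, `window`, `nodes`), hypothetical event kinds (`"timeout"`, `"recovered"`, `"high_cpu"`), and two illustrative fault rules. It shows how, once a slice of the event stream is an ordinary set, a fault rule reduces to filtering, projection, and set difference or intersection.

```python
from dataclasses import dataclass
from typing import Callable, FrozenSet


@dataclass(frozen=True)
class Event:
    # Hypothetical schema: timestamp, originating node, and the state change reported.
    ts: float
    node: str
    kind: str


EventSet = FrozenSet[Event]


def select(events: EventSet, pred: Callable[[Event], bool]) -> EventSet:
    """Subset {e in events | pred(e)}; the basic filtering operation."""
    return frozenset(e for e in events if pred(e))


def of_kind(events: EventSet, kind: str) -> EventSet:
    """Events reporting a given state change."""
    return select(events, lambda e: e.kind == kind)


def window(events: EventSet, start: float, end: float) -> EventSet:
    """Events whose timestamp falls in the half-open interval [start, end)."""
    return select(events, lambda e: start <= e.ts < end)


def nodes(events: EventSet) -> FrozenSet[str]:
    """Project an event set onto the set of nodes that produced it."""
    return frozenset(e.node for e in events)


# Hypothetical fault rules, each written only with the set operations above.

def unrecovered_timeouts(events: EventSet, start: float, end: float) -> FrozenSet[str]:
    """Nodes that timed out in the window and reported no recovery in it (set difference).
    Event order inside the window is ignored in this simplified rule."""
    w = window(events, start, end)
    return nodes(of_kind(w, "timeout")) - nodes(of_kind(w, "recovered"))


def overloaded_timeouts(events: EventSet, start: float, end: float) -> FrozenSet[str]:
    """Nodes that reported both high CPU and a timeout in the window (intersection)."""
    w = window(events, start, end)
    return nodes(of_kind(w, "high_cpu")) & nodes(of_kind(w, "timeout"))


# Small made-up stream used only to exercise the rules.
stream: EventSet = frozenset({
    Event(1.0, "n1", "timeout"),
    Event(2.0, "n1", "recovered"),
    Event(3.0, "n2", "timeout"),
    Event(3.5, "n2", "high_cpu"),
    Event(4.0, "n3", "high_cpu"),
    Event(12.0, "n3", "timeout"),
})

print(sorted(unrecovered_timeouts(stream, 0.0, 10.0)))  # ['n2']
print(sorted(overloaded_timeouts(stream, 0.0, 20.0)))   # ['n2', 'n3']
```

In this sketch a user composes rules from a handful of set operations instead of writing a stream query, which mirrors the motivation stated in the abstract. It re-scans the whole set on every evaluation; a practical event processing engine would presumably evaluate rules incrementally as events arrive.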
Source: Computer Science (CSCD; Peking University Core Journal), 2013, Issue 06A, pp. 302-306 (5 pages)
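Funding: Supported by the National "242" Information Security Program (2010A029) and the Strategic Priority Research Program of the Chinese Academy of Sciences (XDA06030200)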
Keywords: distributed network; real-time monitoring system; fault location
