
Fault detection technique for a self-healing operating system based on dynamic tracing (cited by: 1)
Abstract: Operating system kernel faults tend to cluster at specific locations, and the code segments that handle dynamic memory allocation and resource contention are typical fault hotspots. For these two kinds of hotspots, this paper proposes a new fault detection technique based on dynamic kernel tracing. It traces the calls that cause state transitions in kernel global data, then analyzes the recorded call sequences and data against a set of designed rules to detect and locate faults. The technique is implemented in Linux as a loadable kernel module and requires neither additional hardware support nor modification of the original system code. Fault injection experiments verify that the technique is effective and that its detection delay is lower than that of existing fault detection techniques based on timeouts and system performance metrics.
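The rule-based analysis described in the abstract can be illustrated with a small user-space sketch. This is not the paper's implementation, which runs as a Linux kernel module and whose actual rules are not given here. The event names (`alloc`, `free`, `lock`, `unlock`), the `Event` record, and the four rules below are all assumptions chosen only to show how a recorded call sequence might be checked for memory-allocation and resource-contention faults.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    """One traced call (hypothetical record layout, not from the paper)."""
    seq: int   # position in the recorded call sequence
    call: str  # "alloc", "free", "lock" or "unlock" (assumed event names)
    task: int  # id of the task that made the call
    obj: int   # address of the buffer, or id of the lock


def analyze(trace):
    """Check a call sequence against simple rules; return (seq, fault, obj) tuples."""
    live = set()   # buffers currently allocated
    freed = set()  # buffers that were allocated and then freed
    held = {}      # lock id -> task currently holding it
    faults = []
    for ev in trace:
        if ev.call == "alloc":
            live.add(ev.obj)
            freed.discard(ev.obj)
        elif ev.call == "free":
            if ev.obj in live:
                live.remove(ev.obj)
                freed.add(ev.obj)
            elif ev.obj in freed:
                faults.append((ev.seq, "double-free", ev.obj))
            else:
                faults.append((ev.seq, "invalid-free", ev.obj))
        elif ev.call == "lock":
            if held.get(ev.obj) == ev.task:
                # The holder asks for the same lock again and would block forever.
                faults.append((ev.seq, "self-deadlock", ev.obj))
            elif ev.obj not in held:
                held[ev.obj] = ev.task
            # A request by a different task is ordinary contention; not modeled.
        elif ev.call == "unlock":
            if held.get(ev.obj) == ev.task:
                del held[ev.obj]
            else:
                faults.append((ev.seq, "unlock-without-lock", ev.obj))
    # End-of-trace rules: anything still allocated or held is reported.
    faults.extend((None, "leak", obj) for obj in sorted(live))
    faults.extend((None, "lock-never-released", lock) for lock in sorted(held))
    return faults


trace = [
    Event(0, "alloc", 1, 0xA0),
    Event(1, "free", 1, 0xA0),
    Event(2, "free", 1, 0xA0),   # second free of the same buffer
    Event(3, "lock", 1, 7),
    Event(4, "lock", 1, 7),      # same task re-acquires a lock it holds
    Event(5, "alloc", 2, 0xB0),  # never freed
]
faults = analyze(trace)
for fault in faults:
    print(fault)
```

Because each rule fires at the event that violates it, the sequence number in the result points to the offending call. That mirrors the idea of locating a fault from the call sequence itself instead of waiting for a timeout or a drop in performance.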
Source: Computer Engineering and Applications (《计算机工程与应用》), indexed in CSCD and the Peking University Core Journals list, 2015, No. 16, pp. 42-46 (5 pages)
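Funding: Aerospace Support Technology Fund (No. 2013-HT-XGD(10)); Shaanxi Province Science and Technology Research and Development Program (No. 2014K05-25); Graduate Starting Seed Fund of Northwestern Polytechnical University (No. Z2014065)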
Keywords: self-healing operating system; fault detection; dynamic kernel tracing

