Approach for Detecting Soft Error by Using Program Invariant
(Original Chinese title: 一种基于不变量的软错误检测方法)
Abstract: Soft errors are a major threat to computing reliability in high-radiation space environments. Silent data corruption (SDC) is a particular failure type caused by soft errors: the program produces a wrong result without any crash or other detectable symptom. Because an SDC-causing fault propagates silently, SDC is difficult to detect. This paper proposes an invariant-based approach for detecting it. A program invariant is a program property that holds throughout normal execution. After a soft error occurs, the affected program generally no longer satisfies the invariant. Based on this principle, assertions whose content is the invariants are inserted into the source code, and a soft error is detected when one of these assertions fails. First, the detection locations are determined by analyzing how SDC-causing faults propagate, and the invariants at those locations are extracted. Then a permeability metric is defined to characterize an invariant's ability to detect soft errors, and at each detection location invariants are converted into assertions according to their permeability. Fault injection experiments validate the approach and show that it achieves a high detection rate at low detection overhead, offering a new way to protect satellite-borne systems from soft errors.
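The principle described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names (`flip_bit`, `check_invariant`, `checked_sum`), the `InvariantViolation` exception, the input range of 0 to 255, and the fault model (one bit flip in a running total) are all assumptions made for illustration.

```python
class InvariantViolation(AssertionError):
    """Raised when an inserted invariant check fails (hypothetical name)."""


def flip_bit(value: int, bit: int) -> int:
    """Flip one bit of a non-negative integer, emulating a single event upset."""
    return value ^ (1 << bit)


def check_invariant(condition: bool, message: str) -> None:
    """Assertion-style check; an explicit raise so it also works under `python -O`."""
    if not condition:
        raise InvariantViolation(message)


def checked_sum(samples, fault=None):
    """Sum samples that are assumed to lie in [0, 255].

    `fault` is an optional (index, bit) pair: after adding samples[index],
    the given bit of the running total is flipped to emulate a soft error.
    """
    total = 0
    for i, sample in enumerate(samples):
        total += sample
        if fault is not None and fault[0] == i:
            total = flip_bit(total, fault[1])  # injected fault
        # Assumed invariant at this location: 0 <= total <= 255 * (i + 1).
        check_invariant(0 <= total <= 255 * (i + 1),
                        "invariant violated: possible soft error")
    return total


if __name__ == "__main__":
    print(checked_sum([10, 20, 30]))                 # fault-free run
    print(checked_sum([10, 20, 30], fault=(1, 0)))   # low-order flip escapes the check
    try:
        checked_sum([10, 20, 30], fault=(1, 20))     # high-order flip is caught
    except InvariantViolation as exc:
        print("detected:", exc)
```

In this sketch a flip of a high-order bit breaks the range invariant and is reported, while a flip of the lowest bit stays inside the range and silently corrupts the result. This suggests why invariants differ in detection capability, although the sketch does not compute the paper's permeability metric.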
Authors: Ma Junchi (马骏驰), Wang Yun (汪芸)
Source: Journal of Software (《软件学报》), 2016, No. 2, pp. 219-230 (12 pages). Indexed in EI, CSCD, and the Peking University Core Journals list.
Keywords: single event upset; silent data corruption; error detection; program invariant