Software Fault-Tolerance Techniques for Transient Faults (cited by 7)
Abstract: Transient faults caused by cosmic-ray radiation have long been one of the foremost challenges for computing in space applications. As integrated-circuit manufacturing processes continue to advance, the performance of modern processors has improved substantially, but their dependability is increasingly threatened by transient faults. Current techniques for tolerating transient faults fall broadly into two classes: hardware-implemented and software-implemented. Compared with the former, the latter have attracted considerable attention because of their advantages in implementation cost and flexibility. This paper first outlines the basic principles of transient fault tolerance and the main characteristics of the corresponding software fault-tolerance techniques. It then introduces and analyzes representative recent results in software fault tolerance across different implementation levels. Finally, it summarizes the characteristics and open problems of current research and offers suggestions for future research directions in software fault tolerance.
Source: Computer Engineering & Science (《计算机工程与科学》), CSCD, Peking University Core Journal, 2011, Issue 11, pp. 132-139 (8 pages)
Keywords: transient fault; soft error; software fault tolerance; redundancy computing; dependable computing