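At low supply voltages, the probability of both hard and soft errors in a cache rises to the point that the cache can no longer operate correctly. To address this problem, a cache architecture based on hybrid error-correcting codes (ECC) is proposed. The design exploits a property of the data: the correctness of dirty data must be guaranteed by the processor's cache, whereas clean data can be recovered from off-chip memory. The cache is therefore divided into two regions, one protected by a multi-bit ECC and the other by a single-bit ECC. A new cache replacement policy keeps dirty data in the multi-bit ECC region at all times, so that it receives stronger protection and the cache operates reliably at low voltage. Experimental results on the EEMBC benchmarks show that the design operates correctly at 590 mV. Compared with VS-ECC, the most recent work in this area, it reduces ECC storage by 23.6% and improves performance by 5.9%.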
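The placement rule can be illustrated with a toy model of a single cache set. This is a minimal sketch under stated assumptions, not the paper's replacement policy: the class name `HybridEccSet`, LRU ordering within each region, and migrating a line out of the single-bit region on its first write are all hypothetical choices, and the ECC encoding itself is not modeled. What it does preserve is the invariant described above: dirty lines only ever reside in the multi-bit ECC region.

```python
from collections import OrderedDict


class HybridEccSet:
    """Toy model of one cache set split into two ECC regions (hypothetical).

    strong: ways protected by a multi-bit ECC; may hold dirty or clean lines.
    weak:   ways protected by a single-bit ECC; holds clean lines only, since a
            clean line with an uncorrectable error can be refetched off-chip.
    Each region maps tag -> dirty flag in LRU order (first = least recent).
    """

    def __init__(self, strong_ways: int, weak_ways: int) -> None:
        if strong_ways < 1 or weak_ways < 1:
            raise ValueError("each region needs at least one way")
        self.strong_ways = strong_ways
        self.weak_ways = weak_ways
        self.strong = OrderedDict()
        self.weak = OrderedDict()
        self.writebacks = 0  # dirty lines written back to memory on eviction

    def _fill_strong(self, tag: int, dirty: bool) -> None:
        if len(self.strong) >= self.strong_ways:
            _, victim_dirty = self.strong.popitem(last=False)  # evict LRU line
            if victim_dirty:
                self.writebacks += 1
        self.strong[tag] = dirty

    def _fill_weak(self, tag: int) -> None:
        if len(self.weak) >= self.weak_ways:
            self.weak.popitem(last=False)  # clean victim: dropped, no writeback
        self.weak[tag] = False

    def access(self, tag: int, write: bool) -> bool:
        """Perform a read or write; return True on a hit."""
        if tag in self.strong:
            self.strong.move_to_end(tag)
            if write:
                self.strong[tag] = True
            return True
        if tag in self.weak:
            if write:
                # A line about to become dirty migrates to the strong region.
                del self.weak[tag]
                self._fill_strong(tag, dirty=True)
            else:
                self.weak.move_to_end(tag)
            return True
        # Miss: writes allocate in the strong region, reads in the weak region.
        if write:
            self._fill_strong(tag, dirty=True)
        else:
            self._fill_weak(tag)
        return False
```

Because victims in the weak region are always clean, they are simply dropped; only evictions from the strong region ever generate writebacks.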
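petaPar is a particle simulation code designed for petascale computing. Within a unified framework it implements two widely studied particle simulation algorithms: Smoothed Particle Hydrodynamics (SPH) and the Material Point Method (MPM). The code supports a variety of material models, strength models, and failure models, and is well suited to simulating large-deformation, high-strain-rate, and fluid-structure interaction problems. It supports two parallel programming models: pure MPI and hybrid MPI+X. The system is fault tolerant and supports unattended restart with a changed number of processes. Tests on Titan show that petaPar scales linearly to 260,000 CPU cores, with parallel efficiencies of 87% for SPH and 90% for MPM relative to 8,192 cores.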
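The abstract reports efficiency relative to 8,192 cores without saying whether the runs are strong-scaling (fixed total problem size) or weak-scaling (fixed work per core). The sketch below gives both conventional definitions; which one applies to the petaPar figures is an assumption to check against the full paper, and the function names are hypothetical.

```python
def strong_scaling_efficiency(t_base: float, n_base: int, t: float, n: int) -> float:
    """Fixed total problem size: ideal runtime shrinks in proportion to cores.

    E = (t_base * n_base) / (t * n), relative to a baseline run on n_base cores.
    """
    return (t_base * n_base) / (t * n)


def weak_scaling_efficiency(t_base: float, t: float) -> float:
    """Fixed work per core: ideal runtime stays constant as cores are added.

    E = t_base / t, relative to the baseline run.
    """
    return t_base / t
```

For example, a hypothetical weak-scaling run taking 10.0 s on 8,192 cores and 11.5 s on 32 times as many cores has an efficiency of 10.0 / 11.5 ≈ 0.87.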
Fault-tolerance capability is an important aspect of the overall evaluation of a distributed system (DS). This paper discusses three measures for this evaluation: node/edge connectivity, the number of spanning trees, and synthetic connectivity. A numerical example is given for illustration and analysis, and the synthetic connectivity measure proposed in this paper is shown to be reasonable and satisfactory.
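Of the three measures, the number of spanning trees has a standard exact computation: Kirchhoff's matrix-tree theorem, which equates it to any cofactor of the graph Laplacian. The sketch below shows only that measure; it is not the paper's synthetic connectivity measure, which the abstract does not define, and the function name is illustrative.

```python
from fractions import Fraction


def count_spanning_trees(n: int, edges: list[tuple[int, int]]) -> int:
    """Count spanning trees of an undirected graph on nodes 0..n-1.

    Builds the Laplacian L = D - A, deletes the last row and column, and takes
    the determinant exactly with fraction-based Gaussian elimination.
    """
    lap = [[Fraction(0)] * n for _ in range(n)]
    for u, v in edges:
        lap[u][u] += 1
        lap[v][v] += 1
        lap[u][v] -= 1
        lap[v][u] -= 1
    size = n - 1
    m = [row[:size] for row in lap[:size]]  # reduced Laplacian
    det = Fraction(1)
    for col in range(size):
        pivot = next((r for r in range(col, size) if m[r][col] != 0), None)
        if pivot is None:
            return 0  # singular minor: the graph is disconnected
        if pivot != col:
            m[col], m[pivot] = m[pivot], m[col]
            det = -det
        det *= m[col][col]
        for r in range(col + 1, size):
            factor = m[r][col] / m[col][col]
            for c in range(col, size):
                m[r][c] -= factor * m[col][c]
    return int(det)
```

For the complete graph on four nodes this returns 16, matching Cayley's formula n^(n-2); a larger count means more alternative ways to keep all nodes connected after edge failures.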
High availability is a critical requirement for business systems. First, OPENSTOCK, a business system for pharmacies, is introduced, covering both its client and server sides. Second, a high-availability solution for this system is described in detail, including its design and implementation. The core of the solution comprises the scope of system information, system parameter tables recording service status, load-balancing scheduling strategies, and methods for acquiring system parameters and detecting service states. The proposed solution is scalable and application-oriented, and it supports load balancing for high performance and fault tolerance for high reliability. The system has been deployed and validated in practice, and it achieves the features described in this paper.
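The abstract names the ingredients (a service-status parameter table, load-balancing scheduling, and service-state detection) but not the concrete algorithms. As a hedged illustration, the sketch assumes heartbeat-based failure detection and a least-connections scheduling policy; `ServerStatus`, `pick_server`, and the 5-second timeout are hypothetical, not OPENSTOCK's actual design.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ServerStatus:
    """One row of a hypothetical service-status parameter table."""
    name: str
    last_heartbeat: float  # seconds, on the scheduler's clock
    active_sessions: int


def pick_server(table: list[ServerStatus], now: float,
                timeout: float = 5.0) -> Optional[str]:
    """Return the least-loaded server with a fresh heartbeat, or None.

    A server with no heartbeat within `timeout` seconds is treated as down
    (service-state detection); among the rest, the one with the fewest active
    sessions is chosen (least-connections load balancing).
    """
    alive = [s for s in table if now - s.last_heartbeat <= timeout]
    if not alive:
        return None
    return min(alive, key=lambda s: s.active_sessions).name
```

Failover falls out of the same rule: once a server stops sending heartbeats, it simply drops out of the candidate set.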
Integrating sensing, communication, computation, and actuation, distributed wireless sensor networks (DSNs) for information-gathering tasks have emerged as one of the compelling technological advances of recent years. Because sensor nodes have limited resources, multi-hop routing between the sensor nodes and the sink node is necessary to conserve energy. In addition, unpredictable operating conditions make the sensor nodes unreliable. This paper analyzes the reliability of routing designed for sensor networks and several dependability issues of DSNs, such as MTTF (mean time to failure) and the probability of connectivity between the sensor nodes and the sink node. An exact result cannot be obtained for an arbitrary network topology, because this is a #P-hard problem; the reliability analysis is therefore given for restricted, clustering-based topologies. From a reliability perspective, the proposed method offers constructive guidance on how to place energy-constrained sensor nodes in a network efficiently.
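For a small topology, the probability that a sensor node can reach the sink can still be computed exactly by brute force, which also shows why the general problem is expensive. A minimal sketch, assuming independent node failures, perfectly reliable links and sink, and directed links toward the sink; the function and node names are hypothetical.

```python
from collections import deque
from itertools import product


def two_terminal_reliability(adj: dict[str, list[str]], p_up: dict[str, float],
                             source: str, sink: str) -> float:
    """Exact probability that `source` reaches `sink` through working nodes.

    Node v works with probability p_up[v]; nodes absent from p_up are assumed
    perfectly reliable. All 2^k states of the k unreliable nodes are
    enumerated, so the cost grows exponentially with k.
    """
    unreliable = sorted(p_up)
    total = 0.0
    for states in product((True, False), repeat=len(unreliable)):
        up = dict(zip(unreliable, states))
        if not up.get(source, True):
            continue
        prob = 1.0
        for v, is_up in up.items():
            prob *= p_up[v] if is_up else 1.0 - p_up[v]
        # Breadth-first search restricted to working nodes.
        seen, queue = {source}, deque([source])
        while queue:
            u = queue.popleft()
            for w in adj.get(u, []):
                if w not in seen and up.get(w, True):
                    seen.add(w)
                    queue.append(w)
        if sink in seen:
            total += prob
    return total
```

With two redundant cluster heads that each work with probability 0.9, a sensor reaches the sink with probability 0.99; routing through two relays in series gives 0.81.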
This paper addresses software reliability modeling at the early stage of software development for a fault-tolerant software management system. Based on Stochastic Reward Nets (SRNs), an effective hierarchical model of a fault-tolerant software management system is proposed, and an approach based on transient analysis of system performance is adopted. A quantitative approach to software reliability analysis is given. The results show that the approach is useful for designing and evaluating early-stage software reliability models when failure data are not yet available.
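SRNs are typically solved by generating their underlying continuous-time Markov chain (CTMC), computing transient state probabilities, and weighting them with a reward per state. The sketch below shows uniformization, one standard transient-solution technique, on a plain CTMC generator matrix; it is not the paper's hierarchical model, and the function names are illustrative.

```python
import math


def transient_probs(Q: list[list[float]], pi0: list[float], t: float,
                    tol: float = 1e-12, max_terms: int = 100_000) -> list[float]:
    """State probabilities pi(t) of a CTMC with generator Q, via uniformization.

    pi(t) = sum_k Poisson(k; q*t) * pi0 P^k with q = max_i |Q[i][i]| and
    P = I + Q/q. Stops once the accumulated Poisson mass exceeds 1 - tol.
    Intended for moderate q*t, where exp(-q*t) does not underflow.
    """
    n = len(Q)
    q = max(-Q[i][i] for i in range(n)) or 1.0
    P = [[(1.0 if i == j else 0.0) + Q[i][j] / q for j in range(n)]
         for i in range(n)]
    weight = math.exp(-q * t)  # Poisson probability of zero jumps
    vec = list(pi0)            # holds pi0 P^k
    result = [weight * x for x in vec]
    mass, k = weight, 0
    while 1.0 - mass > tol and k < max_terms:
        k += 1
        vec = [sum(vec[i] * P[i][j] for i in range(n)) for j in range(n)]
        weight *= q * t / k
        mass += weight
        result = [r + weight * x for r, x in zip(result, vec)]
    return result


def expected_reward(pi: list[float], rewards: list[float]) -> float:
    """Expected instantaneous reward rate: sum of reward times probability."""
    return sum(p * r for p, r in zip(pi, rewards))
```

With reward 1 for the "up" state and 0 for the "down" state of a two-state repairable model, the expected reward is the point availability, which has the closed form mu/(lam+mu) + lam/(lam+mu) * exp(-(lam+mu)t) to check against.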
This paper studies a network intrusion detection system based on a modified particle swarm optimization (PSO) algorithm. The growing interconnection of computers places higher demands on system reliability design: the system must support a variety of communication protocols to guarantee the reliability and security of the network. Network systems, servers, and products must also have strong fault tolerance and redundancy, to better meet users' needs and to ensure the security of information and the smooth operation of the network. To this end, we propose a novel paradigm for enhancing modern computer networks.
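The abstract does not describe the modification to PSO or how PSO is applied within the intrusion detection system. For orientation only, the sketch below is the standard global-best PSO with an inertia weight; the parameter values and the test objective are illustrative assumptions, not the paper's modified algorithm.

```python
import random


def pso_minimize(f, dim, bounds, n_particles=30, iters=200,
                 w=0.7, c1=1.5, c2=1.5, seed=0):
    """Standard global-best particle swarm optimization (minimization).

    Each particle keeps a velocity and its personal best position, and is
    pulled toward both its personal best and the swarm's global best.
    """
    rng = random.Random(seed)
    lo, hi = bounds
    pos = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_val = [f(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]
    for _ in range(iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                # Move, then clamp the position to the search box.
                pos[i][d] = min(hi, max(lo, pos[i][d] + vel[i][d]))
            val = f(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val
```

A "modified" PSO usually changes one of these ingredients, such as the inertia schedule, the velocity update, or the neighborhood used for the best position.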
SRAM (static RAM)-based FPGAs (field-programmable gate arrays) have gained wide acceptance because of their online reconfigurability. The growing demand for FPGAs has motivated semiconductor manufacturers to build more densely packed FPGAs with higher logic capacity. The downside of such high-density devices is that their probability of errors tends to increase. This paper proposes an FPGA architecture composed of an array of cells with built-in error-correction capability. Collectively, a group of such cells can implement any logic function, whether registered or combinational. Each cell is composed of three units: a logic block, a fault-tolerant address generator, and a director unit. The logic block uses a look-up table (LUT) to implement logic functions. The fault-tolerant address generator corrects any single-bit error in the data entering the functional cell. The director unit can transmit output data from the logic block to a neighboring cell to its south, north, east, or west, or to cells in all four directions. A functional cell can therefore also route signals to other functional cells, avoiding the intricate network of interconnects, switch boxes, and routers commonly found in commercially available FPGAs.
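The abstract states that the address generator corrects any single-bit error in a cell's incoming data but does not name the code. As a hedged sketch, assume a Hamming(7,4) single-error-correcting code protecting a 4-bit LUT address; the function names and the 4-input LUT size are hypothetical.

```python
def hamming74_encode(data: int) -> int:
    """Encode a 4-bit value as a 7-bit Hamming(7,4) codeword.

    Bit positions 1..7 (position 1 = least significant bit) hold
    p1 p2 d1 p3 d2 d3 d4; parity bits sit at positions 1, 2 and 4.
    """
    d = [(data >> i) & 1 for i in range(4)]  # d1..d4
    bits = [0] * 8  # index 0 unused so indices match positions 1..7
    bits[3], bits[5], bits[6], bits[7] = d
    bits[1] = bits[3] ^ bits[5] ^ bits[7]
    bits[2] = bits[3] ^ bits[6] ^ bits[7]
    bits[4] = bits[5] ^ bits[6] ^ bits[7]
    return sum(bits[pos] << (pos - 1) for pos in range(1, 8))


def hamming74_decode(code: int) -> int:
    """Correct up to one flipped bit and return the 4-bit data value."""
    bits = [0] + [(code >> (pos - 1)) & 1 for pos in range(1, 8)]
    # XOR of the positions of all set bits: 0 for a valid codeword,
    # otherwise the position of the single flipped bit.
    syndrome = 0
    for pos in range(1, 8):
        if bits[pos]:
            syndrome ^= pos
    if syndrome:
        bits[syndrome] ^= 1
    return bits[3] | (bits[5] << 1) | (bits[6] << 2) | (bits[7] << 3)


def lut_eval(lut: list[int], encoded_addr: int) -> int:
    """Read a 4-input LUT after correcting the possibly corrupted address."""
    return lut[hamming74_decode(encoded_addr)]
```

For example, a LUT implementing a 4-input AND still returns 1 for the all-ones address even if one bit of the encoded address is flipped in transit.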
Dependability analysis is an important step in designing and analyzing safety-critical computer systems and protection systems. Introducing multiprocessors and virtual machines increases the complexity, diversity, and dynamics of system faults, particularly software-induced failures, which affects overall dependability. Moreover, it is very difficult for a safety system to operate successfully in every active phase, because failure rates differ greatly between hardware-induced and software-induced failures. To handle these difficulties and achieve an accurate dependability evaluation that consistently reflects the construct it measures, this paper develops a new formalism derived from dynamic fault graphs (DFGs). A DFG uses the concept of a system event, defined as a sequence of fault states, to represent dynamic behavior, which allows probabilistic measures to be evaluated at each timestamp at which a change occurs. The approach automatically combines reliability analysis with system dynamics. The paper describes how the proposed methodology drives the overall system dependability analysis through the phases of modeling, structural discovery, and probability analysis, illustrated with an example of a virtual computing system.
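Treating a system event as an ordered sequence of fault states can be illustrated with the simplest sequence-dependent event: component A fails before component B, and both have failed by time t. This sketch uses a priority-AND condition, a standard dynamic fault tree construct, as a stand-in; it is not the paper's DFG formalism, and it assumes independent, exponentially distributed failure times.

```python
import math
import random


def pand_closed_form(lam_a: float, lam_b: float, t: float) -> float:
    """P(A fails, then B fails, both by time t) for exponential lifetimes.

    Integrates B's failure density at s against P(A failed before s):
    (1 - e^{-lam_b t}) - lam_b / (lam_a + lam_b) * (1 - e^{-(lam_a + lam_b) t}).
    """
    return (1.0 - math.exp(-lam_b * t)) - lam_b / (lam_a + lam_b) * (
        1.0 - math.exp(-(lam_a + lam_b) * t))


def pand_monte_carlo(lam_a: float, lam_b: float, t: float,
                     n: int = 100_000, seed: int = 1) -> float:
    """Estimate the same event by sampling failure times and checking order."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n):
        ta, tb = rng.expovariate(lam_a), rng.expovariate(lam_b)
        if ta < tb <= t:  # the sequence "A then B" completes before t
            hits += 1
    return hits / n
```

With lam_a = 1.0, lam_b = 0.5 and t = 2.0 the closed form gives about 0.315, whereas a static AND of the same two failures (order ignored) gives about 0.547; the difference is the probability of the opposite ordering, which a sequence-aware model excludes.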
As feature sizes scale down, reliability issues such as single event upsets (SEUs) have become serious for circuit and system designers, especially those working on memory and latch designs. This paper proposes an improved SEU-tolerant data cell design based on the Quatro-10T cell. The proposed cell enhances SEU tolerance by weakening key transistors in the feedback loop to block the effects of transient faults. Simulation results show that the proposed design achieves noticeably higher resilience to SEUs and better speed and power dissipation, at the expense of increased area. The proposed cell is fully SEU-immune, with a critical charge at least 7 times that of the Quatro-10T cell, and it has the lowest power-delay product among the designs compared. These results indicate that the design is well suited to high-performance circuit and system design.
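Two figures of merit in the abstract can be stated explicitly. The power-delay product (PDP) multiplies average power by propagation delay. For context on why a larger critical charge matters, the widely used empirical model of Hazucha and Svensson relates the soft error rate (SER) to the particle flux $F$, the sensitive area $A$, the critical charge $Q_{\mathrm{crit}}$, and the charge-collection efficiency $Q_s$; it is quoted here as background, not as the paper's own analysis:

```latex
\mathrm{PDP} = P_{\mathrm{avg}} \cdot t_{d},
\qquad
\mathrm{SER} \propto F \cdot A \cdot \exp\!\left(-\frac{Q_{\mathrm{crit}}}{Q_{s}}\right)
```

Under this model, multiplying $Q_{\mathrm{crit}}$ by 7 with the other terms unchanged scales the rate by $\exp(-6\,Q_{\mathrm{crit}}/Q_s)$. The increased cell area works in the opposite direction through $A$, which is why the trade-off is stated explicitly.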
With the development of high-speed railways in China, more than 2,000 high-speed trains will be put into service, so the safety and efficiency of railway transportation are increasingly important. Based on an analysis of the architectures of the traditional double 2-out-of-2 system and the 2-out-of-3 system, we have designed a high-availability quadruple vital computer (HAQVC) system. The HAQVC system offers high availability and safety, with notable characteristics such as a brand-new internal architecture, high efficiency, a reliable data-interaction mechanism, and an operation-state switching mechanism. The vital CPU hardware is based on ARM7 and runs a real-time embedded safe operating system (ES-OS). A Markov modeling method is used to evaluate the reliability, availability, maintainability, and safety (RAMS) of the system. We demonstrate that the HAQVC system is more reliable than both the all-voting triple modular redundancy (AVTMR) system and the double 2-out-of-2 system. The design can therefore be used in specific application systems such as aircraft or high-speed railway systems.
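The Markov models behind the RAMS evaluation are not given in the abstract, and the HAQVC architecture itself is not reproduced here. As a simpler baseline for the two conventional architectures it is compared against, this sketch computes non-repairable reliability, assuming identical, independent channels with exponential lifetimes and perfect voting and comparison logic; capturing repair and state switching would require a Markov model like the one the paper uses.

```python
import math


def r_2oo3(r: float) -> float:
    """2-out-of-3 voting: works while at least 2 of 3 channels work."""
    return 3 * r**2 - 2 * r**3


def r_double_2oo2(r: float) -> float:
    """Two redundant 2-out-of-2 pairs: a pair works only if both of its
    channels work, and the system works while at least one pair works."""
    pair = r**2
    return 1 - (1 - pair) ** 2


def channel_reliability(lam: float, t: float) -> float:
    """Exponential channel lifetime with constant failure rate lam."""
    return math.exp(-lam * t)
```

Integrating these reliability functions over time gives MTTF values of 5/(6λ) for the 2-out-of-3 system and 3/(4λ) for the double 2-out-of-2 system under the stated assumptions.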
Funding (wireless sensor network reliability paper): This work was supported by the National Defence Advanced Research Fund, Serial No. 5141604010HT0117.
Funding (early-stage software reliability modeling paper): This work was supported in part by the Ph.D. Programs Foundation of the Ministry of Education of China under
Funding (fault-tolerant FPGA architecture paper): The first author was supported in part by the National Science Foundation, USA, under Grant 0925080.
Funding (dynamic fault graph dependability paper): This work was supported in part by the National Natural Science Foundation of China under Grant No. 61272411 and the National 973 Basic Research Program of China under Grant No. 2014CB340600.
Funding (SEU-tolerant cell design paper): Supported by the Fundamental Research Funds for the Central Universities and the National Natural Science Foundation of China for the Youth (Grant No. 61306111).
Funding (HAQVC paper): Project No. 2009BAG12A05, supported by the National Key Technology R&D Program of China.