7 articles found
User-level failure detection and auto-recovery of parallel programs in HPC systems
1
Authors: Guozhen ZHANG, Yi LIU, Hailong YANG, Jun XU, Depei QIAN. Frontiers of Computer Science (SCIE, EI, CSCD), 2021, Issue 6, pp. 31-42 (12 pages)
As the mean time between failures (MTBF) continues to decline with the increasing number of components in large-scale high performance computing (HPC) systems, program failures are likely to occur during execution. Ensuring successful execution of HPC programs has become an issue that unprivileged users should be concerned with. From the user perspective, if a program failure cannot be detected and handled in time, it wastes resources and delays the progress of program execution. Unfortunately, unprivileged users are unable to perform program state checking because execution is controlled by the job management system and their privileges are limited. Currently, automated tools for supporting user-level failure detection and auto-recovery of parallel programs in HPC systems are missing. This paper proposes a method that lets an unprivileged user detect failures in job execution and automatically resubmit failed jobs. The state checker in our method is encapsulated as an independent job to reduce interference with the user jobs. In addition, we propose a dual-checker mechanism to improve the robustness of our approach. We implement the proposed method as a tool named automatic re-launcher (ARL) and evaluate it on the Tianhe-2 system. Experimental results show that ARL can detect execution failures effectively on the Tianhe-2 system. In addition, the communication and performance overhead caused by ARL is negligible. The good scalability of ARL makes it applicable to large-scale HPC systems.
Keywords: high performance computing; parallel program; failure detection; failure auto-recovery
Motor speed estimation and failure detection of a small UAV using density of maxima
2
Authors: Jefferson S. SOUZA, Moises C. BEZERRIL, Mateus A. SILVA, Frank C. VERAS, Abel LIMA-FILHO, Jorge Gabriel RAMOS, Alisson V. BRITO. Frontiers of Information Technology & Electronic Engineering (SCIE, EI, CSCD), 2021, Issue 7, pp. 1002-1009 (8 pages)
This work applies the technique known as signal analysis based on chaos using density of maxima to brushless direct current motors. The technique uses a correlation coefficient estimated from the density of maxima of the current signal. The study demonstrates experimentally the speed estimation of a brushless motor on a test bench and failure detection in a small flying drone. The experimental results show that it is possible to estimate the speed in 97.8% of the cases and to detect failure in 82.75% of the analyzed cases.
Keywords: unmanned aerial vehicle (UAV); speed identification; failure detection; chaos
Prediction of Link Failure in MANET-IoT Using Fuzzy Linear Regression
3
Authors: R. Mahalakshmi, V. Prasanna Srinivasan, S. Aghalya, D. Muthukumaran. Intelligent Automation & Soft Computing (SCIE), 2023, Issue 5, pp. 1627-1637 (11 pages)
A Mobile Ad-hoc NETwork (MANET) contains numerous mobile nodes and forms a structure-less network connected by wireless links. Node movement is the key feature of MANETs; hence, rapid node movement leads to link failure. Link failure causes more data packet drops, which can result in long delays. As a result, accurately measuring link failure time is a key factor in the MANET. This paper presents a Fuzzy Linear Regression Method to measure Link Failure (FLRLF) and provide an optimal route in the MANET-Internet of Things (IoT). This work aims to predict link failure and improve routing efficiency in MANET. The Fuzzy Linear Regression Method (FLRM) measures the long-lifespan link based on the link failure. The mobile node group is built using the Received Signal Strength (RSS). The Hill Climbing (HC) method selects the Group Leader (GL) based on node mobility, node degree and node energy. Additionally, it uses a Data Gathering node to forward the information from the GL to the sink node through multiple GLs. The GL is identified by link lifespan and energy using the Particle Swarm Optimization (PSO) algorithm. The simulation results demonstrate that the FLRLF approach increases the GL lifespan and minimizes the link failure time in the MANET.
Keywords: mobile ad-hoc network; fuzzy linear regression method; link failure detection; particle swarm optimization; hill climbing
Generalized reliability measures of Kalman filtering for precise point positioning
4
Authors: Changhui Xu, Xiaoping Rui, Xianfeng Song, Jingxiang Gao. Journal of Systems Engineering and Electronics (SCIE, EI, CSCD), 2013, Issue 4, pp. 699-705 (7 pages)
To deal with the adverse influence of model failures on Kalman filtering (KF) estimation, it is necessary to investigate the generalized reliability theory, including the model failure detection and identification method as well as the separability and reliability theories. Although the generalized reliability theory for least squares has been discussed for many decades, the generalized reliability theory of KF is not widely discussed. Compared with least squares, KF includes not only the measurement model but also the dynamic model. In KF, the predicted values of the state parameters from the dynamic model are treated as pseudo-measurements and combined with the observed measurements to form a least squares problem. The generalized reliability of KF is then derived from the reliability of least squares. Next, the dynamic model failure of precise point positioning is simulated to demonstrate the use of the generalized reliability theory. The results show that the adverse influence of the dynamic model failure is more severe than that of the measurement model. Moreover, it is recommended that model failure identification always be used, even if the overall model test passes. It is shown that the derived generalized reliability measures are suitable for generalized KF estimation.
Keywords: Kalman filtering (KF); reliability; separability; failure detection; failure identification
On the Effectiveness of the System Validation Based on the Black Box Testing Methodology
5
Authors: Dusica Marijan, Nikola Teslic, Miodrag Temerinac, Vukota Pekovic. Journal of Electronic Science and Technology of China, 2009, Issue 4, pp. 385-389 (5 pages)
With the advancement of technology in recent years, effective fault diagnosis has become a necessity for verifying the performance and ensuring the quality of complex systems. In this paper, an original verification methodology for complex consumer electronic devices is presented. The focus is on verifying a system that consists of hardware (an integrated circuit) and the corresponding software within a flat panel TV set. The proposed methodology provides reliable functional failure detection using the concept of black box testing. Further, the approach is fully automated, improving the reliability and speed of failure detection. The effectiveness of the methodology has been experimentally evaluated and the analysis results are reported.
Keywords: automated verification; black box testing; system failure detection; system fault diagnosis
Systematic exploration of signal-based indicators for failure diagnosis in the context of cyber-physical systems (cited by: 1)
6
Authors: Santiago RUIZ-ARENAS, Zoltán RUSÁK, Imre HORVÁTH, Ricardo MEJÍA-GUTIERREZ. Frontiers of Information Technology & Electronic Engineering (SCIE, EI, CSCD), 2019, Issue 2, pp. 152-175 (24 pages)
Malfunction or breakdown of certain mission critical systems (MCSs) may cause loss of life, damage the environment, and/or lead to high costs. Therefore, recognition of emerging failures and preventive maintenance are essential for reliable operation of MCSs. There is a practical approach for identifying and forecasting failures based on indicators obtained from real-life processes. We aim to develop means for performing active failure diagnosis and forecasting based on monitoring statistical changes of generic signal features in the specific operation modes of the system. In this paper, we present a new approach for identifying emerging failures based on their manifestations in system signals. Our approach benefits from the dynamic management of the system operation modes and from simultaneous processing and characterization of multiple heterogeneous signal sources. It improves the reliability of failure diagnosis and forecasting by investigating system performance in various operation modes. It includes reasoning about failures and forming of failures using a failure indicator matrix, which is composed of the statistical deviation of signal characteristics between normal and failed operations. It also implements a failure indicator concept that can be used as a plug-and-play failure diagnosis and failure forecasting feature of cyber-physical systems. We demonstrate that our method can automate failure diagnosis in MCSs and lend MCSs to the development of decision support systems for preventive maintenance.
Keywords: failure indicators; failure classification; failure detection and diagnosis; complex systems
Detecting and Locating Failures in Communication Networks
7
Authors: 史维更, R. James Duckworth. Journal of Computer Science & Technology (SCIE, EI, CSCD), 1990, Issue 3, pp. 275-288 (14 pages)
The connectivity of a strongly connected network may be destroyed after link damage. Since many networks are connected by directed links, reachability may be restored by altering the direction of one or more of the links and thus reconfiguring the network. The location of the failed link must first be determined. In this paper, we examine new methods to determine the location of failed links and nodes in networks. A routing test approach is proposed and the conditions under which communication networks may be tested are discussed. Finally, an adaptive algorithm and a heuristic algorithm that can locate a single failed link or a single failed node are presented.
Keywords: Detecting and Locating Failures in Communication Networks