Funding: Supported by the National Natural Science Foundation of China (Grant Nos. 61374067 and 11471341).
Abstract: This paper is the first attempt to investigate the risk probability criterion in semi-Markov decision processes with loss rates. The goal is to find an optimal policy with the minimum risk probability that the total loss incurred during a first passage time to some target set exceeds a given loss level. First, we establish the optimality equation via a successive approximation technique and show that the value function is the unique solution to the optimality equation. Second, we give suitable conditions under which we prove the existence of optimal policies and develop an algorithm for computing ε-optimal policies. Finally, we apply our main results to a business system.
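The successive-approximation idea behind this criterion can be pictured with a heavily simplified sketch: discrete time steps, integer one-step losses, a finite state space, and a single absorbing target state, none of which the paper's semi-Markov setting actually requires. The transition probabilities, loss table and loss level below are invented for illustration; the iteration starts from the pessimistic guess of risk probability 1 and, because every non-target step incurs a positive loss, it stabilizes after finitely many sweeps.

```python
import numpy as np

# Minimal sketch of successive approximation for a risk-probability criterion,
# under strong simplifying assumptions: discrete time, integer per-step losses,
# a finite state space, and a single absorbing target state (state 2).
# V[x, b] approximates the minimal probability that the loss accumulated
# before reaching the target exceeds the budget b when starting from x.

states = [0, 1, 2]          # state 2 is the target set
actions = [0, 1]
B_MAX = 10                  # largest loss level considered

# transition probabilities P[a, x, y] and per-step losses c[a, x] (assumed data)
P = np.array([[[0.2, 0.6, 0.2],
               [0.1, 0.5, 0.4],
               [0.0, 0.0, 1.0]],
              [[0.5, 0.3, 0.2],
               [0.3, 0.3, 0.4],
               [0.0, 0.0, 1.0]]])
c = np.array([[2, 3, 0],
              [1, 2, 0]])   # action 1 is cheaper per step but reaches the target more slowly

V = np.ones((len(states), B_MAX + 1))      # start from the worst case (probability 1)
V[2, :] = 0.0                              # once in the target set no further loss is incurred

for _ in range(200):                       # successive approximation sweeps
    V_new = V.copy()
    for x in (0, 1):                       # non-target states
        for b in range(B_MAX + 1):
            best = 1.0
            for a in actions:
                loss = c[a, x]
                if loss > b:               # this step already pushes the loss past the level
                    val = 1.0
                else:
                    val = sum(P[a, x, y] * V[y, b - loss] for y in states)
                best = min(best, val)
            V_new[x, b] = best
    if np.max(np.abs(V_new - V)) < 1e-10:
        break
    V = V_new

print("risk probability from state 0, loss level 6:", round(V[0, 6], 4))
```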
Funding: Supported by the Natural Science Foundation of China (Nos. 60874004, 60736028) and the Guangdong Province Universities and Colleges Pearl River Scholar Funded Scheme (2010).
Abstract: This paper considers a first passage model for discounted semi-Markov decision processes with denumerable states and nonnegative costs. The criterion to be optimized is the expected discounted cost incurred during a first passage time to a given target set. We first construct a semi-Markov decision process under a given semi-Markov decision kernel and a policy. Then, we prove that the value function satisfies the optimality equation and that there exists an optimal (or ε-optimal) stationary policy under suitable conditions, using a minimum nonnegative solution approach. Further, we give some properties of optimal policies. In addition, a value iteration algorithm for computing the value function and optimal policies is developed, and an example is given. Finally, it is shown that our model is an extension of the first passage models for both discrete-time and continuous-time Markov decision processes.
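As a rough illustration of the value-iteration and minimum-nonnegative-solution ideas, the sketch below computes the expected discounted cost up to a first passage for a toy SMDP in which sojourn times are exponential, so the expected discount over one transition is λ/(λ+α). The states, rates, costs and transition probabilities are all invented; the paper's model only requires denumerable states and nonnegative costs and does not assume exponential sojourns.

```python
import numpy as np

# Value-iteration sketch for expected discounted cost up to first passage into
# a target set.  Assumptions (illustrative only): finite state space, exponential
# sojourn times with rate lam[s, a], a lump-sum nonnegative cost per transition,
# and continuous-time discount rate alpha, so E[exp(-alpha * sojourn)] = lam / (lam + alpha).

alpha = 0.1
states = [0, 1, 2, 3]
target = {3}                          # first passage target set
actions = [0, 1]

lam = np.array([[1.0, 2.0],           # sojourn rates lam[s, a]
                [1.5, 0.5],
                [2.0, 1.0],
                [1.0, 1.0]])
cost = np.array([[4.0, 6.0],          # nonnegative one-step costs
                 [2.0, 1.0],
                 [5.0, 3.0],
                 [0.0, 0.0]])
P = np.array([[[0.1, 0.6, 0.2, 0.1], [0.0, 0.2, 0.3, 0.5]],
              [[0.3, 0.1, 0.4, 0.2], [0.5, 0.1, 0.2, 0.2]],
              [[0.2, 0.2, 0.1, 0.5], [0.1, 0.3, 0.3, 0.3]],
              [[0.0, 0.0, 0.0, 1.0], [0.0, 0.0, 0.0, 1.0]]])

V = np.zeros(len(states))             # iterate upward from 0, in the spirit of the
for _ in range(500):                  # minimum nonnegative solution approach
    V_new = np.zeros_like(V)
    for s in states:
        if s in target:
            continue                  # no further cost once the target set is reached
        q = []
        for a in actions:
            beta = lam[s, a] / (lam[s, a] + alpha)     # expected one-transition discount
            cont = sum(P[s, a, t] * V[t] for t in states if t not in target)
            q.append(cost[s, a] + beta * cont)
        V_new[s] = min(q)
    if np.max(np.abs(V_new - V)) < 1e-9:
        break
    V = V_new

policy = {}
for s in states:
    if s not in target:
        q = [cost[s, a] + lam[s, a] / (lam[s, a] + alpha)
             * sum(P[s, a, t] * V[t] for t in states if t not in target)
             for a in actions]
        policy[s] = int(np.argmin(q))
print("value function:", np.round(V, 3), "greedy policy:", policy)
```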
Funding: Supported by the National Natural Science Foundation of China (No. 71701008).
Abstract: For critical engineering systems such as aircraft and aerospace vehicles, accurate Remaining Useful Life (RUL) prediction not only means cost savings but, more importantly, is of great significance in ensuring system reliability and preventing disaster. RUL is affected not only by a system's intrinsic deterioration, but also by the operational conditions under which the system is operating. This paper proposes an RUL prediction approach to estimate the mean RUL of a continuously degrading system under dynamic operational conditions and subject to condition monitoring at short equidistant intervals. The dynamic nature of the operational conditions is described by a discrete-time Markov chain, and their influence on the degradation signal is quantified by degradation rates and signal jumps in the degradation model. The uniqueness of the proposed approach lies in formulating the RUL prediction problem in a semi-Markov decision process framework, by which the system mean RUL can be obtained through the solution of a limited number of equations. To extend the use of the approach to real applications, different failure standards for different operational conditions are also considered. The application and effectiveness of this approach are illustrated by a turbofan engine dataset and a comparison with existing results for the same dataset.
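The "limited number of equations" step can be pictured with a generic mean-hitting-time computation: once degradation level and operating mode are lumped into a finite set of working states with known expected sojourn times, the mean RUL from each state solves a small linear system. The chain, sojourn times and failure state below are invented and do not reproduce the paper's degradation-signal model.

```python
import numpy as np

# Generic sketch of the "solve a small linear system" idea behind a semi-Markov
# mean-RUL computation.  States 0..3 stand for working conditions (e.g. degradation
# level combined with operating mode); state 4 is the failed/absorbing state.
# All numbers are illustrative assumptions.

P = np.array([[0.6, 0.2, 0.1, 0.05, 0.05],
              [0.0, 0.5, 0.3, 0.1, 0.1],
              [0.0, 0.0, 0.5, 0.3, 0.2],
              [0.0, 0.0, 0.0, 0.4, 0.6],
              [0.0, 0.0, 0.0, 0.0, 1.0]])
tau = np.array([5.0, 4.0, 3.0, 2.0])   # expected sojourn time in each working state

# Mean time to absorption m solves (I - Q) m = tau, where Q restricts P to the
# working states: this is the small set of equations being solved.
Q = P[:4, :4]
m = np.linalg.solve(np.eye(4) - Q, tau)
print("mean RUL from each working state:", np.round(m, 2))
```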
Abstract: This paper investigates semi-Markov decision processes (SMDPs) with Borel state spaces under the criterion of expected total rewards in a semi-Markov environment. It describes a system that behaves like an SMDP except that the system is influenced by its environment, modeled by a semi-Markov process. We transform the SMDP in a semi-Markov environment into an equivalent discrete-time Markov decision process under the condition that the rewards are all positive or all negative, and obtain the optimality equation and some of its properties.
Funding: This work was supported in part by the National Natural Science Foundation of China (Grant Nos. 11931018, 61773411, 11701588, 11961005) and the Guangdong Basic and Applied Basic Research Foundation (Grant No. 2020B1515310021).
Abstract: This paper studies the optimal stopping of semi-Markov processes (SMPs) under a discounted optimization criterion with unbounded cost rates. We introduce an explicit construction of an equivalent semi-Markov decision process (SMDP). The equivalence is embodied in the expected discounted cost functions of the SMP and the SMDP: every stopping time of the SMP induces a policy of the SMDP such that the value functions are equal, and vice versa. The existence of an optimal stopping time for the SMP is proved via this equivalence relation. Next, we give the optimality equation for the value function and develop an effective iterative algorithm for computing it. Moreover, we show that the optimal and ε-optimal stopping times can be characterized as hitting times of special sets. Finally, to illustrate the validity of our results, an example of a maintenance system is presented.
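A minimal sketch of the stop-or-continue structure: treating "stop" and "continue" as the two actions of an equivalent decision process, value iteration converges to the value function, and the resulting stopping rule is the hitting time of the set where stopping is at least as cheap as continuing. The chain, costs and per-jump discount factors (standing in for E[e^{-αT}] over a sojourn) are illustrative assumptions, not the paper's construction.

```python
import numpy as np

# Optimal-stopping sketch: at every state either stop (pay terminal cost g) or
# continue (pay running cost c and let the process jump after a random sojourn,
# summarized here by a per-jump discount factor beta).  All numbers are invented.

g = np.array([6.0, 3.0, 1.0, 8.0])        # cost of stopping in each state
c = np.array([1.0, 1.0, 2.0, 0.5])        # cost paid while continuing one more jump
beta = np.array([0.9, 0.85, 0.8, 0.9])    # expected discount over one sojourn
P = np.array([[0.2, 0.5, 0.2, 0.1],
              [0.1, 0.3, 0.5, 0.1],
              [0.3, 0.3, 0.2, 0.2],
              [0.25, 0.25, 0.25, 0.25]])

V = g.copy()                               # stopping immediately is always feasible
for _ in range(1000):
    cont = c + beta * (P @ V)              # value of continuing one more jump
    V_new = np.minimum(g, cont)
    if np.max(np.abs(V_new - V)) < 1e-12:
        break
    V = V_new

stop_set = np.where(g <= c + beta * (P @ V))[0]
print("value function:", np.round(V, 3))
print("stop as soon as the process hits states:", stop_set.tolist())
```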
Funding: Supported by the National Natural Science Foundation of China (61673019, 61773411, 11931018, 62073346), the Guangdong Province Key Laboratory of Computational Science at Sun Yat-sen University (2020B1212060032), and the Guangdong Basic and Applied Basic Research Foundation (2021A1515010057, 2021A1515011984).
Funding: Supported by the National Natural Science Foundation of China under Grants 61631005 and U1801261, the National Key R&D Program of China under Grant 2018YFB1801105, the Central Universities under Grant ZYGX2019Z022, the Key Areas of Research and Development Program of Guangdong Province, China, under Grant 2018B010114001, the 111 Project under Grant B20064, and the China Postdoctoral Science Foundation under Grant No. 2018M631075.
Abstract: In intelligent transportation systems (ITS), the interworking of vehicular networks (VN) and cellular networks (CN) is proposed to provide high-data-rate services to vehicles. Since the network access quality of CN and VN depends on location, mobile data offloading (MDO), which dynamically selects access networks for vehicles, should be considered jointly with vehicle route planning to further improve the wireless data throughput of individual vehicles and to enhance the performance of the entire ITS. In this paper, we investigate joint MDO and route selection for an individual vehicle in a metropolitan scenario. We aim to improve the throughput of the target vehicle while guaranteeing its transportation efficiency requirements in terms of traveling time and distance. To achieve this objective, we first formulate the joint route and access network selection problem as a semi-Markov decision process (SMDP). We then propose an optimal algorithm to compute its optimal policy. To further reduce the computational complexity, we derive a suboptimal algorithm that reduces the action space. Simulation results demonstrate that the proposed optimal algorithm significantly outperforms existing work in total throughput and late arrival ratio. Moreover, the heuristic algorithm substantially reduces the computation time with only slight performance degradation.
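The joint route/network trade-off can be conveyed by a much simpler deterministic dynamic program than the paper's SMDP: at every intersection, choose both the next road segment and the access network used on it, maximizing delivered data subject to a travel-time deadline. The road graph, data rates and deadline below are invented, and this sketch does not reproduce the paper's optimal or suboptimal algorithms.

```python
# Toy deterministic sketch of joint route and access-network selection.
# segments[node] = list of (next_node, travel_time, {network: data_rate}).
from functools import lru_cache

segments = {
    "A": [("B", 4, {"CN": 2.0, "VN": 5.0}), ("C", 6, {"CN": 3.0, "VN": 1.0})],
    "B": [("D", 5, {"CN": 2.5, "VN": 4.0})],
    "C": [("D", 2, {"CN": 2.0, "VN": 6.0})],
    "D": [],
}
DEST, DEADLINE = "D", 10

@lru_cache(maxsize=None)
def best(node, time_left):
    """Maximum deliverable data from `node` to DEST within `time_left`, or None if infeasible."""
    if node == DEST:
        return 0.0
    feasible = []
    for nxt, t, rates in segments[node]:
        if t <= time_left:
            tail = best(nxt, time_left - t)
            if tail is not None:
                # pick the better network on this segment, then the best continuation
                feasible.append(t * max(rates.values()) + tail)
    return max(feasible) if feasible else None

print("max data delivered within the deadline:", best("A", DEADLINE))
```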
Funding: Supported by the National Natural Science Foundation of China (51175502).
Abstract: Testing is the premise and foundation of realizing equipment health management (EHM). To address the problem that a static periodic test strategy may lead to insufficient or excessive testing, a dynamic sequential test strategy (DSTS) for EHM is presented. Considering that the equipment health state is not completely observable in reality, a DSTS optimization method based on a partially observable semi-Markov decision process (POSMDP) is proposed. First, an equipment health state degradation model is constructed as a Markov process, and a control-limit maintenance policy is introduced. Second, the POSMDP is formulated in detail. The POSMDP is then converted into a completely observable belief semi-Markov decision process (BSMDP) through the belief state. The optimality equation and the corresponding optimal DSTS, which minimize the long-run expected average cost per unit time, are obtained from the BSMDP. Application results on complex equipment show that the proposed DSTS is feasible and effective.
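The key construction, turning the partially observable model into a completely observable one, rests on the standard Bayesian belief update sketched below. The three health states, transition matrix, observation model and test-outcome sequence are invented for illustration; only the update rule itself is the point.

```python
import numpy as np

# Belief-state update for a partially observed health state: push the current
# belief through the dynamics, then reweight by the likelihood of the observed
# test outcome and renormalize.  All matrices are illustrative assumptions.

P = np.array([[0.90, 0.08, 0.02],     # healthy -> {healthy, degraded, failed}
              [0.00, 0.85, 0.15],
              [0.00, 0.00, 1.00]])
O = np.array([[0.80, 0.15, 0.05],     # P(observation | true state); columns = observations
              [0.20, 0.60, 0.20],
              [0.05, 0.15, 0.80]])

def belief_update(b, obs):
    """Bayes update of the belief b after one sojourn and one observed test result."""
    predicted = b @ P                  # propagate the belief through the dynamics
    unnormalised = predicted * O[:, obs]
    return unnormalised / unnormalised.sum()

b = np.array([1.0, 0.0, 0.0])          # start certain the equipment is healthy
for obs in [0, 1, 1, 2]:               # a hypothetical sequence of test outcomes
    b = belief_update(b, obs)
    print("belief over {healthy, degraded, failed}:", np.round(b, 3))
```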
Abstract: This paper presents a Fuzzy Control Model for SHM (Structural Health Monitoring) of civil infrastructure systems. Two important considerations of this model are (a) effective control of the structural mechanism to prevent damage to civil infrastructure systems, and (b) energy-efficient data transmission. Fuzzy logic is incorporated into the model to provide (a) the capability to handle imprecision and non-statistical uncertainty associated with structural monitoring, and (b) a framework for effective control of the mechanism of civil infrastructure systems. Moreover, wireless smart sensors are deployed in the model to measure the dynamic response of civil infrastructure systems to structural excitation. The operation of these wireless smart sensors is characterized as a discounted SMDP (Semi-Markov Decision Process) consisting of two states, namely sensing/processing and transmitting/receiving. The objective of the SMDP-based measurement scheme is to choose a policy that offers optimal energy-efficient transmission of the measured vibration-based dynamic response. Depending on the net magnitude of the measured dynamic responses to excitation signals, data may (or may not) be transmitted to the fuzzy control segment for appropriate control of the mechanism of civil infrastructure systems. The efficacy of this model is tested via numerical analysis implemented in MATLAB. It is shown that the model can provide energy-efficient structural health monitoring and effective control of civil infrastructure systems.
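The "transmit only when the measured response is significant" idea can be sketched with simple triangular fuzzy membership functions. The breakpoints and the 0.5 activation threshold below are invented, and the paper embeds this decision in a two-state discounted SMDP rather than in a fixed rule like this one.

```python
# Toy fuzzy rule for deciding whether a sensed response magnitude is worth
# transmitting.  Membership parameters and the threshold are assumptions.

def triangular(x, a, b, c):
    """Membership of x in a triangular fuzzy set with support [a, c] and peak at b."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x <= b else (c - x) / (c - b)

def should_transmit(response_magnitude):
    low = triangular(response_magnitude, -0.4, 0.0, 0.4)    # "negligible response"
    high = triangular(response_magnitude, 0.3, 0.8, 1.3)    # "significant response"
    return high > max(low, 0.5)        # transmit only when "significant" clearly dominates

for magnitude in [0.1, 0.35, 0.6, 0.9]:
    print(magnitude, "->", "transmit" if should_transmit(magnitude) else "hold")
```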
Funding: The authors appreciate the financial support of the National Natural Science Foundation of China (Grant No. 61703041) and the technological innovation program of the Beijing Institute of Technology (2021CX11006).
Abstract: Because intelligent vehicles usually have a complex overtaking process, a safe and efficient automated overtaking system (AOS) is vital to avoid accidents caused by incorrect driver operation. Existing AOSs rarely consider the longitudinal reactions of the overtaken vehicle (OV) during overtaking. This paper proposes a novel AOS based on hierarchical reinforcement learning, where the longitudinal reaction is given by a data-driven social preference estimation. The AOS incorporates two modules that function in different overtaking phases. The first module, based on a semi-Markov decision process and motion primitives, is built for motion planning and control. The second module, based on a Markov decision process, is designed to enable the vehicle to make proper decisions according to the social preference of the OV. The proposed AOS and its modules are verified experimentally with realistic overtaking data. The test results show that the proposed AOS can realize safe and effective overtaking in scenes built from realistic data and can flexibly adjust the lateral driving behavior and lane-changing position when OVs have different social preferences.
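For the first module, the combination of a semi-Markov decision process with motion primitives suggests a generic SMDP-style Q-learning update, in which a primitive runs for a random number of steps and the discount is raised to that duration. This is not necessarily the training scheme used in the paper; the toy environment, primitive names and rewards below are invented, and only the update rule is the point of the sketch.

```python
import random
from collections import defaultdict

# Generic tabular SMDP Q-learning for temporally extended actions (motion
# primitives).  Everything about the toy environment is an assumption.

GAMMA, ALPHA = 0.95, 0.1
primitives = ["keep_lane", "shift_left", "shift_right"]
Q = defaultdict(float)                       # Q[(state, primitive)]

def execute(state, primitive):
    """Stand-in for running a primitive: returns (discounted_return, duration, next_state)."""
    duration = random.randint(2, 5)          # primitives take several time steps
    rewards = [random.uniform(-0.1, 1.0) for _ in range(duration)]
    discounted = sum(GAMMA ** k * r for k, r in enumerate(rewards))
    next_state = random.choice(["behind_OV", "beside_OV", "ahead_of_OV"])
    return discounted, duration, next_state

state = "behind_OV"
for _ in range(2000):
    # epsilon-greedy choice over motion primitives
    if random.random() < 0.1:
        o = random.choice(primitives)
    else:
        o = max(primitives, key=lambda p: Q[(state, p)])
    R, tau, next_state = execute(state, o)
    target = R + GAMMA ** tau * max(Q[(next_state, p)] for p in primitives)
    Q[(state, o)] += ALPHA * (target - Q[(state, o)])   # SMDP Q-learning update
    state = next_state

print({k: round(v, 2) for k, v in Q.items() if k[0] == "behind_OV"})
```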
Corresponding author: Ayman El Shenawy received the Ph.D. degree in systems and computer engineering from Al-Azhar University, Egypt, in 2013. He is currently a lecturer in the Systems and Computers Engineering Department, Faculty of Engineering, Al-Azhar University, Egypt, and has made significant contributions to the stated research fields. His research interests include artificial intelligence methods, robotics, and machine learning. E-mail: eaymanelshenawy@azhar.edu.eg. ORCID iD: 0000-0002-1309-644
Abstract: The multi-robot systems (MRS) exploration and fire-searching problem is an important application of mobile robots that requires massive computation capability exceeding that of a traditional MRS. This paper proposes a cloud-based hybrid decentralized partially observable semi-Markov decision process (HDec-POSMDP) model. The proposed model is implemented for the MRS exploration and fire-searching application based on the Internet of Things (IoT) cloud robotics framework. In this implementation, the heavy and expensive computational tasks are offloaded to cloud servers, so the model achieves a significant reduction in the computational burden of the whole task relative to a traditional MRS. The model is applied to explore and search for fire objects in an unknown environment using different robot team sizes. The preliminary evaluation of this implementation demonstrates that, as the parallelism of the computational instances increases, the delay of new actuation commands decreases, the mean task completion time decreases, the number of turns in the path from the start cells to the target cells is minimized, and the energy consumption of each robot is reduced.