An optimal vibration control strategy for partially observable nonlinear quasi-Hamiltonian systems with actuator saturation is proposed. First, a controlled partially observable nonlinear system is converted into a completely observable linear control system of finite dimension based on the theorem due to Charalambous and Elliott. Then the partially averaged Itô stochastic differential equations and the dynamical programming equation associated with the completely observable linear system are derived by using the stochastic averaging method and the stochastic dynamical programming principle, respectively. The optimal control law is obtained by solving the final dynamical programming equation. The results show that the proposed control strategy has high control effectiveness and control efficiency.
For active distribution networks (ADNs) integrated with massive inverter-based energy resources, it is impractical to maintain an accurate model and deploy measurements at all nodes because of the large scale of ADNs. Thus, current models of ADNs usually involve significant errors or even unknown occurrences. Moreover, ADNs are usually partially observable, since only a few measurements are available at pilot nodes or nodes with significant users. To provide a practical Volt/Var control (VVC) strategy for such networks, a data-driven VVC method is proposed in this paper. First, the system response policy, which approximates the relationship between the control variables and the states of monitoring nodes, is estimated by a recursive regression closed-form solution. Then, based on real-time measurements and the newly updated system response policy, a VVC strategy with a convergence guarantee is realized. Since the recursive regression solution is embedded in the control stage, a data-driven closed-loop VVC framework is established. The effectiveness of the proposed method is validated in an unbalanced distribution system with nonlinear loads, where not only rapid and self-adaptive voltage regulation is realized, but also system-wide optimization is achieved.
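The recursive regression step described in the abstract above can be illustrated with a textbook recursive least-squares (RLS) update. This is only a sketch under stated assumptions: the abstract does not give the paper's actual estimator, so the linear model, the function names `rls_update` and `fit_response`, the forgetting factor, and the sample data are all hypothetical.

```python
def rls_update(theta, P, x, y, forgetting=1.0):
    """One textbook recursive least-squares step for the model y ~ x . theta.

    theta: current coefficient estimate (list of floats)
    P:     current symmetric covariance-like matrix (list of lists)
    x, y:  new regressor vector and measured response
    """
    n = len(theta)
    # P x (P is symmetric, so x^T P equals (P x)^T)
    Px = [sum(P[i][j] * x[j] for j in range(n)) for i in range(n)]
    denom = forgetting + sum(x[i] * Px[i] for i in range(n))
    gain = [Px[i] / denom for i in range(n)]
    # Prediction error of the current estimate on the new sample
    error = y - sum(x[i] * theta[i] for i in range(n))
    new_theta = [theta[i] + gain[i] * error for i in range(n)]
    new_P = [[(P[i][j] - gain[i] * Px[j]) / forgetting for j in range(n)]
             for i in range(n)]
    return new_theta, new_P


def fit_response(samples, n, p0=1e6):
    """Run RLS over (x, y) samples, starting from theta = 0 and P = p0 * I."""
    theta = [0.0] * n
    P = [[p0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for x, y in samples:
        theta, P = rls_update(theta, P, x, y)
    return theta


if __name__ == "__main__":
    # Hypothetical data: changes in two control variables (x) and the
    # resulting voltage change at one monitored node (y = 2*x0 - 1*x1).
    samples = [([1.0, 0.0], 2.0), ([0.0, 1.0], -1.0),
               ([1.0, 1.0], 1.0), ([2.0, 1.0], 3.0)]
    print(fit_response(samples, 2))
```

Because each update uses only the newest measurement, an estimator of this kind can be refreshed inside a control loop without re-solving the full regression, which is the property the abstract relies on for its closed-loop framework.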
The recent progress in multi-agent deep reinforcement learning (MADRL) makes it more practical in real-world tasks, but its relatively poor scalability and the partially observable constraint raise more challenges for its performance and deployment. Based on our intuitive observation that human society could be regarded as a large-scale partially observable environment, where everyone has the functions of communicating with neighbors and remembering his/her own experience, we propose a novel network structure called the hierarchical graph recurrent network (HGRN) for multi-agent cooperation under partial observability. Specifically, we construct the multi-agent system as a graph, use a novel graph convolution structure to achieve communication between heterogeneous neighboring agents, and adopt a recurrent unit to enable agents to record historical information. To encourage exploration and improve robustness, we design a maximum-entropy learning method that can learn stochastic policies of a configurable target action entropy. Based on the above technologies, we propose a value-based MADRL algorithm called Soft-HGRN and its actor-critic variant called SAC-HGRN. Experimental results based on three homogeneous tasks and one heterogeneous environment not only show that our approach achieves clear improvements compared with four MADRL baselines, but also demonstrate the interpretability, scalability, and transferability of the proposed model.
We present a numerical approach for modeling unknown dynamical systems using partially observed data, with a focus on biological systems with (relatively) complex dynamical behavior. As an extension of the recently developed deep neural network (DNN) learning methods, our approach is particularly suitable for practical situations when (i) measurement data are available for only a subset of the state variables, and (ii) the system parameters cannot be observed or measured at all. We demonstrate that, with a properly designed DNN structure with memory terms, effective DNN models can be learned from such partially observed data containing hidden parameters. The learned DNN model serves as an accurate predictive tool for system analysis. Through a few representative biological problems, we demonstrate that such DNN models can capture qualitative dynamical behavior changes in the system, such as bifurcations, even when the parameters controlling such behavior changes are completely unknown throughout not only the model learning process but also the system prediction process. The learned DNN model effectively creates a “closed” model involving only the observables when such a closed-form model does not exist mathematically.
This paper considers a linear-quadratic (LQ) mean-field game governed by a forward-backward stochastic system with partial observation and common noise, where a coupling structure enters the state equations, cost functionals and observation equations. Firstly, to reduce the complexity of solving the mean-field game, a limiting control problem is introduced. By virtue of the decomposition approach, an admissible control set is proposed. Applying a filter technique and a dimensional-expansion technique, a decentralized control strategy and a consistency condition system are derived, and the related solvability is also addressed. Secondly, we discuss an approximate Nash equilibrium property of the decentralized control strategy. Finally, we work out a financial problem with some numerical simulations.
This paper investigates the fault detection problem for discrete event systems (DESs) which can be modeled by partially observed Petri nets (POPNs). To overcome the problem of low diagnosability in the POPN online fault diagnoser in current use, an improved online fault diagnosis algorithm that integrates generalized mutual exclusion constraints (GMECs) and integer linear programming (ILP) is proposed. It is assumed that the POPN structure and its initial markings are known, and the faults are modeled as unobservable transitions. First, the event sequence is observed and recorded. GMECs are used for an elementary diagnosis of the system behavior, and then the ILP problem of the POPN is solved for further diagnosis. Finally, an example of a real DES is analyzed to test the new fault diagnoser. The proposed algorithm increases the diagnosability of the DES remarkably, and the effectiveness of the new algorithm integrating GMECs and ILP is verified.
I consider a system whose deterioration follows a discrete-time and discrete-state Markov chain with an absorbing state. When the system is put into practice, I may select operation (wait), imperfect repair, or replacement at each discrete-time point. The true state of the system is not known when it is operated. Instead, the system is monitored after operation, and some incomplete information concerning the deterioration is obtained for decision making. Since there are multiple imperfect repairs, I can select one option from them when imperfect repair is preferable to operation and replacement. To express this situation, I propose a POMDP model and theoretically investigate the structure of an optimal maintenance policy minimizing the total expected discounted cost over an unbounded horizon. Two stochastic orders are then used for the analysis of the problem.
Testing is the premise and foundation of realizing equipment health management (EHM). To address the problem that a static periodic test strategy may cause insufficient or excessive testing, a dynamic sequential test strategy (DSTS) for EHM is presented. Considering that the equipment health state is not completely observable in reality, a DSTS optimization method based on the partially observable semi-Markov decision process (POSMDP) is proposed. Firstly, an equipment health state degradation model is constructed as a Markov process, and the control limit maintenance policy is also introduced. Secondly, the POSMDP is formulated in detail. Then, the POSMDP is converted to a completely observable belief semi-Markov decision process (BSMDP) through the belief state. The optimality equation and the corresponding optimal DSTS, which minimize the long-run expected average cost per unit time, are obtained with the BSMDP. The results of an application to complex equipment show that the proposed DSTS is feasible and effective.
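The conversion from a partially observable problem to a completely observable one through the belief state can be illustrated with the standard discrete Bayes update. This is a minimal sketch that assumes an ordinary discrete-time POMDP with a fixed action; it does not model the sojourn times of the semi-Markov setting in the abstract, and the function name `belief_update` and the example numbers are hypothetical.

```python
def belief_update(belief, transition, observation, obs):
    """Bayes update of a belief vector for one fixed action.

    belief:      belief[s] = P(state = s) before acting
    transition:  transition[s][s2] = P(next state = s2 | state = s, action)
    observation: observation[s2][o] = P(observation = o | next state = s2, action)
    obs:         index of the observation actually received
    """
    n = len(belief)
    # Prediction step: push the belief through the transition model.
    predicted = [sum(belief[s] * transition[s][s2] for s in range(n))
                 for s2 in range(n)]
    # Correction step: weight by the likelihood of the received observation.
    unnormalized = [predicted[s2] * observation[s2][obs] for s2 in range(n)]
    total = sum(unnormalized)
    if total == 0.0:
        raise ValueError("observation has zero probability under this belief")
    return [u / total for u in unnormalized]


if __name__ == "__main__":
    # Hypothetical two-state example (healthy, degraded) with no state change
    # during the step and an imperfect test result with index 0.
    belief = [0.5, 0.5]
    transition = [[1.0, 0.0], [0.0, 1.0]]
    observation = [[0.8, 0.2], [0.3, 0.7]]
    print(belief_update(belief, transition, observation, obs=0))
```

The updated belief is a sufficient statistic of the observation history, so a policy defined on beliefs can be optimized as if the state were fully observed.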
A feedback control optimization method for partially observable linear structures via stationary response is proposed and analyzed with linear building structures equipped with control devices and sensors. First, the partially observable control problem of the structure under horizontal ground acceleration excitation is converted into a completely observable control problem. Then the Itô stochastic differential equations of the system are derived based on the stochastic averaging method for quasi-integrable Hamiltonian systems, and the stationary solution to the Fokker-Planck-Kolmogorov (FPK) equation associated with the Itô equations is obtained. The performance index in terms of the mean system energy and mean square control force is established, and the optimal control force is obtained by minimizing the performance index. Finally, the numerical results for a three-story building structure model under El Centro, Hachinohe, Northridge and Kobe earthquake excitations are given to illustrate the application and the effectiveness of the proposed method.
Wireless body area networks (WBANs) can provide low-cost, timely healthcare services and are expected to be widely used for e-healthcare in hospitals. In a hospital, space is often limited, and multiple WBANs have to coexist in an area and share the same channel in order to provide healthcare services to different patients. This causes severe interference between WBANs that could significantly reduce the network throughput and increase the amount of power consumed by sensors placed on the body. Therefore, an efficient channel-resource allocation scheme in the medium access control (MAC) layer is crucial. In this paper, we develop a centralized MAC layer resource allocation scheme for a WBAN. We focus on mitigating the interference between WBANs and reducing the power consumed by sensors. Channel and buffer state are reported by smartphones deployed in each WBAN, and channel access allocation is performed by a central controller to maximize network throughput. Sensors have strict limitations in terms of energy consumption and computing capability and cannot provide all the necessary information for channel allocation in a timely manner. This deteriorates network performance. We exploit the temporal correlation of the body area channel in order to minimize the number of channel state reports necessary. We view the network design as a partly observable optimization problem and develop a myopic policy, which we then simulate in Matlab.
Multiple Earth-observing satellites need to communicate with each other to observe many targets on the Earth together. Factors such as external interference cause delays in the information exchanged between satellites, so the integrity and timeliness of the information used for satellite decision making cannot be ensured, and the optimization of the planning result is affected. Therefore, the effect of communication delay is considered during the multi-satellite coordinating process. For this problem, firstly, a distributed cooperative optimization problem for multiple satellites in a delayed communication environment is formulated. Secondly, based on both the analysis of the temporal sequence of tasks in a single satellite and the dynamically decoupled characteristics of the multi-satellite system, the environment information of multi-satellite distributed cooperative optimization is constructed on the basis of a directed acyclic graph (DAG). Then, both a cooperative optimization decision making framework and a model are built according to the decentralized partially observable Markov decision process (DEC-POMDP). After that, a satellite coordinating strategy for different conditions of communication delay is analyzed, and a unified processing strategy for communication delay is designed. An approximate cooperative optimization algorithm based on simulated annealing is proposed. Finally, the effectiveness and robustness of the method presented in this paper are verified via simulation.
This paper examines the effect of the observation time on source identification of a discrete-time susceptible-infected-recovered diffusion process in a network, given a snapshot of only some of the nodes. We formulate the source identification problem as a maximum likelihood (ML) estimator and develop a statistical inference method based on Monte Carlo simulation (MCS) to estimate the source location and the initial time of diffusion. Experimental results in synthetic networks and real-world networks demonstrate an evident impact of the observation time, as well as of the fraction of observers, on the problem concerned.
Sufficient conditions are given for the existence of unknown and partially known input normal and descriptor observers for a class of descriptor discrete-time networked control systems. It is shown that a causal and regular descriptor system subjected to input and output periodic communication constraints can be down-sampled into a causal and regular p-lifted time-invariant system. According to the lifted formulation, interesting results on the minimum and maximum feasible values of the communication sequence periods are drawn for the existence of an unknown or partially known input observer. The partially known input observer case covers the unknown input case as an extreme case. An example is given for clarification.
Decision-making for autonomous vehicles in the presence of obstacle occlusions is difficult because the lack of accurate information affects the judgment. Existing methods may lead to overly conservative strategies and time-consuming computations that cannot be balanced with efficiency. We propose to use distributional reinforcement learning to hedge the risk of strategies, optimize the worst cases, and improve the efficiency of the algorithm so that the agent learns better actions. A batch of smaller values is used in place of the average value to optimize the worst case; combined with frame stacking, we call this the Efficient-Fully parameterized Quantile Function (EFQF). The model is evaluated in signal-free intersection crossing scenarios, where it makes more efficient moves and reduces the collision rate compared with conventional reinforcement learning algorithms in the presence of perceived occlusion. The model is also more robust to data loss than the method with embedded long short-term memory.
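The idea of replacing the average value with a batch of smaller values can be sketched as a lower-tail average over the quantile estimates of each action. This is only an illustration: the abstract does not specify how the batch is chosen, so averaging the `k` smallest quantile values, the function names `lower_tail_value` and `select_action`, and the example numbers are all assumptions.

```python
def lower_tail_value(quantiles, k):
    """Average of the k smallest quantile estimates of a return distribution."""
    if not 1 <= k <= len(quantiles):
        raise ValueError("k must be between 1 and the number of quantiles")
    lowest = sorted(quantiles)[:k]
    return sum(lowest) / k


def select_action(quantiles_per_action, k):
    """Pick the action whose lower-tail value is largest.

    With k equal to the number of quantiles this reduces to the usual
    mean-based greedy choice; smaller k weights bad outcomes more heavily.
    """
    scores = [lower_tail_value(q, k) for q in quantiles_per_action]
    return max(range(len(scores)), key=scores.__getitem__)


if __name__ == "__main__":
    # Hypothetical quantile estimates for two actions at an occluded junction:
    # action 0 has the higher mean but a rare very bad outcome.
    quantiles = [[-10.0, 5.0, 6.0, 7.0], [0.0, 1.0, 2.0, 3.0]]
    print(select_action(quantiles, k=4))  # mean-based choice
    print(select_action(quantiles, k=2))  # risk-averse choice
```

In this toy case the mean-based rule prefers the first action, while the lower-tail rule prefers the second because it avoids the rare large loss.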
In network service systems, satisfying quality of service (QoS) is one of the main objectives. Admission control and resource allocation strategies can be used to guarantee the QoS requirement. Based on partially observable Markov decision processes (POMDPs), this paper proposes a novel admission control model for video on demand (VOD) service systems with elastic QoS. Elastic QoS is also considered in the resource allocation strategy. A policy gradient algorithm can be used to find the solution of POMDP problems with a satisfactory convergence rate. Numerical examples show that the proposed admission control strategy performs better than the complete admission control strategy.
The increasing demands in terms of high data rate and quality of service over hybrid satellite-terrestrial relay networks (HSTRN) have pushed the development of millimeter-wave (mmWave) band high-throughput satellites (HTS) with multiple beams. The next generation of mmWave multibeam HTS communication systems (HTSCS) is viewed as the backbone network to enhance the throughput of the HSTRN. The article first investigates the basic backbone topology architecture of the HTSCS and reviews an M-state Markov channel for Ka/Q/V band mmWave systems. Then, we propose a long-term optimal power allocation scheme over two independent and identical spot beams based on the partially observable Markov decision process (POMDP), which can partly mitigate the negative effects of severe weather conditions. The key conditions for selecting the optimal power allocation action in the multibeam HTSCS are given. Simulation results show that our POMDP-based power allocation scheme can enhance the long-term throughput of the HTSCS.
In order to solve the sensing and motion uncertainty problem of motion planning in a narrow passage environment, a partition sampling strategy based on the partially observable Markov decision process (POMDP) is proposed. The method incorporates partition sampling and can improve the success rate of robot motion planning in a narrow passage. Firstly, the environment is divided into an open area and a narrow area by using the partition sampling strategy, and the initial trajectory of the robot is generated with fewer sampling points. Secondly, the method calculates a locally optimal solution around the initial nominal trajectory by solving the POMDP problem, and iterates toward an overall optimal trajectory of the robot motion. The proposed method follows the general POMDP solution framework, in which the belief dynamics are approximated by an extended Kalman filter (EKF), and the value function is represented by an effective quadratic function in the belief space near the nominal trajectory. A belief-space variant of iterative linear quadratic Gaussian (iLQG) is used to perform the value iteration, which results in a linear control policy over the belief space that is locally optimal around the nominal trajectory. A new nominal trajectory is generated by executing the control policy, and the process is repeated until it converges to a locally optimal solution. Finally, the robot obtains the optimal trajectory to pass safely through a narrow passage. The experimental results show that the proposed method can efficiently improve the performance of motion planning under uncertainty.
Purpose: The purpose of this paper is to establish a version of a theorem that originated from population genetics and was later adopted in evolutionary computation theory, which will lead to novel Monte-Carlo sampling algorithms that provably increase the AI potential. Design/methodology/approach: In the current paper the authors set up a mathematical framework, and state and prove a version of a Geiringer-like theorem that is very well suited to the development of Monte-Carlo sampling algorithms that cope with randomness and incomplete information to make decisions. Findings: This work establishes an important theoretical link between classical population genetics, evolutionary computation theory and model-free reinforcement learning methodology. Not only may the theory explain the success of the currently existing Monte-Carlo tree sampling methodology, but it also leads to the development of novel Monte-Carlo sampling techniques guided by a rigorous mathematical foundation. Practical implications: The theoretical foundations established in the current work provide guidance for the design of powerful Monte-Carlo sampling algorithms in model-free reinforcement learning, to tackle numerous problems in computational intelligence. Originality/value: Establishing a Geiringer-like theorem with non-homologous recombination was a long-standing open problem in evolutionary computation theory. Apart from overcoming this challenge in a mathematically elegant fashion and establishing a rather general and powerful version of the theorem, this work leads directly to the development of novel, provably powerful algorithms for decision making in environments involving randomness and hidden or incomplete information.
The multi-robot systems (MRS) exploration and fire searching problem is an important application of mobile robots that requires massive computation capability exceeding the ability of traditional MRSs. This paper proposes a cloud-based hybrid decentralized partially observable semi-Markov decision process (HDec-POSMDPs) model. The proposed model is implemented for an MRS exploration and fire searching application based on the Internet of things (IoT) cloud robotics framework. In this implementation, the heavy and expensive computational tasks are offloaded to the cloud servers. The proposed model achieves a significant improvement in the computational burden of the whole task relative to a traditional MRS. The proposed model is applied to explore and search for fire objects in an unknown environment, using robot teams of different sizes. The preliminary evaluation of this implementation demonstrates that, as the parallelism of computational instances increases, the delay of new actuation commands decreases, the mean time of task completion decreases, the number of turns in the path from the start pose cells to the target cells is minimized, and the energy consumption of each robot is reduced.
This paper examines how independent directors’ social capital, as measured by their social network, affects corporate fraud. We find that firms with well-connected independent directors are less likely to commit fraud, supporting our monitoring effect hypothesis. This result is robust to a battery of tests. Further analyses show that the effect is stronger for firms in a relatively poor legal environment, for firms whose independent directors face strong reputation incentives, and when independent directors are audit committee members. Moreover, we explore a potential economic mechanism of the effect and observe that well-connected independent directors are associated with less absenteeism and more dissension. Overall, our findings suggest that independent directors’ social capital plays an important role in corporate governance.
Funding (vibration control of partially observable quasi-Hamiltonian systems): supported by the National Natural Science Foundation of China (Nos. 10332030 and 10772159) and the Research Fund for the Doctoral Program of Higher Education of China (No. 20060335125).
Funding (data-driven Volt/Var control of ADNs): supported by the Research Project of China Southern Power Grid Corporation "The demonstration and application of the virtual power plant intelligent operation and management platform with source-grid coordination", No. GDKJXM20185069 (032000KK 52180069).
Funding (HGRN multi-agent reinforcement learning): project supported by the National Key R&D Program of China (No. 2018AAA010230).
Funding (DNN modeling of partially observed dynamical systems): supported by the NSF (No. DMS-1813071) (Chou) and the AFOSR (No. FA9550-22-1-0011) (Xiu).
Funding (LQ mean-field game with partial observation): supported by the National Key Research and Development Program of China (2022YFA1006103, 2023YFA1009203), the National Natural Science Foundation of China (61925306, 61821004, 11831010, 61977043, 12001320), the Natural Science Foundation of Shandong Province (ZR2019ZD42, ZR2020ZD24), the Taishan Scholars Young Program of Shandong (TSQN202211032), and the Young Scholars Program of Shandong University.
Funding: Supported by the National Natural Science Foundation of China (61473144).
Abstract: This paper investigates the fault detection problem for discrete event systems (DESs) that can be modeled by partially observed Petri nets (POPNs). To overcome the low diagnosability of the POPN online fault diagnoser in current use, an improved online fault diagnosis algorithm that integrates generalized mutual exclusion constraints (GMECs) and integer linear programming (ILP) is proposed. The POPN structure and its initial markings are assumed to be known, and the faults are modeled as unobservable transitions. First, the event sequence is observed and recorded. GMECs are used for an elementary diagnosis of the system behavior, then the ILP problem of the POPN is solved for further diagnosis. Finally, an example of a real DES is analyzed to test the new fault diagnoser. The proposed algorithm increases the diagnosability of the DES remarkably, and the effectiveness of the new algorithm integrating GMECs and ILP is verified.
Abstract: I consider a system whose deterioration follows a discrete-time, discrete-state Markov chain with an absorbing state. When the system is put into practice, I may select operation (wait), imperfect repair, or replacement at each discrete time point. The true state of the system is not known while it is operated. Instead, the system is monitored after operation, and some incomplete information about the deterioration is obtained for decision making. Since there are multiple imperfect repairs, I can select one of them when imperfect repair is preferable to operation and replacement. To express this situation, I propose a POMDP model and theoretically investigate the structure of an optimal maintenance policy that minimizes the total expected discounted cost over an unbounded horizon. Two stochastic orders are then used in the analysis of the problem.
Funding: Supported by the National Natural Science Foundation of China (51175502).
Abstract: Testing is the premise and foundation of realizing equipment health management (EHM). To address the problem that a static periodic test strategy may cause deficient or excessive testing, a dynamic sequential test strategy (DSTS) for EHM is presented. Considering that the equipment health state is not completely observable in reality, a DSTS optimization method based on the partially observable semi-Markov decision process (POSMDP) is proposed. First, an equipment health state degradation model is constructed as a Markov process, and the control limit maintenance policy is introduced. Second, the POSMDP is formulated in detail. The POSMDP is then converted to a completely observable belief semi-Markov decision process (BSMDP) through the belief state. The optimality equation and the corresponding optimal DSTS, which minimizes the long-run expected average cost per unit time, are obtained with the BSMDP. The results of an application to complex equipment show that the proposed DSTS is feasible and effective.
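The conversion from a partially observable process to a belief process rests on a Bayes update of the belief state. The following is a minimal sketch of that update only, under simplifying assumptions that are not taken from the paper: a discrete-time Markov chain stands in for the semi-Markov model (sojourn times are ignored), and the three-state degradation matrix `T`, the test-outcome matrix `O`, and the function name `belief_update` are all hypothetical.

```python
import numpy as np

def belief_update(belief, T, O, obs):
    """Bayes update: b'(s') is proportional to O[s', obs] * sum_s b(s) T[s, s']."""
    predicted = belief @ T                 # propagate belief through the chain
    unnormalized = predicted * O[:, obs]   # weight by observation likelihood
    total = unnormalized.sum()
    if total == 0.0:
        raise ValueError("observation has zero probability under the model")
    return unnormalized / total

# Hypothetical degradation chain: healthy -> degraded -> failed (absorbing).
T = np.array([[0.9, 0.1, 0.0],
              [0.0, 0.8, 0.2],
              [0.0, 0.0, 1.0]])
# O[s, o]: probability that a test reports o (0 = "ok", 1 = "alarm") in state s.
O = np.array([[0.95, 0.05],
              [0.40, 0.60],
              [0.05, 0.95]])

b0 = np.array([1.0, 0.0, 0.0])           # start surely healthy
b1 = belief_update(b0, T, O, obs=1)      # belief after one step and an alarm
```

After a single alarm the belief shifts most of its mass to the degraded state while the failed state stays at zero, because failure is not reachable from the healthy state in one step of this toy chain.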
Funding: Project supported by the National Natural Science Foundation of China under a key grant (No. 10332030), the Research Fund for the Doctoral Program of Higher Education of China (No. 20060335125), and the Zhejiang Provincial Natural Science Foundation of China (No. Y607087).
Abstract: A feedback control optimization method for partially observable linear structures via stationary response is proposed and analyzed for linear building structures equipped with control devices and sensors. First, the partially observable control problem of the structure under horizontal ground acceleration excitation is converted into a completely observable control problem. Then the Itô stochastic differential equations of the system are derived based on the stochastic averaging method for quasi-integrable Hamiltonian systems, and the stationary solution to the Fokker-Planck-Kolmogorov (FPK) equation associated with the Itô equations is obtained. The performance index in terms of the mean system energy and mean square control force is established, and the optimal control force is obtained by minimizing the performance index. Finally, numerical results for a three-story building structure model under El Centro, Hachinohe, Northridge and Kobe earthquake excitations are given to illustrate the application and the effectiveness of the proposed method.
Funding: Supported by a research grant from the Natural Sciences and Engineering Research Council (NSERC) under grant No. CRDPJ 419147-11 and by Care In Motion Inc., Canada.
Abstract: Wireless body area networks (WBANs) can provide low-cost, timely healthcare services and are expected to be widely used for e-healthcare in hospitals. In a hospital, space is often limited, and multiple WBANs have to coexist in an area and share the same channel in order to provide healthcare services to different patients. This causes severe interference between WBANs that could significantly reduce the network throughput and increase the power consumed by sensors placed on the body. Therefore, an efficient channel-resource allocation scheme in the medium access control (MAC) layer is crucial. In this paper, we develop a centralized MAC layer resource allocation scheme for a WBAN. We focus on mitigating the interference between WBANs and reducing the power consumed by sensors. Channel and buffer states are reported by smartphones deployed in each WBAN, and channel access allocation is performed by a central controller to maximize network throughput. Sensors have strict limitations in terms of energy consumption and computing capability and cannot provide all the information necessary for channel allocation in a timely manner, which degrades network performance. We exploit the temporal correlation of the body area channel to minimize the number of channel state reports required. We view the network design as a partially observable optimization problem and develop a myopic policy, which we then simulate in Matlab.
Funding: Supported by the National Science Foundation for Young Scholars of China (61301234, 71401175).
Abstract: Multiple Earth-observing satellites need to communicate with each other to jointly observe many targets on the Earth. Factors such as external interference cause delays in information exchange between satellites, so the integrity and timeliness of the information used for satellite decision making cannot be ensured, and the optimization of the planning result suffers. Therefore, the effect of communication delay is considered during the multi-satellite coordination process. First, a distributed cooperative optimization problem for multiple satellites in a delayed communication environment is formulated. Second, based on both the analysis of the temporal sequence of tasks in a single satellite and the dynamically decoupled characteristics of the multi-satellite system, the environment information of multi-satellite distributed cooperative optimization is constructed on the basis of a directed acyclic graph (DAG). Then, both a cooperative optimization decision-making framework and a model are built according to the decentralized partially observable Markov decision process (DEC-POMDP). After that, satellite coordination strategies for different conditions of communication delay are analyzed, and a unified processing strategy for communication delay is designed. An approximate cooperative optimization algorithm based on simulated annealing is proposed. Finally, the effectiveness and robustness of the proposed method are verified via simulation.
Funding: Supported by the National Natural Science Foundation of China (Grant Nos. 61673027 and 62106047), the Beijing Social Science Foundation (Grant No. 21GLC042), and the Humanity and Social Science Youth Foundation of the Ministry of Education, China (Grant No. 20YJCZH228).
Abstract: This paper examines the effect of the observation time on source identification of a discrete-time susceptible-infected-recovered diffusion process in a network, given a snapshot of a subset of nodes. We formulate the source identification problem as a maximum likelihood (ML) estimator and develop a statistical inference method based on Monte Carlo simulation (MCS) to estimate the source location and the initial time of diffusion. Experimental results on synthetic and real-world networks demonstrate an evident impact of the observation time, as well as of the fraction of observers, on the problem.
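One way to picture an ML estimator driven by Monte Carlo simulation is sketched below. It is an illustration under stated assumptions, not the paper's method: the likelihood of each candidate (source, elapsed time) pair is approximated by the fraction of simulated runs that reproduce the observed snapshot exactly on the observer nodes. The function names, the adjacency-dictionary graph format, and the parameters `beta` (per-contact infection probability) and `gamma` (recovery probability) are all hypothetical.

```python
import random

S, I, R = 0, 1, 2  # susceptible, infected, recovered

def simulate_sir(adj, source, steps, beta, gamma, rng):
    """Run a discrete-time SIR process from one source for `steps` rounds."""
    state = {v: S for v in adj}
    state[source] = I
    for _ in range(steps):
        new_state = dict(state)
        for v, s in state.items():
            if s != I:
                continue
            for u in adj[v]:                      # try to infect neighbours
                if state[u] == S and rng.random() < beta:
                    new_state[u] = I
            if rng.random() < gamma:              # infected node may recover
                new_state[v] = R
        state = new_state
    return state

def estimate_source(adj, snapshot, max_steps, beta, gamma, runs=200, seed=0):
    """Return the (source, steps) pair with the highest estimated likelihood."""
    rng = random.Random(seed)
    best, best_like = None, -1.0
    for source in adj:
        for steps in range(max_steps + 1):
            hits = 0
            for _ in range(runs):
                final = simulate_sir(adj, source, steps, beta, gamma, rng)
                # Only observer nodes (keys of `snapshot`) are compared.
                if all(final[v] == s for v, s in snapshot.items()):
                    hits += 1
            like = hits / runs
            if like > best_like:
                best, best_like = (source, steps), like
    return best, best_like

# Toy path graph 0-1-2-3; node 3 is not observed.
adj = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
snapshot = {0: I, 1: I, 2: S}
best, like = estimate_source(adj, snapshot, max_steps=2, beta=1.0, gamma=0.0, runs=20)
```

With `beta=1.0` and `gamma=0.0` the toy process is deterministic, so the only candidate consistent with the snapshot is source 0 observed one step after the start. Exact snapshot matching becomes very sample-hungry on larger networks, which is one reason a practical estimator would likely need a smoother likelihood approximation.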
Abstract: Sufficient conditions are given for the existence of unknown and partially known input normal and descriptor observers for a class of descriptor discrete-time networked control systems. It is shown that a causal and regular descriptor system subject to periodic input and output communication constraints can be downsampled into a causal and regular p-lifted time-invariant system. Based on the lifted formulation, results on the minimum and maximum feasible values of the communication sequence periods are derived for the existence of an unknown or partially known input observer. The partially known input observer case covers the unknown input case as an extreme case. An example is given for clarification.
Funding: This work was supported partly by Beili Huidong (Changshu) Vehicle Technology Company.
Abstract: Decision-making for autonomous vehicles in the presence of obstacle occlusions is difficult because the lack of accurate information affects the judgment. Existing methods may lead to overly conservative strategies and time-consuming computations that cannot be balanced with efficiency. We propose to use distributional reinforcement learning to hedge the risk of strategies, optimize the worse cases, and improve the efficiency of the algorithm so that the agent learns better actions. A batch of smaller values is used in place of the average value to optimize the worse case; combined with frame stacking, we call the resulting method Efficient-Fully parameterized Quantile Function (EFQF). The model is evaluated on signal-free intersection crossing scenarios; in the presence of perceived occlusion it makes more efficient moves and reduces the collision rate compared to conventional reinforcement learning algorithms. The model is also more robust to data loss than a method with embedded long short-term memory.
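The idea of replacing the average value with a batch of smaller values can be illustrated with a short sketch. It assumes, for simplicity, equally weighted quantile estimates of the return for each action; the function name `lower_tail_value`, the `fraction` parameter, and the example numbers are hypothetical and are not taken from the EFQF paper.

```python
import numpy as np

def lower_tail_value(quantile_values, fraction=0.25):
    """Mean of the smallest `fraction` of quantile values for each action."""
    q = np.sort(np.asarray(quantile_values, dtype=float), axis=-1)
    k = max(1, int(np.ceil(fraction * q.shape[-1])))  # how many low quantiles to keep
    return q[..., :k].mean(axis=-1)

# Rows: actions; columns: quantile estimates of the return (hypothetical).
quantiles = np.array([[-10.0, 4.0, 5.0, 6.0],   # higher mean, bad worst case
                      [0.0, 1.0, 1.0, 2.0]])    # lower mean, mild worst case

mean_action = int(np.argmax(quantiles.mean(axis=1)))
risk_averse_action = int(np.argmax(lower_tail_value(quantiles, fraction=0.5)))
```

Here the mean-based choice picks action 0, while the lower-tail criterion picks action 1 because it ignores the optimistic quantiles and scores each action by its worse outcomes.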
Funding: Supported by the National Natural Science Foundation of China (Nos. 61174124, 61233003 and 60935001) and the National High Technology Research and Development Program of China (863 Program) (No. 2011AA01A102).
Abstract: In network service systems, satisfying quality of service (QoS) is one of the main objectives. Admission control and resource allocation strategies can be used to guarantee the QoS requirement. Based on partially observable Markov decision processes (POMDPs), this paper proposes a novel admission control model for video on demand (VOD) service systems with elastic QoS. Elastic QoS is also considered in the resource allocation strategy. Policy gradient algorithms can often solve POMDP problems with a satisfactory convergence rate. Numerical examples show that the proposed admission control strategy performs better than the complete admission control strategy.
Funding: Supported in part by the National Natural Science Foundation of China (Nos. 61771158, 61871147, 61831008, 91638204 and 61525103), the Shenzhen Basic Research Program (Nos. JCYJ20170811154309920, JCYJ20170811160142808, and ZDSYS201707280903305), and the Guangdong Science and Technology Planning Project (No. 2018B030322004).
Abstract: The increasing demands for high data rates and quality of service over hybrid satellite-terrestrial relay networks (HSTRNs) have pushed the development of millimeter-wave (mmWave) band high-throughput satellites (HTS) with multiple beams. The next generation of mmWave multibeam HTS communication systems (HTSCS) is viewed as the backbone network for enhancing the throughput of the HSTRN. The article first investigates the basic backbone topology architecture of HTSCS and reviews an M-state Markov channel for the Ka/Q/V band mmWave systems. Then, we propose a long-term optimal power allocation scheme over two independent and identical spot beams based on the partially observable Markov decision process (POMDP), which can partly mitigate the negative effects of severe weather conditions. The key conditions for selecting the optimal power allocation action in the multibeam HTSCS are given. Simulation results show that our POMDP-based power allocation scheme can enhance the long-term throughput of the HTSCS.
Funding: Supported by the National Natural Science Foundation of China (61701270) and the Young Doctor Cooperation Foundation of Qilu University of Technology (Shandong Academy of Sciences) (2017BSHZ008).
Abstract: To address sensing and motion uncertainty in motion planning in narrow passage environments, a partition sampling strategy based on the partially observable Markov decision process (POMDP) is proposed. The method improves the success rate of robot motion planning in narrow passages. First, the environment is divided into an open area and a narrow area by the partition sampling strategy, and the initial trajectory of the robot is generated with fewer sampling points. Second, the method calculates a locally optimal solution around the initial nominal trajectory by solving the POMDP problem and iterates toward an overall optimal trajectory of robot motion. The proposed method follows the general POMDP solution framework, in which the belief dynamics is approximated by an extended Kalman filter (EKF), and the value function is represented by a quadratic function in the belief space near the nominal trajectory. A belief-space variant of iterative linear quadratic Gaussian (iLQG) is used to perform the value iteration, which results in a linear control policy over the belief space that is locally optimal around the nominal trajectory. A new nominal trajectory is generated by executing the control policy, and the process is repeated until it converges to a locally optimal solution. Finally, the robot obtains the optimal trajectory to pass safely through a narrow passage. The experimental results show that the proposed method efficiently improves the performance of motion planning under uncertainty.
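The EKF approximation of the belief dynamics can be sketched as a single predict-and-update step on a Gaussian belief (mean and covariance). This is a generic EKF step under assumed models, not the paper's implementation: the motion model `f`, the range-to-beacon measurement model `h`, their Jacobians, and all noise values are hypothetical choices made for illustration.

```python
import numpy as np

def ekf_step(mean, cov, u, z, f, h, F_jac, H_jac, Q, R):
    """One EKF step: propagate the Gaussian belief, then correct it with z."""
    # Predict: push the mean through the motion model, linearize for covariance.
    mean_pred = f(mean, u)
    F = F_jac(mean, u)
    cov_pred = F @ cov @ F.T + Q
    # Update: linearize the measurement model around the predicted mean.
    H = H_jac(mean_pred)
    S = H @ cov_pred @ H.T + R
    K = cov_pred @ H.T @ np.linalg.inv(S)          # Kalman gain
    mean_new = mean_pred + K @ (z - h(mean_pred))
    cov_new = (np.eye(len(mean)) - K @ H) @ cov_pred
    return mean_new, cov_new

# Hypothetical 2-D point robot: position += control; sensor measures range to origin.
f = lambda x, u: x + u
F_jac = lambda x, u: np.eye(2)
h = lambda x: np.array([np.linalg.norm(x)])
H_jac = lambda x: (x / np.linalg.norm(x)).reshape(1, -1)

mean0, cov0 = np.array([1.0, 0.0]), 0.5 * np.eye(2)
Q, R = 0.01 * np.eye(2), np.array([[0.04]])
mean1, cov1 = ekf_step(mean0, cov0, np.array([1.0, 0.0]), np.array([2.1]),
                       f, h, F_jac, H_jac, Q, R)
```

In this toy case the range measurement only carries information along the x direction, so the variance in x shrinks sharply while the variance in y stays at its predicted value. A planner working in belief space would treat the pair (mean, covariance) as its state.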
Funding: This work has been sponsored by EPSRC grants EP/D003/05/1 “Amorphous Computing” and EP/I009809/1 “Evolutionary Approximation Algorithms for Optimization: Algorithm Design and Complexity Analysis”.
Abstract: Purpose: The purpose of this paper is to establish a version of a theorem that originated in population genetics and was later adopted in evolutionary computation theory, and that will lead to novel Monte-Carlo sampling algorithms that provably increase the AI potential. Design/methodology/approach: The authors set up a mathematical framework, then state and prove a version of a Geiringer-like theorem that is well suited to the development of Monte-Carlo sampling algorithms for decision making under randomness and incomplete information. Findings: This work establishes an important theoretical link between classical population genetics, evolutionary computation theory and model-free reinforcement learning methodology. Not only may the theory explain the success of the existing Monte-Carlo tree sampling methodology, but it also leads to the development of novel Monte-Carlo sampling techniques guided by a rigorous mathematical foundation. Practical implications: The theoretical foundations established in this work provide guidance for the design of powerful Monte-Carlo sampling algorithms in model-free reinforcement learning to tackle numerous problems in computational intelligence. Originality/value: Establishing a Geiringer-like theorem with non-homologous recombination was a long-standing open problem in evolutionary computation theory. Apart from overcoming this challenge in a mathematically elegant fashion and establishing a rather general and powerful version of the theorem, this work leads directly to the development of novel, provably powerful algorithms for decision making in environments involving randomness and hidden or incomplete information.
Corresponding author: Ayman El Shenawy received the Ph.D. degree in systems and computer engineering from Al-Azhar University, Egypt in 2013. He is currently working as a lecturer at the Systems and Computers Engineering Department, Faculty of Engineering, Al-Azhar University, Egypt. He already developed some breakthrough research in the mentioned areas. He made significant contributions to the stated research fields. His research interests include artificial intelligence methods, robotics and machine learning. E-mail: eaymanelshenawy@azhar.edu.eg ORCID iD: 0000-0002-1309-644
Abstract: Multi-robot system (MRS) exploration and fire searching is an important application of mobile robots that requires massive computational capability, exceeding the ability of a traditional MRS. This paper proposes a cloud-based hybrid decentralized partially observable semi-Markov decision process (HDec-POSMDP) model. The proposed model is implemented for an MRS exploration and fire searching application based on the Internet of things (IoT) cloud robotics framework. In this implementation, the heavy and expensive computational tasks are offloaded to the cloud servers. The proposed model significantly reduces the computational burden of the whole task relative to a traditional MRS. The model is applied to explore and search for fire objects in an unknown environment using robot teams of different sizes. The preliminary evaluation of this implementation demonstrates that, as the parallelism of computational instances increases, the delay of new actuation commands decreases, the mean task completion time decreases, the number of turns in the path from the start pose cells to the target cells is minimized, and the energy consumption of each robot is reduced.
Abstract: This paper examines how independent directors’ social capital, as measured by their social network, affects corporate fraud. We find that firms with well-connected independent directors are less likely to commit fraud, supporting our monitoring effect hypothesis. This result is robust to a battery of tests. Further analyses show that the effect is stronger for firms with a relatively poor legal environment, for firms whose independent directors face strong reputation incentives, and when independent directors are audit committee members. Moreover, we explore a potential economic mechanism of the effect and observe that well-connected independent directors are associated with less absenteeism and more dissension. Overall, our findings suggest that independent directors’ social capital plays an important role in corporate governance.