Funding: Supported by the National Natural Science Foundation of China under Grant No. 61301101.
Abstract: The transmission delay of a real-time video packet depends mainly on the sensing time delay (a short-term factor) and the transmission delay of the entire frame (a long-term factor). Therefore, the optimization problem in the spectrum handoff process should be formulated as a combination of microscopic and macroscopic optimization. In this paper, we focus on combining these two optimization models and propose a novel Evolution Spectrum Handoff (ESH) strategy to minimize the expected transmission delay of real-time video packets. In the micro-optimized model, considering the tradeoff between the Primary User's (PU's) allowable collision percentage on each channel and the transmission delay of video packets, we propose a mixed-integer nonlinear programming scheme. The scheme achieves the minimum sensing time, which is termed the optimal stopping time. In the macro-optimized model, using the optimal stopping time as the reward function within the partially observable Markov decision process framework, the ESH strategy is designed to search for an optimal target channel set and minimize the expected packet delay in long-term real-time video transmission. Meanwhile, the minimum expected transmission delay is obtained under practical cognitive radio network conditions, i.e., secondary user mobility, PU random access, imperfect sensing information, etc. Theoretical analysis and simulation results show that the ESH strategy can effectively reduce the transmission delay of video packets in the spectrum handoff process.
Abstract: This paper presents a navigation method based on the partially observable Markov decision process (POMDP) for smart wheelchairs in uncertain environments. The key design factors for the navigation system of a smart wheelchair are discussed. A kinematics model of the smart wheelchair is given, and the model and principle of POMDP are introduced. To respond to uncertain local environments, a novel POMDP-based navigation methodology that uses sensor perception and the user's joystick input is presented. The state space, the action set, the observations, and the sensor fusion of the navigation method are given in detail, and the optimal policy of the POMDP model is proposed. Experimental results demonstrate the feasibility of this navigation method. Analysis is also conducted to evaluate performance, the advantages of the approach, and the potential generalization of this work.
Abstract: I consider a system whose deterioration follows a discrete-time, discrete-state Markov chain with an absorbing state. When the system is put into practice, I may select operation (wait), imperfect repair, or replacement at each discrete time point. The true state of the system is not known when it is operated. Instead, the system is monitored after operation, and some incomplete information about the deterioration is obtained for decision making. Since there are multiple imperfect repairs, I can select one of them when imperfect repair is preferable to operation and replacement. To express this situation, I propose a POMDP model and theoretically investigate the structure of an optimal maintenance policy that minimizes the total expected discounted cost over an unbounded horizon. Two stochastic orders are then used in the analysis of the problem.
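Because the true deterioration state is hidden and only incomplete monitoring information is available, a POMDP decision maker typically tracks a probability distribution (belief) over states. The following is a minimal sketch of the standard Bayes belief update, not the model from the abstract: the three-state chain `P`, the observation likelihoods, and the function name `belief_update` are all hypothetical choices made for illustration.

```python
def belief_update(belief, transition, obs_likelihood):
    """One Bayes-filter step: predict with the Markov chain, then correct.

    belief[i]         -- current probability of being in state i
    transition[i][j]  -- probability of moving from state i to state j
    obs_likelihood[j] -- probability of the received signal given state j
    """
    n = len(belief)
    # Prediction: push the belief through the deterioration chain.
    predicted = [sum(belief[i] * transition[i][j] for i in range(n))
                 for j in range(n)]
    # Correction: weight each state by the likelihood of the monitoring signal.
    unnormalized = [predicted[j] * obs_likelihood[j] for j in range(n)]
    total = sum(unnormalized)
    if total == 0.0:
        raise ValueError("observation has zero probability under this belief")
    return [u / total for u in unnormalized]


# Hypothetical chain: good -> worn -> failed, where "failed" is absorbing.
P = [[0.8, 0.15, 0.05],
     [0.0, 0.70, 0.30],
     [0.0, 0.00, 1.00]]
# Hypothetical likelihood of one particular warning signal in each state.
signal_likelihood = [0.1, 0.6, 0.9]

b0 = [1.0, 0.0, 0.0]                      # system starts as good as new
b1 = belief_update(b0, P, signal_likelihood)
print([round(p, 4) for p in b1])
```

In this sketch a single warning signal shifts most of the probability mass away from the "good" state, which is the kind of information a maintenance policy would act on when choosing among waiting, repair, and replacement.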
Funding: Supported by the National Science Foundation for Young Scholars of China (61301234, 71401175).
Abstract: Multiple Earth-observing satellites need to communicate with each other to jointly observe many targets on the Earth. Factors such as external interference cause delays in information exchange between satellites, so the integrity and timeliness of the information used for satellite decision making cannot be ensured, and the optimization of the planning result is affected. Therefore, the effect of communication delay is considered in the multi-satellite coordination process. First, a distributed cooperative optimization problem for multiple satellites in a delayed communication environment is formulated. Second, based on both the analysis of the temporal sequence of tasks in a single satellite and the dynamically decoupled characteristics of the multi-satellite system, the environment information for multi-satellite distributed cooperative optimization is constructed on the basis of a directed acyclic graph (DAG). Then, a cooperative optimization decision-making framework and a model are built according to the decentralized partially observable Markov decision process (DEC-POMDP). After that, a satellite coordination strategy for different communication delay conditions is analyzed, and a unified strategy for handling communication delay is designed. An approximate cooperative optimization algorithm based on simulated annealing is proposed. Finally, the effectiveness and robustness of the proposed method are verified via simulation.
Funding: This work has been sponsored by EPSRC grants EP/D003/05/1 “Amorphous Computing” and EP/I009809/1 “Evolutionary Approximation Algorithms for Optimization: Algorithm Design and Complexity Analysis”.
Abstract: Purpose - The purpose of this paper is to establish a version of a theorem that originated in population genetics and was later adopted in evolutionary computation theory, which will lead to novel Monte-Carlo sampling algorithms that provably increase the AI potential. Design/methodology/approach - In the current paper, the authors set up a mathematical framework and state and prove a version of a Geiringer-like theorem that is very well suited for the development of Monte-Carlo sampling algorithms that cope with randomness and incomplete information in decision making. Findings - This work establishes an important theoretical link between classical population genetics, evolutionary computation theory, and model-free reinforcement learning methodology. Not only may the theory explain the success of the currently existing Monte-Carlo tree sampling methodology, but it also leads to the development of novel Monte-Carlo sampling techniques guided by a rigorous mathematical foundation. Practical implications - The theoretical foundations established in the current work provide guidance for the design of powerful Monte-Carlo sampling algorithms in model-free reinforcement learning to tackle numerous problems in computational intelligence. Originality/value - Establishing a Geiringer-like theorem with non-homologous recombination was a long-standing open problem in evolutionary computation theory. Apart from overcoming this challenge in a mathematically elegant fashion and establishing a rather general and powerful version of the theorem, this work leads directly to the development of novel, provably powerful algorithms for decision making in environments involving randomness and hidden or incomplete information.
Funding: This work was supported in part by Beili Huidong (Changshu) Vehicle Technology Company.
Abstract: Decision-making for autonomous vehicles in the presence of obstacle occlusions is difficult because the lack of accurate information affects judgment. Existing methods may lead to overly conservative strategies and time-consuming computations that cannot be balanced with efficiency. We propose to use distributional reinforcement learning to hedge the risk of strategies, optimize the worst case, and improve the efficiency of the algorithm so that the agent learns better actions. A batch of the smaller values replaces the average value in order to optimize the worst case; combined with frame stacking, we call the resulting model Efficient-Fully parameterized Quantile Function (EFQF). The model is evaluated in signal-free intersection crossing scenarios, where it makes more efficient moves and reduces the collision rate compared with conventional reinforcement learning algorithms in the presence of perceived occlusion. The model is also more robust to data loss than the method with an embedded long short-term memory (LSTM).
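The idea of replacing the average with a batch of the smaller values can be illustrated with a short sketch. It assumes each action comes with a list of estimated return quantiles and scores an action by the mean of its `k` lowest quantile values. The function names, the parameter `k`, and the numbers are hypothetical and are not taken from the EFQF implementation.

```python
def worst_case_value(quantile_values, k):
    """Average of the k smallest quantile estimates of the return."""
    if not 1 <= k <= len(quantile_values):
        raise ValueError("k must be between 1 and the number of quantiles")
    lowest = sorted(quantile_values)[:k]
    return sum(lowest) / k


def select_action(quantiles_per_action, k):
    """Pick the action whose k worst quantile values have the highest mean."""
    scores = [worst_case_value(q, k) for q in quantiles_per_action]
    return max(range(len(scores)), key=scores.__getitem__)


# Hypothetical return quantiles for two actions at an occluded intersection.
quantiles = [
    [-10.0, 4.0, 6.0, 8.0],   # action 0: higher mean, but a severe bad outcome
    [0.0, 1.0, 2.0, 3.0],     # action 1: lower mean, no severe outcome
]

risk_averse = select_action(quantiles, k=2)    # scores the two worst quantiles
risk_neutral = select_action(quantiles, k=4)   # k = all quantiles is the plain mean
print(risk_averse, risk_neutral)
```

With all four quantiles the score is the ordinary mean and action 0 wins; with only the two lowest quantiles the rare but costly outcome dominates and action 1 is preferred.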
Abstract: We explore the use of caching both at the network edge and within User Equipment (UE) to alleviate the traffic load of wireless networks. We develop a joint cache placement and delivery policy that maximizes the Quality of Service (QoS) while simultaneously minimizing backhaul load and UE power consumption in the presence of an unknown, time-variant file popularity. Because file requests in a time slot are affected by download success in the previous slot, the caching system becomes a non-stationary Partially Observable Markov Decision Process (POMDP). We solve the problem in a deep reinforcement learning framework based on the Advantageous Actor-Critic (A2C) algorithm, comparing Feed Forward Neural Networks (FFNN) with a Long Short-Term Memory (LSTM) approach specifically designed to exploit the correlation of the file popularity distribution across time slots. Simulation results show that LSTM-based A2C outperforms FFNN-based A2C in terms of sample efficiency and optimality, demonstrating superior performance on the non-stationary POMDP problem. For caching at the UEs, we provide a distributed algorithm that reaches the objectives dictated by the agent controlling the network with minimum energy consumption at the UEs and minimum communication overhead.
Funding: Supported by the National Natural Science Foundation of China (Nos. 61174124, 61233003 and 60935001) and the National High Technology Research and Development Program of China (863 Program) (No. 2011AA01A102).
Abstract: In network service systems, satisfying quality of service (QoS) is one of the main objectives. Admission control and resource allocation strategies can be used to guarantee the QoS requirement. Based on partially observable Markov decision processes (POMDPs), this paper proposes a novel admission control model for video on demand (VOD) service systems with elastic QoS. Elastic QoS is also considered in the resource allocation strategy. Policy gradient algorithms can often find solutions to POMDP problems with a satisfactory convergence rate. Numerical examples show that the proposed admission control strategy performs better than the complete admission control strategy.
Funding: Supported in part by the National Natural Science Foundation of China (Nos. 61771158, 61871147, 61831008, 91638204 and 61525103), the Shenzhen Basic Research Program (Nos. JCYJ20170811154309920, JCYJ20170811160142808, and ZDSYS201707280903305), and the Guangdong Science and Technology Planning Project (No. 2018B030322004).
Abstract: The increasing demand for high data rates and quality of service over hybrid satellite-terrestrial relay networks (HSTRN) has pushed the development of millimeter-wave (mmWave) band high-throughput satellites (HTS) with multibeams. The next generation of mmWave multibeam HTS communication systems (HTSCS) is viewed as the backbone network for enhancing the throughput of the HSTRN. The article first investigates the basic backbone topology architecture of the HTSCS and reviews an M-state Markov channel for Ka/Q/V band mmWave systems. Then, we propose a long-term optimal power allocation scheme over two independent and identical spot beams based on the partially observable Markov decision process (POMDP), which can partly mitigate the negative effects of severe weather conditions. The key conditions for selecting the optimal power allocation action in the multibeam HTSCS are given. Simulation results show that our POMDP-based power allocation scheme can enhance the long-term throughput of the HTSCS.
Funding: Supported by the National Natural Science Foundation of China (61701270) and the Young Doctor Cooperation Foundation of Qilu University of Technology (Shandong Academy of Sciences) (2017BSHZ008).
Abstract: To address sensing and motion uncertainty in motion planning for narrow passage environments, a partition sampling strategy based on the partially observable Markov decision process (POMDP) is proposed. The method incorporates a partition sampling strategy and can improve the success rate of robot motion planning in narrow passages. First, the environment is divided into open and narrow areas using the partition sampling strategy, and an initial trajectory of the robot is generated with fewer sampling points. Second, the method computes a locally optimal solution around the initial nominal trajectory by solving the POMDP problem and iterates toward an overall optimal trajectory of robot motion. The proposed method follows the general POMDP solution framework, in which the belief dynamics are approximated by an extended Kalman filter (EKF) and the value function is represented by an effective quadratic function in the belief space near the nominal trajectory. Value iteration is performed with a belief-space variant of iterative linear quadratic Gaussian (iLQG), which yields a linear control policy over the belief space that is locally optimal around the nominal trajectory. A new nominal trajectory is generated by executing the control policy, and the process is repeated until it converges to a locally optimal solution. Finally, the robot obtains the optimal trajectory for passing safely through a narrow passage. The experimental results show that the proposed method can efficiently improve the performance of motion planning under uncertainty.
Funding: The research is supported by the National Natural Science Foundation of China (Grant Nos. 7180116&72071138 and 72071071) and the Young Talent Support Plan of Hebei Province.
Abstract: Burn-in has been proven effective in identifying and removing defective products before they are delivered to customers. Most existing burn-in models adopt a one-shot scheme, which may not be sufficient for identification. Borrowing the idea of sequential inspections from remaining useful life prediction and accelerated lifetime testing, this study proposes a sequential degradation-based burn-in model with multiple periodic inspections. At each inspection epoch, the posterior probability that a product is a normal one is updated with the inspected degradation level. Based on the degradation level and the updated posterior probability, a product can be disposed of, put into field use, or kept in the test until the next inspection epoch. We cast the problem as a partially observed Markov decision process to minimize the expected total burn-in cost of a product and derive some interesting structures of the optimal policy. Then, algorithms are provided to find the joint optimal inspection period and number of inspections in steps. A numerical study is also provided to illustrate the effectiveness of the proposed model.
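The posterior update at each inspection epoch can be sketched with Bayes' rule. This is an illustration under stated assumptions rather than the study's degradation model: it assumes two product classes (normal and weak) whose observed degradation is Gaussian with known, hypothetical means and standard deviations, and the names `gaussian_pdf` and `update_posterior` are invented for the example.

```python
import math


def gaussian_pdf(x, mean, std):
    """Density of a normal distribution, used here as an assumed likelihood."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def update_posterior(p_normal, observed, normal_params, weak_params):
    """Bayes update of P(product is normal) after one inspected degradation level."""
    like_normal = gaussian_pdf(observed, *normal_params)
    like_weak = gaussian_pdf(observed, *weak_params)
    evidence = p_normal * like_normal + (1.0 - p_normal) * like_weak
    if evidence == 0.0:
        raise ValueError("observation is impossible under both classes")
    return p_normal * like_normal / evidence


# Hypothetical (mean, std) of degradation per inspection for each class.
NORMAL = (1.0, 0.5)
WEAK = (3.0, 0.5)

prior = 0.9
after_low = update_posterior(prior, 1.2, NORMAL, WEAK)    # small degradation
after_high = update_posterior(prior, 2.9, NORMAL, WEAK)   # large degradation
print(round(after_low, 4), round(after_high, 4))
```

A low reading pushes the posterior toward "normal" and a high reading pushes it toward "weak"; a policy like the one described above would then compare the posterior and the degradation level against decision boundaries to dispose of the product, release it, or keep testing.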