32 articles found
A Mean-Field Game for a Forward-Backward Stochastic System With Partial Observation and Common Noise
1
Authors: Pengyan Huang, Guangchen Wang, Shujun Wang, Hua Xiao. IEEE/CAA Journal of Automatica Sinica (SCIE, EI, CSCD), 2024, No. 3, pp. 746-759 (14 pages)
This paper considers a linear-quadratic (LQ) mean-field game governed by a forward-backward stochastic system with partial observation and common noise, where a coupling structure enters state equations, cost functionals and observation equations. Firstly, to reduce the complexity of solving the mean-field game, a limiting control problem is introduced. By virtue of the decomposition approach, an admissible control set is proposed. Applying a filter technique and dimensional-expansion technique, a decentralized control strategy and a consistency condition system are derived, and the related solvability is also addressed. Secondly, we discuss an approximate Nash equilibrium property of the decentralized control strategy. Finally, we work out a financial problem with some numerical simulations.
Keywords: decentralized control strategy; ϵ-Nash equilibrium; forward-backward stochastic system; mean-field game; partial observation
Multi-object tracking based on behaviour and partial observation
2
Authors: 路红, 费树岷, 郑建勇, 张涛. Journal of Southeast University (English Edition) (EI, CAS), 2008, No. 4, pp. 468-472 (5 pages)
To cope with multi-object tracking under real-world complex situations, a new video-based method is proposed. In the detecting step, the moving objects are segmented with the third level DWT (discrete wavelet transform) and background difference. In the tracking step, the Kalman filter and scale parameter are used first to estimate the object position and bounding box. Then, the center-association-based projection ratio and region-association-based occlusion ratio are defined and combined to judge object behaviours. Finally, the tracking scheme and Kalman parameters are adaptively adjusted according to object behaviour. Under occlusion, partial observability is utilized to obtain the object measurements and optimum box dimensions. This method is robust in tracking mobile objects under such situations as occlusion, new appearance and stabilization. Experimental results show that the proposed method is efficient.
Keywords: multi-object tracking; projection ratio; occlusion ratio; behaviour; partial observation; Kalman filter
Data-Driven Modeling of Partially Observed Biological Systems
3
Authors: Wei-Hung Su, Ching-Shan Chou, Dongbin Xiu. Communications on Applied Mathematics and Computation (EI), 2024, No. 1, pp. 739-754 (16 pages)
We present a numerical approach for modeling unknown dynamical systems using partially observed data, with a focus on biological systems with (relatively) complex dynamical behavior. As an extension of the recently developed deep neural network (DNN) learning methods, our approach is particularly suitable for practical situations when (i) measurement data are available for only a subset of the state variables, and (ii) the system parameters cannot be observed or measured at all. We demonstrate that, with a properly designed DNN structure with memory terms, effective DNN models can be learned from such partially observed data containing hidden parameters. The learned DNN model serves as an accurate predictive tool for system analysis. Through a few representative biological problems, we demonstrate that such DNN models can capture qualitative dynamical behavior changes in the system, such as bifurcations, even when the parameters controlling such behavior changes are completely unknown throughout not only the model learning process but also the system prediction process. The learned DNN model effectively creates a "closed" model involving only the observables when such a closed-form model does not exist mathematically.
Keywords: deep neural network (DNN); governing equation discovery; biological system; partial observation
Insider Trading with a Random Deadline under Partial Observations: Maximal Principle Method (Cited by: 1)
4
Authors: Kai XIAO, Yong-hui ZHOU. Acta Mathematicae Applicatae Sinica (SCIE, CSCD), 2022, No. 4, pp. 753-762 (10 pages)
For a revised model of Caldentey and Stacchetti (Econometrica, 2010) of continuous-time insider trading with a random deadline, which allows market makers to observe some information on a risky asset, a closed form of the market equilibrium, consisting of the optimal insider trading intensity and the market liquidity, is obtained by the maximum principle method. It shows that in the equilibrium, (i) as time goes by, the optimal insider trading intensity increases exponentially, even up to infinity, while both the market liquidity and the residual information decrease exponentially, even down to zero; (ii) the more accurate the information observed by market makers, the stronger the optimal insider trading intensity, such that the total expected profit of the insider decreases, even to zero, while both the market liquidity and the residual information decrease; (iii) the longer the mean of the random time, the weaker the optimal insider trading intensity and the larger both the residual information and the expected profit, but there is a threshold of trading time, half of the mean of the random time, such that the market liquidity increases with the mean of the random time if and only if the trading time is after this threshold.
Keywords: continuous-time insider trading; random deadline; partial observations; filtering theory; maximal principle
Effect of observation time on source identification of diffusion in complex networks (Cited by: 1)
5
Authors: Chaoyi Shi, Qi Zhang, Tianguang Chu. Chinese Physics B (SCIE, EI, CAS, CSCD), 2022, No. 7, pp. 97-103 (7 pages)
This paper examines the effect of the observation time on source identification of a discrete-time susceptible-infected-recovered diffusion process in a network with a snapshot of partial nodes. We formulate the source identification problem as a maximum likelihood (ML) estimator and develop a statistical inference method based on Monte Carlo simulation (MCS) to estimate the source location and the initial time of diffusion. Experimental results in synthetic networks and real-world networks demonstrate evident impact of the observation time as well as the fraction of the observers on the concerned problem.
Keywords: complex network; source identification; statistical inference; partial observation
On stochastic optimal control of partially observable nonlinear quasi Hamiltonian systems (Cited by: 10)
6
Authors: 朱位秋, 应祖光. Journal of Zhejiang University Science (EI, CSCD), 2004, No. 11, pp. 1313-1317 (5 pages)
A stochastic optimal control strategy for partially observable nonlinear quasi Hamiltonian systems is proposed. The optimal control forces consist of two parts. The first part is determined by the conditions under which the stochastic optimal control problem of a partially observable nonlinear system is converted into that of a completely observable linear system. The second part is determined by solving the dynamical programming equation derived by applying the stochastic averaging method and stochastic dynamical programming principle to the completely observable linear control system. The response of the optimally controlled quasi Hamiltonian system is predicted by solving the averaged Fokker-Planck-Kolmogorov equation associated with the optimally controlled completely observable linear system and solving the Riccati equation for the estimated error of system states. An example is given to illustrate the procedure and effectiveness of the proposed control strategy.
Keywords: nonlinear system; partially observation; stochastic optimal control; separation principle; stochastic averaging; dynamical programming
Improved design of online fault diagnoser for partially observed Petri nets with generalized mutual exclusion constraints (Cited by: 2)
7
Authors: Jiufu Liu, Wenliang Liu, Jianyong Zhou, Yan Sun, Zhisheng Wang. Journal of Systems Engineering and Electronics (SCIE, EI, CSCD), 2017, No. 5, pp. 971-978 (8 pages)
This paper investigates the fault detection problem for discrete event systems (DESs) which can be modeled by partially observed Petri nets (POPNs). To overcome the problem of low diagnosability in the POPN online fault diagnoser in current use, an improved online fault diagnosis algorithm that integrates generalized mutual exclusion constraints (GMECs) and integer linear programming (ILP) is proposed. Assume that the POPN structure and its initial markings are known, and the faults are modeled as unobservable transitions. First, the event sequence is observed and recorded. GMEC is used for elementary diagnosis of the system behavior, then the ILP problem of POPN is solved for further diagnosis. Finally, an example of a real DES to test the new fault diagnoser is analyzed. The proposed algorithm increases the diagnosability of the DES remarkably, and the effectiveness of the new algorithm integrating GMEC and ILP is verified.
Keywords: fault diagnosis; partially observed Petri nets (POPNs); integer linear programming (ILP); generalized mutual exclusion constraints (GMECs)
STOCHASTIC OPTIMAL VIBRATION CONTROL OF PARTIALLY OBSERVABLE NONLINEAR QUASI HAMILTONIAN SYSTEMS WITH ACTUATOR SATURATION (Cited by: 1)
8
Authors: Ronghua Huan, Lincong Chen, Weiliang Jin, Weiqiu Zhu. Acta Mechanica Solida Sinica (SCIE, EI), 2009, No. 2, pp. 143-151 (9 pages)
An optimal vibration control strategy for partially observable nonlinear quasi Hamiltonian systems with actuator saturation is proposed. First, a controlled partially observable nonlinear system is converted into a completely observable linear control system of finite dimension based on the theorem due to Charalambous and Elliott. Then the partially averaged Itô stochastic differential equations and dynamical programming equation associated with the completely observable linear system are derived by using the stochastic averaging method and stochastic dynamical programming principle, respectively. The optimal control law is obtained from solving the final dynamical programming equation. The results show that the proposed control strategy has high control effectiveness and control efficiency.
Keywords: nonlinear system; random excitations; optimal control; partially observation; actuator saturation
Impulse Control Problem of Partially Observed Diffusion Processes
9
Authors: Baghery-Kabbaj Fouzia, Massa-Turpin Isabelle. Computer Technology and Application, 2011, No. 1, pp. 80-84 (5 pages)
The authors investigate the problem of impulse control of a partially observed diffusion process. The authors study the impulse control of Zakai type equations. The associated value function is characterized as the only viscosity solution of the corresponding quasi-variational inequality. The authors show the optimal cost function for the problem with incomplete information can be approximated by a sequence of value functions of the previous type.
Keywords: impulse control; partially observed diffusion process; nonlinear filtering; diffusion process; Hamilton Jacobi Bellman quasi-variational inequality; viscosity solutions
Analysis of a POMDP Model for an Optimal Maintenance Problem with Multiple Imperfect Repairs
10
Author: Nobuyuki Tamura. American Journal of Operations Research, 2023, No. 6, pp. 133-146 (14 pages)
I consider a system whose deterioration follows a discrete-time and discrete-state Markov chain with an absorbing state. When the system is put into practice, I may select operation (wait), imperfect repair, or replacement at each discrete-time point. The true state of the system is not known when it is operated. Instead, the system is monitored after operation and some incomplete information concerned with the deterioration is obtained for decision making. Since there are multiple imperfect repairs, I can select one option from them when the imperfect repair is preferable to operation and replacement. To express this situation, I propose a POMDP model and theoretically investigate the structure of an optimal maintenance policy minimizing a total expected discounted cost for an unbounded horizon. Then two stochastic orders are used for the analysis of our problem.
Keywords: partially observable Markov decision process; imperfect repair; stochastic order; monotone property; optimal maintenance policy
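As a rough illustration of the setting in the abstract above (a deteriorating system with a hidden state, monitored through incomplete information), the following is a minimal sketch of the standard Bayes belief update used in POMDP models. It is not taken from the paper: the three-state chain, the matrices P and Q, and the function name belief_update are all hypothetical values chosen for illustration.

```python
def belief_update(belief, transition, observation, obs):
    """One Bayes-filter step: predict through the Markov chain, then condition on obs.

    belief[i]         : current probability that the hidden state is i
    transition[i][j]  : probability of moving from state i to state j while operating
    observation[j][o] : probability of seeing signal o when the true state is j
    """
    n = len(belief)
    # Prediction: push the belief through the transition matrix.
    predicted = [sum(belief[i] * transition[i][j] for i in range(n)) for j in range(n)]
    # Correction: weight each state by the likelihood of the observed signal.
    unnormalized = [predicted[j] * observation[j][obs] for j in range(n)]
    total = sum(unnormalized)
    if total == 0.0:
        raise ValueError("observation has zero probability under this belief")
    return [w / total for w in unnormalized]


# Hypothetical deterioration chain: 0 = good, 1 = worn, 2 = failed (absorbing).
P = [[0.8, 0.15, 0.05],
     [0.0, 0.7, 0.3],
     [0.0, 0.0, 1.0]]
# Hypothetical monitoring signal: 0 = "looks fine", 1 = "looks degraded".
Q = [[0.9, 0.1],
     [0.4, 0.6],
     [0.1, 0.9]]

b0 = [1.0, 0.0, 0.0]                  # system known to be new
b1 = belief_update(b0, P, Q, obs=1)   # one period of operation, then a "degraded" signal
```

A maintenance policy in such a model maps the belief vector, rather than the unknown true state, to an action such as operate, repair, or replace.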
Data-driven Inverter-based Volt/VAr Control for Partially Observable Distribution Networks (Cited by: 3)
11
Authors: Tong Xu, Wenchuan Wu, Yiwen Hong, Junjie Yu, Fazhong Zhang. CSEE Journal of Power and Energy Systems (SCIE, EI, CSCD), 2023, No. 2, pp. 548-560 (13 pages)
For active distribution networks (ADNs) integrated with massive inverter-based energy resources, it is impractical to maintain the accurate model and deploy measurements at all nodes due to the large scale of ADNs. Thus, current models of ADNs usually involve significant errors or even unknown occurrences. Moreover, ADNs are usually partially observable since only a few measurements are available at pilot nodes or nodes with significant users. To provide a practical Volt/VAr control (VVC) strategy for such networks, a data-driven VVC method is proposed in this paper. First, the system response policy, approximating the relationship between the control variables and states of monitoring nodes, is estimated by a recursive regression closed-form solution. Then, based on real-time measurements and the newly updated system response policy, a VVC strategy with convergence guarantee is realized. Since the recursive regression solution is embedded in the control stage, a data-driven closed-loop VVC framework is established. The effectiveness of the proposed method is validated in an unbalanced distribution system considering nonlinear loads, where not only the rapid and self-adaptive voltage regulation is realized, but also system-wide optimization is achieved.
Keywords: data-driven; distribution networks; partial observation; Volt/VAr control
Optimization of dynamic sequential test strategy for equipment health management (Cited by: 3)
12
Authors: Shuming Yang, Jing Qiu, Guanjun Liu, Peng Yang. Journal of Systems Engineering and Electronics (SCIE, EI, CSCD), 2012, No. 1, pp. 71-77 (7 pages)
Testing is the premise and foundation of realizing equipment health management (EHM). To address the problem that the static periodic test strategy may cause deficient test or excessive test, a dynamic sequential test strategy (DSTS) for EHM is presented. Considering the situation that equipment health state is not completely observable in reality, a DSTS optimization method based on partially observable semi-Markov decision process (POSMDP) is proposed. Firstly, an equipment health state degradation model is constructed by Markov process, and the control limit maintenance policy is also introduced. Secondly, POSMDP is formulated in great detail. And then, POSMDP is converted to completely observable belief semi-Markov decision process (BSMDP) through belief state. The optimal equation and the corresponding optimal DSTS, which minimize the long-run expected average cost per unit time, are obtained with BSMDP. The results of application in complex equipment show that the proposed DSTS is feasible and effective.
Keywords: equipment health management (EHM); dynamic sequential test strategy (DSTS); partially observable semi-Markov decision process (POSMDP); optimal equation
Distributed cooperative task planning algorithm for multiple satellites in delayed communication environment (Cited by: 2)
13
Authors: Chong Wang, Jinhui Tang, Xiaohang Cheng, Yingchen Liu, Changchun Wang. Journal of Systems Engineering and Electronics (SCIE, EI, CSCD), 2016, No. 3, pp. 619-633 (15 pages)
Multiple earth observing satellites need to communicate with each other to observe plenty of targets on the Earth together. Factors such as external interference result in delays in information interaction between satellites, which makes it impossible to ensure the integrity and timeliness of the information used in satellite decision making, and the optimization of the planning result is affected. Therefore, the effect of communication delay is considered during the multi-satellite coordinating process. For this problem, firstly, a distributed cooperative optimization problem for multiple satellites in the delayed communication environment is formulated. Secondly, based on both the analysis of the temporal sequence of tasks in a single satellite and the dynamically decoupled characteristics of the multi-satellite system, the environment information of multi-satellite distributed cooperative optimization is constructed on the basis of the directed acyclic graph (DAG). Then, both a cooperative optimization decision making framework and a model are built according to the decentralized partial observable Markov decision process (DEC-POMDP). After that, a satellite coordinating strategy aimed at different conditions of communication delay is mainly analyzed, and a unified processing strategy on communication delay is designed. An approximate cooperative optimization algorithm based on simulated annealing is proposed. Finally, the effectiveness and robustness of the method presented in this paper are verified via the simulation.
Keywords: Earth observing satellite (EOS); distributed cooperative task planning; delayed communication; decentralized partial observable Markov decision process (DEC-POMDP); simulated annealing
Evolution Handoff Strategy for Real-Time Video Transmission over Practical Cognitive Radio Networks (Cited by: 1)
14
Authors: LIU Fa, MA Yongkui, ZHAO Honglin, DING Kai. China Communications (SCIE, CSCD), 2015, No. 2, pp. 141-154 (14 pages)
The transmission delay of a real-time video packet mainly depends on the sensing time delay (short-term factor) and the entire frame transmission delay (long-term factor). Therefore, the optimization problem in the spectrum handoff process should be formulated as the combination of microscopic optimization and macroscopic optimization. In this paper, we focus on the issue of combining these two optimization models, and propose a novel Evolution Spectrum Handoff (ESH) strategy to minimize the expected transmission delay of a real-time video packet. In the micro-optimized model, considering the tradeoff between the Primary User's (PU's) allowable collision percentage of each channel and the transmission delay of a video packet, we propose a mixed integer non-linear programming scheme. The scheme is able to achieve the minimum sensing time, which is termed an optimal stopping time. In the macro-optimized model, using the optimal stopping time as the reward function within the partially observable Markov decision process framework, the ESH strategy is designed to search for an optimal target channel set and minimize the expected delay of a packet in long-term real-time video transmission. Meanwhile, the minimum expected transmission delay is obtained under practical cognitive radio network conditions, i.e., secondary user's mobility, PU's random access, imperfect sensing information, etc. Theoretical analysis and simulation results show that the ESH strategy can effectively reduce the transmission delay of a video packet in the spectrum handoff process.
Keywords: practical cognitive radio networks; spectrum handoff process; partially observable Markov decision process; video transmission
FEEDBACK CONTROL OPTIMIZATION FOR SEISMICALLY EXCITED BUILDINGS
15
Authors: Xueping Li, Zuguang Ying. Acta Mechanica Solida Sinica (SCIE, EI), 2007, No. 4, pp. 342-349 (8 pages)
A feedback control optimization method of partially observable linear structures via stationary response is proposed and analyzed with linear building structures equipped with control devices and sensors. First, the partially observable control problem of the structure under horizontal ground acceleration excitation is converted into a completely observable control problem. Then the Itô stochastic differential equations of the system are derived based on the stochastic averaging method for quasi-integrable Hamiltonian systems and the stationary solution to the Fokker-Planck-Kolmogorov (FPK) equation associated with the Itô equations is obtained. The performance index in terms of the mean system energy and mean square control force is established and the optimal control force is obtained by minimizing the performance index. Finally, the numerical results for a three-story building structure model under El Centro, Hachinohe, Northridge and Kobe earthquake excitations are given to illustrate the application and the effectiveness of the proposed method.
Keywords: feedback control optimization; partially observable structure; stochastic averaging method; earthquake response; stationary probability density
A navigation method based on POMDP for smart wheelchair in uncertain environments
16
Authors: 陶永, Wang Tianmiao, Wei Hongxing, Chen Diansheng. High Technology Letters (EI, CAS), 2010, No. 2, pp. 164-170 (7 pages)
A navigation method based on the partially observable Markov decision process (POMDP) for smart wheelchairs in uncertain environments is presented in this paper. The key design factors for the navigation system of a smart wheelchair are discussed. A kinematics model of the smart wheelchair is given, and the model and principle of POMDP are introduced. In order to respond in uncertain local environments, a novel navigation methodology based on POMDP using sensor perception and the user's joystick input is presented. The state space, the action set, the observations and the sensor fusion of the navigation method are given in detail, and the optimal policy of the POMDP model is proposed. Experimental results demonstrate the feasibility of this navigation method. Analysis is also conducted to investigate performance evaluation, advantages of the approach and potential generalization of this paper.
Keywords: service robot; smart wheelchair; navigation method; partially observable Markov decision process (POMDP)
MAC Layer Resource Allocation for Wireless Body Area Networks
17
Authors: Qinghua Shen, Xuemin (Sherman) Shen, Tom H. Luan, Jing Liu. ZTE Communications, 2014, No. 3, pp. 13-21 (9 pages)
Wireless body area networks (WBANs) can provide low-cost, timely healthcare services and are expected to be widely used for e-healthcare in hospitals. In a hospital, space is often limited and multiple WBANs have to coexist in an area and share the same channel in order to provide healthcare services to different patients. This causes severe interference between WBANs that could significantly reduce the network throughput and increase the amount of power consumed by sensors placed on the body. Therefore, an efficient channel-resource allocation scheme in the medium access control (MAC) layer is crucial. In this paper, we develop a centralized MAC layer resource allocation scheme for a WBAN. We focus on mitigating the interference between WBANs and reducing the power consumed by sensors. Channel and buffer state are reported by smartphones deployed in each WBAN, and channel access allocation is performed by a central controller to maximize network throughput. Sensors have strict limitations in terms of energy consumption and computing capability and cannot provide all the necessary information for channel allocation in a timely manner. This deteriorates network performance. We exploit the temporal correlation of the body area channel in order to minimize the number of channel state reports necessary. We view the network design as a partly observable optimization problem and develop a myopic policy, which we then simulate in Matlab.
Keywords: medium access control (MAC); wireless body area networks (WBANs); resource allocation; interference mitigation; partially observable optimization
Soft-HGRNs:soft hierarchical graph recurrent networks for multi-agent partially observable environments
18
Authors: Yixiang REN, Zhenhui YE, Yining CHEN, Xiaohong JIANG, Guanghua SONG. Frontiers of Information Technology & Electronic Engineering (SCIE, EI, CSCD), 2023, No. 1, pp. 117-130 (14 pages)
The recent progress in multi-agent deep reinforcement learning (MADRL) makes it more practical in real-world tasks, but its relatively poor scalability and the partially observable constraint raise more challenges for its performance and deployment. Based on our intuitive observation that human society could be regarded as a large-scale partially observable environment, where everyone has the functions of communicating with neighbors and remembering his/her own experience, we propose a novel network structure called the hierarchical graph recurrent network (HGRN) for multi-agent cooperation under partial observability. Specifically, we construct the multi-agent system as a graph, use a novel graph convolution structure to achieve communication between heterogeneous neighboring agents, and adopt a recurrent unit to enable agents to record historical information. To encourage exploration and improve robustness, we design a maximum-entropy learning method that can learn stochastic policies of a configurable target action entropy. Based on the above technologies, we propose a value-based MADRL algorithm called Soft-HGRN and its actor-critic variant called SAC-HGRN. Experimental results based on three homogeneous tasks and one heterogeneous environment not only show that our approach achieves clear improvements compared with four MADRL baselines, but also demonstrate the interpretability, scalability, and transferability of the proposed model.
Keywords: deep reinforcement learning; graph-based communication; maximum-entropy learning; partial observability; heterogeneous settings
Adaptive cache policy optimization through deep reinforcement learning in dynamic cellular networks
19
Authors: Ashvin Srinivasan, Mohsen Amidzadeh, Junshan Zhang, Olav Tirkkonen. Intelligent and Converged Networks (EI), 2024, No. 2, pp. 81-99 (19 pages)
We explore the use of caching both at the network edge and within User Equipment (UE) to alleviate traffic load of wireless networks. We develop a joint cache placement and delivery policy that maximizes the Quality of Service (QoS) while simultaneously minimizing backhaul load and UE power consumption, in the presence of an unknown time-variant file popularity. With file requests in a time slot being affected by download success in the previous slot, the caching system becomes a non-stationary Partial Observable Markov Decision Process (POMDP). We solve the problem in a deep reinforcement learning framework based on the Advantageous Actor-Critic (A2C) algorithm, comparing Feed Forward Neural Networks (FFNN) with a Long Short-Term Memory (LSTM) approach specifically designed to exploit the correlation of file popularity distribution across time slots. Simulation results show that using LSTM-based A2C outperforms FFNN-based A2C in terms of sample efficiency and optimality, demonstrating superior performance for the non-stationary POMDP problem. For caching at the UEs, we provide a distributed algorithm that reaches the objectives dictated by the agent controlling the network, with minimum energy consumption at the UEs, and minimum communication overhead.
Keywords: wireless caching; deep reinforcement learning; advantageous actor critic; long short term memory; non-stationary Partial Observable Markov Decision Process (POMDP)
THE MAXIMUM PRINCIPLE FOR PARTIALLY OBSERVED OPTIMAL CONTROL OF FORWARD-BACKWARD STOCHASTIC SYSTEMS WITH RANDOM JUMPS (Cited by: 4)
20
Author: Hua XIAO. Journal of Systems Science & Complexity (SCIE, EI, CSCD), 2011, No. 6, pp. 1083-1099 (17 pages)
This paper studies the problem of partially observed optimal control for forward-backward stochastic systems which are driven both by Brownian motions and an independent Poisson random measure. Combining forward-backward stochastic differential equation theory with certain classical convex variational techniques, the necessary maximum principle is proved for the partially observed optimal control, where the control domain is a nonempty convex set. Under certain convexity assumptions, the author also gives the sufficient conditions of an optimal control for the aforementioned optimal control problem. To illustrate the theoretical result, the author also works out an example of partial information linear-quadratic optimal control, and finds an explicit expression of the corresponding optimal control by applying the necessary and sufficient maximum principle.
Keywords: forward-backward stochastic differential equations; maximum principle; partially observed optimal control; random jumps