This paper considers the variance optimization problem of the average reward in a continuous-time Markov decision process (MDP). It is assumed that the state space is countable and the action space is a Borel measurable space. The main purpose of this paper is to find the policy with minimal variance within the class of deterministic stationary policies. Unlike in a traditional Markov decision process, the cost function under the variance criterion is affected by future actions. To address this, we convert the variance minimization problem into a standard MDP by introducing a concept called pseudo-variance. Further, by developing a policy iteration algorithm for the pseudo-variance optimization problem, we derive the optimal policy of the original variance optimization problem and give a sufficient condition for a variance-optimal policy. Finally, an example illustrates the conclusions of this paper.
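The key idea of reducing variance minimization to a standard MDP can be sketched in a simplified setting. The snippet below is a minimal discrete-time, finite-state illustration only (the paper itself treats continuous-time MDPs with countable state spaces and Borel action spaces); the pseudo-variance cost c(x,a) = (r(x,a) − λ)² with a fixed reference mean λ, and all function names, are illustrative assumptions rather than the paper's exact construction.

```python
import numpy as np

def evaluate(P_pi, c_pi):
    """Solve the average-cost Poisson equation g + h = c_pi + P_pi @ h,
    with the normalization h[0] = 0 (assumes a unichain policy)."""
    n = len(c_pi)
    A = np.zeros((n, n))
    A[:, 0] = 1.0                      # coefficient of the gain g
    M = np.eye(n) - P_pi
    A[:, 1:] = M[:, 1:]                # coefficients of h[1], ..., h[n-1]
    sol = np.linalg.solve(A, c_pi)
    g = sol[0]
    h = np.concatenate(([0.0], sol[1:]))
    return g, h

def policy_iteration(P, r, lam, n_iter=100):
    """Average-cost policy iteration with pseudo-variance cost (r - lam)^2.

    P[a, x, y]: transition probability x -> y under action a
    r[a, x]:    one-step reward; lam: fixed reference mean."""
    n_a, n_s, _ = P.shape
    c = (r - lam) ** 2                 # pseudo-variance cost, a standard MDP cost
    pi = np.zeros(n_s, dtype=int)
    for _ in range(n_iter):
        P_pi = P[pi, np.arange(n_s)]   # transition matrix under current policy
        c_pi = c[pi, np.arange(n_s)]
        g, h = evaluate(P_pi, c_pi)    # policy evaluation
        q = c + P @ h                  # q[a, x] = c(x,a) + sum_y P(y|x,a) h(y)
        new_pi = q.argmin(axis=0)      # policy improvement
        if np.array_equal(new_pi, pi):
            break
        pi = new_pi
    return pi, g
```

For instance, with two states and two actions where one action incurs zero pseudo-variance cost everywhere, the iteration converges to that action in a couple of sweeps.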
Funding: This work was partially funded by the Key R&D Programs of Shandong Province, China (Grant Nos. 2018CXGC1411 and 2021CXGC010514).
Abstract: Cuproptosis shows enormous promise for treating lung metastasis. However, glycolysis, Cu⁺ efflux mechanisms, and insufficient drug accumulation in the lung severely restrict cuproptosis efficacy. Herein, an inhalable poly(2-(N-oxide-N,N-diethylamino)ethyl methacrylate) (OPDEA)-coated copper-based metal–organic framework encapsulating pyruvate dehydrogenase kinase 1 siRNA (siPDK), termed OMP, is constructed to mediate cuproptosis and thereby promote immunotherapy of lung metastasis. After inhalation, OMP shows highly efficient lung accumulation and long-term retention, owing to OPDEA-mediated penetration of the pulmonary mucosa. Within tumor cells, OMP is degraded under acidic conditions to release Cu²⁺, which is reduced to toxic Cu⁺ under glutathione (GSH) regulation, inducing cuproptosis. Meanwhile, siPDK released from OMP inhibits intracellular glycolysis and adenosine-5′-triphosphate (ATP) production, blocking the Cu⁺ efflux protein ATP7B and thereby rendering tumor cells more sensitive to OMP-mediated cuproptosis. Moreover, OMP-mediated cuproptosis triggers immunogenic cell death (ICD), promoting dendritic cell (DC) maturation and CD8⁺ T-cell infiltration. Notably, OMP-induced cuproptosis up-regulates membrane-associated programmed cell death-ligand 1 (PD-L1) expression and induces soluble PD-L1 secretion, and thus synergizes with anti-PD-L1 antibodies (aPD-L1) to reprogram the immunosuppressive tumor microenvironment, finally yielding improved immunotherapy efficacy. Overall, OMP may serve as an efficient inhalable nanoplatform and afford preferable efficacy against lung metastasis by inducing cuproptosis and combining with aPD-L1.