Journal Articles
924 articles found
Modeling and Design of Real-Time Pricing Systems Based on Markov Decision Processes (Cited by: 4)
1
Authors: Koichi Kobayashi, Ichiro Maruta, Kazunori Sakurama, Shun-ichi Azuma 《Applied Mathematics》 2014, No. 10, pp. 1485-1495 (11 pages)
A real-time pricing system of electricity is a system that charges different electricity prices for different hours of the day and for different days, and is effective for reducing the peak and flattening the load curve. In this paper, using a Markov decision process (MDP), we propose a modeling method and an optimal control method for real-time pricing systems. First, the outline of real-time pricing systems is explained. Next, a model of a set of customers is derived as a multi-agent MDP. Furthermore, the optimal control problem is formulated, and is reduced to a quadratic programming problem. Finally, a numerical simulation is presented.
Keywords: Markov decision process; optimal control; real-time pricing system
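As a concrete illustration of the MDP machinery behind entries like this one, here is a minimal discounted value-iteration sketch in which two price actions steer a three-level demand state. All transition probabilities and rewards are illustrative assumptions; this is not the paper's multi-agent model or its quadratic-programming reduction.

```python
# Minimal discounted value iteration on a toy pricing MDP (illustrative only).
import numpy as np

gamma = 0.95
# P[a, s, s']: demand-state transitions under price action a (0 = low, 1 = high)
P = np.array([
    [[0.70, 0.20, 0.10], [0.30, 0.50, 0.20], [0.10, 0.30, 0.60]],
    [[0.80, 0.15, 0.05], [0.50, 0.40, 0.10], [0.20, 0.50, 0.30]],
])
# R[s, a]: customer utility minus a peak-load penalty (made-up numbers)
R = np.array([[1.0, 0.6], [0.8, 0.9], [0.2, 0.7]])

V = np.zeros(3)
for _ in range(1000):
    Q = R + gamma * np.einsum("ast,t->sa", P, V)   # Bellman backup
    V_new = Q.max(axis=1)
    if np.abs(V_new - V).max() < 1e-8:
        break
    V = V_new
print(Q.argmax(axis=1), V_new)   # best price action per demand state, values
```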
Robust analysis of discounted Markov decision processes with uncertain transition probabilities (Cited by: 2)
2
Authors: LOU Zhen-kai, HOU Fu-jun, LOU Xu-ming 《Applied Mathematics (A Journal of Chinese Universities)》 SCIE CSCD 2020, No. 4, pp. 417-436 (20 pages)
Optimal policies in Markov decision problems may be quite sensitive with regard to transition probabilities. In practice, some transition probabilities may be uncertain. The goals of the present study are to find the robust range for a certain optimal policy and to obtain value intervals of exact transition probabilities. Our research yields powerful contributions for Markov decision processes (MDPs) with uncertain transition probabilities. We first propose a method for estimating unknown transition probabilities based on maximum likelihood. Since the estimation may be far from accurate, and the highest expected total reward of the MDP may be sensitive to these transition probabilities, we analyze the robustness of an optimal policy and propose an approach for robust analysis. After giving the definition of a robust optimal policy with uncertain transition probabilities represented as sets of numbers, we formulate a model to obtain the optimal policy. Finally, we define the value intervals of the exact transition probabilities and construct models to determine the lower and upper bounds. Numerical examples are given to show the practicability of our methods.
Keywords: Markov decision processes; uncertain transition probabilities; robustness and sensitivity; robust optimal policy; value interval
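For finite chains, the maximum-likelihood step the abstract mentions reduces to count normalization over observed transitions. A minimal sketch follows, with an interval uncertainty set of illustrative width ±0.05 around the estimate; the paper's robust-range and value-interval models are not reproduced here.

```python
# MLE of transition probabilities by count normalization, plus an
# illustrative interval uncertainty set around the estimate.
import numpy as np

def mle_transitions(trajectory, n_states):
    counts = np.zeros((n_states, n_states))
    for s, s_next in zip(trajectory[:-1], trajectory[1:]):
        counts[s, s_next] += 1
    totals = counts.sum(axis=1, keepdims=True)
    # unvisited rows fall back to the uniform distribution
    return np.divide(counts, totals,
                     out=np.full_like(counts, 1.0 / n_states),
                     where=totals > 0)

traj = [0, 1, 1, 2, 0, 2, 2, 1, 0, 0, 1, 2]
P_hat = mle_transitions(traj, 3)
eps = 0.05                                   # assumed per-entry uncertainty
P_lo, P_hi = np.clip(P_hat - eps, 0, 1), np.clip(P_hat + eps, 0, 1)
print(P_hat)
```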
Variance minimization for continuous-time Markov decision processes: two approaches (Cited by: 1)
3
Authors: ZHU Quan-xin 《Applied Mathematics (A Journal of Chinese Universities)》 SCIE CSCD 2010, No. 4, pp. 400-410 (11 pages)
This paper studies the limit average variance criterion for continuous-time Markov decision processes in Polish spaces. Based on two approaches, this paper proves not only the existence of solutions to the variance minimization optimality equation and the existence of a variance minimal policy that is canonical, but also the existence of solutions to the two variance minimization optimality inequalities and the existence of a variance minimal policy which may not be canonical. An example is given to illustrate all of our conditions.
Keywords: continuous-time Markov decision process; Polish space; variance minimization; optimality equation; optimality inequality
Variance Optimization for Continuous-Time Markov Decision Processes
4
Authors: Yaqing Fu 《Open Journal of Statistics》 2019, No. 2, pp. 181-195 (15 pages)
This paper considers the variance optimization problem of average reward in continuous-time Markov decision processes (MDPs). It is assumed that the state space is countable and the action space is a Borel measurable space. The main purpose of this paper is to find the policy with minimal variance in the space of deterministic stationary policies. Unlike in the traditional Markov decision process, the cost function in the variance criterion is affected by future actions. To this end, we convert the variance minimization problem into a standard MDP by introducing a concept called pseudo-variance. Further, by giving a policy iteration algorithm for the pseudo-variance optimization problem, the optimal policy of the original variance optimization problem is derived, and a sufficient condition for the variance-optimal policy is given. Finally, we use an example to illustrate the conclusions of this paper.
Keywords: continuous-time Markov decision process; variance optimality of average reward; variance-optimal policy; policy iteration
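Since the paper's construction ultimately hands the pseudo-variance problem to policy iteration, a plain discrete-time, discounted policy-iteration loop is sketched below for orientation. It shows only the evaluate-improve cycle; the continuous-time pseudo-variance criterion itself is not reproduced.

```python
# Plain discounted policy iteration: exact evaluation, then greedy improvement.
import numpy as np

def policy_iteration(P, R, gamma=0.9):
    """P: (A, S, S) transition tensor, R: (S, A) rewards."""
    n_actions, n_states, _ = P.shape
    policy = np.zeros(n_states, dtype=int)
    while True:
        # evaluation: solve (I - gamma * P_pi) V = R_pi exactly
        P_pi = P[policy, np.arange(n_states)]
        R_pi = R[np.arange(n_states), policy]
        V = np.linalg.solve(np.eye(n_states) - gamma * P_pi, R_pi)
        # improvement: act greedily with respect to V
        Q = R + gamma * np.einsum("ast,t->sa", P, V)
        new_policy = Q.argmax(axis=1)
        if np.array_equal(new_policy, policy):
            return policy, V
        policy = new_policy

rng = np.random.default_rng(0)
P = rng.random((2, 4, 4)); P /= P.sum(axis=2, keepdims=True)
R = rng.random((4, 2))
print(policy_iteration(P, R))
```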
Adaptive Strategies for Accelerating the Convergence of Average Cost Markov Decision Processes Using a Moving Average Digital Filter
5
Authors: Edilson F. Arruda, Fabrício Ourique 《American Journal of Operations Research》 2013, No. 6, pp. 514-520 (7 pages)
This paper proposes a technique to accelerate the convergence of the value iteration algorithm applied to discrete average cost Markov decision processes. An adaptive partial information value iteration algorithm is proposed that updates an increasingly accurate approximate version of the original problem with a view to saving computations at the early iterations, when one is typically far from the optimal solution. The proposed algorithm is compared to classical value iteration for a broad set of adaptive parameters and the results suggest that significant computational savings can be obtained, while also ensuring a robust performance with respect to the parameters.
Keywords: average cost Markov decision processes; value iteration; computational effort; gradient
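For reference, here is the classical baseline the paper accelerates: relative value iteration for an average-cost MDP. This sketch assumes a unichain model and does not include the paper's adaptive moving-average filter.

```python
# Relative value iteration for an average-cost MDP (unichain assumption).
import numpy as np

def relative_value_iteration(P, C, tol=1e-8, max_iter=10_000):
    """P: (A, S, S) transitions, C: (S, A) one-step costs."""
    n_states = P.shape[1]
    h, g = np.zeros(n_states), 0.0
    for _ in range(max_iter):
        Q = C + np.einsum("ast,t->sa", P, h)
        w = Q.min(axis=1)
        g = w[0]                 # gain estimate at reference state 0
        w = w - g                # keep the bias vector bounded
        if np.abs(w - h).max() < tol:
            h = w
            break
        h = w
    return g, h, Q.argmin(axis=1)
```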
Conditional Value-at-Risk for Random Immediate Reward Variables in Markov Decision Processes
6
Authors: Masayuki Kageyama, Takayuki Fujii, Koji Kanefuji, Hiroe Tsubaki 《American Journal of Computational Mathematics》 2011, No. 3, pp. 183-188 (6 pages)
We consider risk minimization problems for Markov decision processes. With the aim of making the risk of the random reward variable at each time as small as possible, a risk measure is introduced using conditional value-at-risk for random immediate reward variables in Markov decision processes. Under this risk measure criterion, the risk-optimal policies are characterized by the optimality equations for the discounted and average cases. As an application, inventory models are considered.
Keywords: Markov decision processes; conditional value-at-risk; risk-optimal policy; inventory model
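As background to the risk measure used here, a standard empirical estimator of conditional value-at-risk for a reward sample is sketched below: the mean of the worst α-fraction of outcomes (lower tail, since these are rewards). The paper's per-step CVaR optimality equations are not reproduced.

```python
# Empirical CVaR of a reward sample at level alpha.
import numpy as np

def cvar_lower_tail(rewards, alpha=0.05):
    r = np.sort(np.asarray(rewards, dtype=float))
    k = max(1, int(np.ceil(alpha * len(r))))
    return r[:k].mean()      # mean of the worst alpha-fraction

rng = np.random.default_rng(0)
sample = rng.normal(loc=1.0, scale=0.5, size=10_000)
print(cvar_lower_tail(sample, alpha=0.05))
```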
Seeking for Passenger under Dynamic Prices: A Markov Decision Process Approach
7
Authors: Qianrong Shen 《Journal of Computer and Communications》 2021, No. 12, pp. 80-97 (18 pages)
In recent years, ride-on-demand (RoD) services such as Uber and Didi have become increasingly popular. Different from traditional taxi services, RoD services adopt dynamic pricing mechanisms to manage the supply and demand on the road, and such mechanisms improve service capacity and quality. Seeking-route recommendation has been widely studied for taxi services. In RoD services, the dynamic price is a new and accurate indicator of the supply and demand condition, but it has rarely been studied as a clue for drivers seeking passengers. In this paper, we propose to incorporate the impact of dynamic prices as a key factor in recommending seeking routes to drivers. We first show the importance and need to do so by analyzing real service data. We then design a Markov Decision Process (MDP) model based on passenger order and car GPS trajectory datasets, and take dynamic prices into account in designing rewards. Results show that our model not only guides drivers to locations with higher prices, but also significantly improves driver revenue: compared with driver behavior before using the model, the maximum revenue after using it increases by up to 28%.
Keywords: ride-on-demand service; Markov decision process; dynamic pricing; taxi services; route recommendation
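A toy version of the price-aware reward idea: the expected net revenue of cruising one segment combines pickup probability, a dynamic-price multiplier, and cruising cost. All names and numbers below are hypothetical, not the paper's reward built from order and GPS data.

```python
# Hypothetical price-aware reward for one cruising segment: expected fare
# under the current surge multiplier minus the cost of driving while vacant.
def seeking_reward(p_pickup, price_multiplier, base_fare, cruise_cost):
    return p_pickup * price_multiplier * base_fare - cruise_cost

print(seeking_reward(p_pickup=0.3, price_multiplier=1.8,
                     base_fare=20.0, cruise_cost=2.5))
```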
Heterogeneous Network Selection Optimization Algorithm Based on a Markov Decision Model (Cited by: 8)
8
Authors: Jianli Xie, Wenjuan Gao, Cuiran Li 《China Communications》 SCIE CSCD 2020, No. 2, pp. 40-53 (14 pages)
A network selection optimization algorithm based on the Markov decision process (MDP) is proposed so that mobile terminals can always connect to the best wireless network in a heterogeneous network environment. Considering the different types of service requirements, the MDP model and its reward function are constructed based on the quality of service (QoS) attribute parameters of the mobile users, and the network attribute weights are calculated by using the analytic hierarchy process (AHP). The network handoff decision condition is designed according to the different types of user services and the time-varying characteristics of the network, and the MDP model is solved by using the genetic algorithm and simulated annealing (GA-SA); thus, users can seamlessly switch to the network with the best long-term expected reward value. Simulation results show that the proposed algorithm has good convergence performance, and can guarantee that users with different service types will obtain satisfactory expected total reward values and experience low numbers of network handoffs.
Keywords: heterogeneous wireless networks; Markov decision process; reward function; genetic algorithm; simulated annealing
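The AHP weighting step mentioned in the abstract can be sketched as extracting the principal eigenvector of a pairwise comparison matrix; the judgment values below are illustrative.

```python
# AHP weighting: principal eigenvector of a pairwise comparison matrix.
import numpy as np

A = np.array([[1.0, 3.0, 5.0],     # illustrative pairwise judgments
              [1/3, 1.0, 2.0],     # over three QoS attributes
              [1/5, 1/2, 1.0]])
eigvals, eigvecs = np.linalg.eig(A)
w = np.abs(eigvecs[:, np.argmax(eigvals.real)].real)  # Perron vector
w /= w.sum()
print(w)                           # normalized attribute weights
```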
Discounted vs. Undiscounted MDPs: A Case Study Based on the SARSA(λ) Algorithm
9
Authors: Chen Huanwen, Xie Lijuan 《计算机工程与应用》 CSCD 北大核心 2002, No. 9, pp. 86-88 (3 pages)
This paper analyzes problems with discounted reinforcement learning, presents comparative experiments on discounting with the SARSA(λ) algorithm for MDPs, and discusses the influence of the average-reward constant on the undiscounted SARSA(λ) algorithm.
Keywords: machine learning; reinforcement learning; SARSA(λ) algorithm; case study; MDPs
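For readers unfamiliar with the algorithm under study, a standard tabular SARSA(λ) loop with accumulating eligibility traces is sketched below. The `env` object with `reset()` and `step()` is an assumed interface, not from the paper; a discount factor is included, and dropping it while subtracting an average-reward constant from each reward gives the undiscounted variant the paper discusses.

```python
# Tabular SARSA(lambda) with accumulating eligibility traces.
import numpy as np

def sarsa_lambda(env, n_states, n_actions, episodes=500,
                 alpha=0.1, gamma=0.95, lam=0.9, eps=0.1, seed=0):
    rng = np.random.default_rng(seed)
    Q = np.zeros((n_states, n_actions))

    def act(s):  # epsilon-greedy action selection
        if rng.random() < eps:
            return int(rng.integers(n_actions))
        return int(Q[s].argmax())

    for _ in range(episodes):
        E = np.zeros_like(Q)                 # eligibility traces
        s, a, done = env.reset(), None, False
        a = act(s)
        while not done:
            s2, r, done = env.step(a)
            a2 = act(s2)
            delta = r + gamma * Q[s2, a2] * (not done) - Q[s, a]
            E[s, a] += 1.0                   # accumulating trace
            Q += alpha * delta * E
            E *= gamma * lam                 # decay all traces
            s, a = s2, a2
    return Q
```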
An Optimized Vertical Handoff Algorithm Based on Markov Process in Vehicle Heterogeneous Network (Cited by: 4)
10
Authors: MA Bin, DENG Hong, XIE Xianzhong, LIAO Xiaofeng 《China Communications》 SCIE CSCD 2015, No. 4, pp. 106-116 (11 pages)
To address the fact that existing vertical handoff algorithms for vehicular heterogeneous wireless networks do not consider the changing status of networks, an optimized vertical handoff algorithm based on the Markov process is proposed and discussed in this paper. This algorithm takes into account that the status transitions of available networks affect the quality of service (QoS) of the vehicle terminal's communication service. Firstly, a Markov process is used to predict the transformation of each wireless network's status after the decision via transition probabilities. Then the weights of the evaluation parameters are determined by a fuzzy logic method. Finally, by comparing the total incomes of each wireless network, including handoff decision incomes, handoff execution incomes, and communication service incomes after handoff, the optimal network to hand off to is selected. Simulation results show that the proposed algorithm, compared to existing algorithms, achieves a higher level of load balancing and effectively improves the average blocking rate, packet loss rate, and ping-pong effect.
Keywords: vehicular heterogeneous network; vertical handoff; Markov process; fuzzy logic; multi-attribute decision
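The prediction step described in the abstract amounts to propagating the current network-status distribution through a transition matrix; a one-step sketch with illustrative numbers follows.

```python
# One-step status prediction: current distribution times transition matrix.
import numpy as np

P_status = np.array([[0.8, 0.15, 0.05],   # good -> {good, fair, poor}
                     [0.2, 0.60, 0.20],   # fair -> ...
                     [0.1, 0.30, 0.60]])  # poor -> ...
pi_now = np.array([0.0, 1.0, 0.0])        # network currently "fair"
print(pi_now @ P_status)                  # predicted status distribution
```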
Performance Potential-based Neuro-dynamic Programming for SMDPs (Cited by: 10)
11
Authors: TANG Hao, YUAN Ji-Bin, LU Yang, CHENG Wen-Juan 《自动化学报》 EI CSCD 北大核心 2005, No. 4, pp. 642-645 (4 pages)
An alpha-uniformized Markov chain is defined by the concept of an equivalent infinitesimal generator for a semi-Markov decision process (SMDP) with both average and discounted criteria. According to the relations of their performance measures and performance potentials, the optimization of an SMDP can be realized by simulating the chain. For the critic model of neuro-dynamic programming (NDP), a neuro-policy iteration (NPI) algorithm is presented, and the performance error bound is derived, since there are an approximation error and an improvement error in each iteration step. The obtained results may be extended to Markov systems and have wide applicability. Finally, a numerical example is provided.
Keywords: Markov decision process; SMDP; performance potential; neuro-dynamic programming; optimization
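The alpha-uniformization the abstract refers to is the standard construction P = I + Q/α with α ≥ max_i |q_ii|, which embeds a continuous-time generator into a discrete-time chain; a small numeric sketch with an illustrative generator follows.

```python
# Alpha-uniformization of a continuous-time generator Q.
import numpy as np

Q = np.array([[-2.0, 1.5, 0.5],
              [1.0, -3.0, 2.0],
              [0.5, 0.5, -1.0]])        # rows sum to 0
alpha = np.max(np.abs(np.diag(Q)))      # uniformization rate
P = np.eye(3) + Q / alpha
print(P, P.sum(axis=1))                 # a proper stochastic matrix
```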
Probabilistic Analysis and Multicriteria Decision for Machine Assignment Problem with General Service Times
12
Authors: Wang Jing 《Journal of Systems Engineering and Electronics》 SCIE EI CSCD 1994, No. 1, pp. 53-61 (9 pages)
In this paper we carry out a probabilistic analysis of a machine repair system with a general service-time distribution by means of generalized Markov renewal processes. Some formulas for the steady-state performance measures, such as the distribution of queue sizes, average queue length, and degree of repairman utilization, are then derived. Finally, the machine repair model and a multiple-criteria decision-making method are applied to study the machine assignment problem with a general service-time distribution, in order to determine the optimum number of machines being serviced by one repairman.
Keywords: machine assignment problem; queueing model; multicriteria decision; Markov processes
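A simulation counterpart to the closed-form analysis: the sketch below estimates two of the steady-state measures named in the abstract (mean number of failed machines and repairman utilization) for a machine-repair system with a general, here lognormal, repair time. All parameters are illustrative; exponential machine lifetimes make resampling the next failure time valid.

```python
# Event-driven simulation of a single-repairman machine-repair system.
import numpy as np

rng = np.random.default_rng(3)
M, lam, T = 5, 0.1, 200_000.0            # machines, failure rate, horizon

def draw_repair():                        # general (lognormal) service time
    return rng.lognormal(mean=0.5, sigma=0.6)

t, n = 0.0, 0                            # clock, number of failed machines
repair_done = np.inf                     # completion time of current repair
area_n = busy = 0.0
while t < T:
    # memoryless lifetimes: resample time to next failure after each event
    t_fail = t + rng.exponential(1.0 / ((M - n) * lam)) if n < M else np.inf
    t_next = min(t_fail, repair_done)
    area_n += n * (t_next - t)
    busy += (t_next - t) if repair_done < np.inf else 0.0
    t = t_next
    if t == repair_done:                 # a repair finishes
        n -= 1
        repair_done = t + draw_repair() if n > 0 else np.inf
    else:                                # a machine fails
        n += 1
        if repair_done == np.inf:        # repairman idle: start service
            repair_done = t + draw_repair()

print("mean failed machines:", area_n / t, "utilization:", busy / t)
```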
A dynamical neural network approach for distributionally robust chance-constrained Markov decision process (Cited by: 1)
13
Authors: Tian Xia, Jia Liu, Zhiping Chen 《Science China Mathematics》 SCIE CSCD 2024, No. 6, pp. 1395-1418 (24 pages)
In this paper, we study the distributionally robust joint chance-constrained Markov decision process. Utilizing the logarithmic transformation technique, we derive its deterministic reformulation with bi-convex terms under the moment-based uncertainty set. To cope with the non-convexity and improve the robustness of the solution, we propose a dynamical neural network approach to solve the reformulated optimization problem. Numerical results on a machine replacement problem demonstrate the efficiency of the proposed dynamical neural network approach when compared with the sequential convex approximation approach.
Keywords: Markov decision process; chance constraints; distributionally robust optimization; moment-based ambiguity set; dynamical neural network
A Novel Dynamic Decision Model in 2-player Symmetric Repeated Games
14
Authors: Liu Weibing, Wang Xianjia, Wang Guangmin 《Engineering Sciences》 EI 2008, No. 1, pp. 43-46 (4 pages)
Considering the dynamic character of repeated games and the Markov process, this paper presents a novel dynamic decision model for symmetric repeated games. In this model, players' actions are mapped to a Markov decision process with payoffs, and the Boltzmann distribution is introduced. Our dynamic model differs from previous ones; we use it to study the iterated prisoner's dilemma, and the results show that this decision model can successfully be used in symmetric repeated games and has adaptive learning ability.
Keywords: game theory; evolutionary game; repeated game; Markov process; decision model
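The Boltzmann distribution mentioned in the abstract is the softmax rule for choosing actions with probability proportional to exp(payoff/temperature); a minimal sketch with illustrative payoffs follows. Low temperature approaches greedy play, high temperature approaches uniform exploration.

```python
# Softmax (Boltzmann) action choice over a payoff vector.
import numpy as np

def boltzmann_action(payoffs, temperature=1.0, seed=None):
    rng = np.random.default_rng(seed)
    z = np.exp((payoffs - np.max(payoffs)) / temperature)  # stabilized softmax
    return rng.choice(len(payoffs), p=z / z.sum())

# e.g. two actions (cooperate, defect) with illustrative payoffs
print(boltzmann_action(np.array([3.0, 1.0]), temperature=0.5))
```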
Research on UAV Collision-Avoidance Path Planning Based on MDP
15
Authors: Kan Huang, Xin Changfan, Tan Zheqing, Gao Xin, Shi Mingshan, Zhang Qian 《计算机测量与控制》 2024, No. 6, pp. 292-298 (7 pages)
Target-search path planning for an unmanned aerial vehicle (UAV) under collision-avoidance constraints refers to finding the target faster and more efficiently by planning the flight path rationally among numerous and complex environmental obstacle constraints. This paper studies the law of finite-position Markov movement in an obstacle-free environment and builds the corresponding Markov movement distribution model. Drawing on recent research on path planning for search systems and combining it with Markov decision process (MDP) theory, a negative-reward mechanism is introduced to iterate the Q-Learning policy algorithm. By analogy with a "risk well" visualization, the negative-reward effect of obstacle threat regions on the UAV is presented intuitively, and a target-search path planning model for a single UAV in a complex obstacle-constrained environment is constructed. Simulation experiments show that the algorithm is feasible and provide a useful reference for the design of path-planning algorithms.
Keywords: UAV; path planning; collision avoidance; static target search; Markov decision process (MDP); risk well
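In the spirit of this entry, here is a toy Q-learning loop on a grid where cells near an obstacle carry negative rewards shaped like a "risk well". Grid size, reward values, and obstacle placement are all illustrative assumptions, not the paper's model.

```python
# Toy Q-learning on a grid with a negative "risk well" around an obstacle.
import numpy as np

N, goal, obstacle = 8, (7, 7), (4, 4)
moves = [(-1, 0), (1, 0), (0, -1), (0, 1)]

def reward(cell):
    if cell == goal:
        return 100.0
    d = abs(cell[0] - obstacle[0]) + abs(cell[1] - obstacle[1])
    if d == 0:
        return -50.0                          # collision
    return -5.0 / d if d <= 2 else -1.0       # risk well, then step cost

Q = np.zeros((N, N, 4))
rng = np.random.default_rng(1)
for _ in range(2000):
    s = (0, 0)
    for _ in range(200):
        a = int(rng.integers(4)) if rng.random() < 0.1 else int(Q[s].argmax())
        nxt = (min(max(s[0] + moves[a][0], 0), N - 1),
               min(max(s[1] + moves[a][1], 0), N - 1))
        Q[s][a] += 0.2 * (reward(nxt) + 0.95 * Q[nxt].max() - Q[s][a])
        s = nxt
        if s == goal:
            break
```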
Covert Engagement Strategy for Combat Aircraft Based on the MDP Framework (Cited by: 11)
16
Authors: Xu An, Yu Lei, Kou Yingxin, Xu Baowei, Li Zhanwu 《系统工程与电子技术》 EI CSCD 北大核心 2011, No. 5, pp. 1063-1068 (6 pages)
The covert engagement decision problem for air-combat aircraft is studied based on approximate dynamic programming (ADP). Based on the tactical employment principles of combat aircraft, advantage regions and exposure regions during covert engagement are proposed. A reinforcement learning method for the covert engagement strategy is constructed based on the Markov decision process (MDP). The discontinuous immediate reward function is corrected with a situation score function, and ADP-based policy learning and policy extraction methods are given. Simulations are carried out for the opponent's different maneuver strategies with and without information-source support. The results show that applying the ADP method to learning the covert engagement strategy is feasible, and fairly effective engagement strategies can be obtained under different situations.
Keywords: covert engagement; Markov decision process; approximate dynamic programming; air combat decision-making; approximate value function
Three-Dimensional UAV Path Planning Based on HMDP (Cited by: 8)
17
Authors: Hong Ye, Fang Jiancheng 《北京航空航天大学学报》 EI CAS CSCD 北大核心 2009, No. 1, pp. 100-103 (4 pages)
Path planning is an important guarantee for the autonomous flight of a UAV (Unmanned Aerial Vehicle). A global path planning model based on MDP (Markov Decision Processes) is first established, which treats UAV path planning as the problem of seeking an optimal policy given an environment model and a reward-penalty scheme. To overcome the large time and space cost of the algorithm and the frequent heading changes of the UAV, an HMDP (Hierarchical Markov Decision Processes) model based on a state-clustering method is proposed and extended to three-dimensional planning. Simulation experiments show that this simple planning model can effectively solve the UAV's three-dimensional global path planning problem and lays a foundation for local planning during actual flight.
Keywords: unmanned aerial vehicle (UAV); path planning; Markov decision process (MDP); hierarchical Markov decision process (HMDP); simulation
Optimization of Integrated Traditional Chinese and Western Medicine Treatment for Unstable Angina Based on POMDP (Cited by: 14)
18
Authors: Feng Yan, Xu Hao, Liu Kai, Zhou Xuezhong, Chen Keji 《中国中西医结合杂志》 CAS CSCD 北大核心 2013, No. 7, pp. 878-882 (5 pages)
Objective: To preliminarily optimize the comprehensive treatment plan of integrated traditional Chinese and Western medicine for preventing and treating unstable angina (UA). Methods: Based on the Partially Observable Markov Decision Process (POMDP) model, three main syndrome elements (qi deficiency, blood stasis, and phlegm turbidity) were selected, and in-depth data mining and analysis were performed on the diagnosis and treatment records of hospitalized UA patients to objectively evaluate the efficacy of integrated treatment. Results: The recommended treatment plans for UA patients with qi deficiency, blood stasis, and phlegm turbidity syndromes were, respectively: nitrates + statins + clopidogrel + angiotensin II receptor blocker + heparins + Huangqi + Dangshen + Fuling + Baizhu (ADR = 0.85077869); nitrates + aspirin + clopidogrel + statins + heparins + Danggui + Honghua + Taoren + Chishao (ADR = 0.70773000); and nitrates + aspirin + statins + angiotensin-converting enzyme inhibitor + Gualou + Xiebai + Banxia + Chenpi (ADR = 0.72509600). Conclusion: This study optimized UA treatment plans based on POMDP, which can serve as a reference for further standardizing integrated traditional Chinese and Western medicine treatment of UA.
Keywords: partially observable Markov decision process; unstable angina; treatment plan optimization
Spatiotemporal Distribution Forecasting of Electric Vehicle Charging Load Based on MDP Stochastic Path Simulation (Cited by: 56)
19
Authors: Zhang Qian, Wang Zhong, Tan Weiyu, Liu Huazhen, Li Chen 《电力系统自动化》 EI CSCD 北大核心 2018, No. 20, pp. 59-66 (8 pages)
To address the stochastic spatiotemporal transfer of electric vehicles, a forecasting method for the spatiotemporal distribution of urban electric vehicle charging load is proposed based on Markov decision process stochastic path simulation, taking real-time traffic and temperature into account. First, electric vehicles are classified according to the charging mode and travel characteristics of each vehicle type. Second, a spatiotemporal transfer model for each class is built with the Monte Carlo method, and travel paths are simulated dynamically and stochastically in real time using Markov decision theory. Temperature and traffic energy-consumption models are established from measured electric vehicle data to compute real-time energy consumption per unit distance. Finally, taking a typical urban area as an example, the regional charging load is calculated under different temperatures and traffic conditions. Simulation results show that nodes with heavy fast-charging load exhibit large charging fluctuations, and that rising ambient temperature or worsening traffic congestion prolongs the duration of charging-load peaks.
Keywords: electric vehicle; spatiotemporal distribution; Markov decision process; stochastic path simulation; charging load
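A minimal Monte Carlo sketch of the kind of temperature-dependent energy model the abstract describes: per-trip charging demand is sampled from trip lengths times a consumption rate that grows away from a mild ambient temperature. The coefficients are illustrative assumptions, not the paper's fitted model.

```python
# Monte Carlo sketch: per-trip charging demand under a hot day.
import numpy as np

rng = np.random.default_rng(2)

def kwh_per_km(temp_c):
    # extra HVAC load away from a mild 20 C, illustrative coefficients
    return 0.15 + 0.002 * abs(temp_c - 20.0)

trips_km = rng.lognormal(mean=2.5, sigma=0.5, size=10_000)  # trip lengths
demand_kwh = trips_km * kwh_per_km(temp_c=35.0)
print(demand_kwh.mean(), np.percentile(demand_kwh, 95))
```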
A Channel Sensing and Access Algorithm Based on POMDP (Cited by: 2)
20
Authors: Guo Wenhui, Wang Yalin, Han Yingge 《计算机工程与应用》 CSCD 2014, No. 5, pp. 203-207 (5 pages)
In cognitive radio, to maximize the throughput of secondary users while keeping interference to primary users below a predefined level, a channel sensing and access algorithm based on POMDP is proposed. The secondary user divides the primary user's channel into equal time slots. At the beginning of each slot, the secondary user chooses the optimal strategy among three options: spectrum sensing, accessing the channel at higher power, or accessing the channel at lower power. The secondary user's choice process is modeled as a POMDP problem and solved with corresponding optimal policies. Computer simulation results verify the effectiveness of the algorithm.
Keywords: cognitive radio; spectrum sensing; throughput; semi-Markov chain; partially observable Markov decision process (POMDP)
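The core POMDP computation behind sense-then-access schemes like this one is the Bayesian belief update over the hidden channel state; a two-state (idle/busy) sketch with illustrative transition and observation parameters follows.

```python
# Bayesian belief update over a hidden two-state channel (idle/busy).
import numpy as np

P = np.array([[0.9, 0.1],     # idle -> {idle, busy}
              [0.3, 0.7]])    # busy -> {idle, busy}
O = np.array([[0.95, 0.05],   # P(obs | state = idle)
              [0.10, 0.90]])  # P(obs | state = busy)

def belief_update(b, obs):
    b_pred = b @ P                    # predict one slot ahead
    b_post = b_pred * O[:, obs]       # condition on the sensing result
    return b_post / b_post.sum()

print(belief_update(np.array([0.5, 0.5]), obs=0))  # after sensing "idle"
```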