Journal Articles
64 articles found.
1. Robust analysis of discounted Markov decision processes with uncertain transition probabilities (Cited by: 1)
Authors: LOU Zhen-kai, HOU Fu-jun, LOU Xu-ming. Applied Mathematics (A Journal of Chinese Universities), SCIE CSCD, 2020, Issue 4, pp. 417-436.
Optimal policies in Markov decision problems may be quite sensitive with regard to transition probabilities. In practice, some transition probabilities may be uncertain. The goals of the present study are to find the robust range for a certain optimal policy and to obtain value intervals of exact transition probabilities. Our research yields powerful contributions for Markov decision processes (MDPs) with uncertain transition probabilities. We first propose a method for estimating unknown transition probabilities based on maximum likelihood. Since the estimation may be far from accurate, and the highest expected total reward of the MDP may be sensitive to these transition probabilities, we analyze the robustness of an optimal policy and propose an approach for robust analysis. After giving the definition of a robust optimal policy with uncertain transition probabilities represented as sets of numbers, we formulate a model to obtain the optimal policy. Finally, we define the value intervals of the exact transition probabilities and construct models to determine the lower and upper bounds. Numerical examples are given to show the practicability of our methods.
Keywords: Markov decision processes; uncertain transition probabilities; robustness and sensitivity; robust optimal policy; value interval
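The first step described in the abstract, maximum-likelihood estimation of unknown transition probabilities, amounts to counting observed transitions and normalizing. A minimal sketch for illustration only; the function name and data layout are assumptions, not taken from the paper:

```python
# Hypothetical helper (not from the paper): maximum-likelihood estimation of
# MDP transition probabilities from observed (state, action, next_state) samples.
from collections import defaultdict

def estimate_transition_probabilities(transitions):
    """Estimate P(s' | s, a) by relative frequency, i.e. the ML estimate."""
    counts = defaultdict(lambda: defaultdict(int))
    for s, a, s_next in transitions:
        counts[(s, a)][s_next] += 1
    probs = {}
    for (s, a), next_counts in counts.items():
        total = sum(next_counts.values())
        probs[(s, a)] = {s_next: n / total for s_next, n in next_counts.items()}
    return probs

# Example: three observed transitions under action "a0" from state "s0".
samples = [("s0", "a0", "s1"), ("s0", "a0", "s1"), ("s0", "a0", "s2")]
print(estimate_transition_probabilities(samples))  # approx {('s0', 'a0'): {'s1': 0.67, 's2': 0.33}}
```

The paper's robust analysis then treats such estimates as uncertain, i.e. as sets or intervals around these point values.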
2. Seeking for Passenger under Dynamic Prices: A Markov Decision Process Approach
Author: Qianrong Shen. Journal of Computer and Communications, 2021, Issue 12, pp. 80-97.
In recent years, ride-on-demand (RoD) services such as Uber and Didi have become increasingly popular. Different from traditional taxi services, RoD services adopt dynamic pricing mechanisms to manage the supply and demand on the road, and such mechanisms improve service capacity and quality. Seeking-route recommendation has been widely studied for taxi services. In RoD services, the dynamic price is a new and accurate indicator of the supply and demand condition, but it has rarely been studied as a source of clues for drivers seeking passengers. In this paper, we propose to incorporate the impact of dynamic prices as a key factor in recommending seeking routes to drivers. We first show the importance and need to do so by analyzing real service data. We then design a Markov Decision Process (MDP) model based on passenger order and car GPS trajectory datasets, and take dynamic prices into account in designing rewards. Results show that our model not only guides drivers to locations with higher prices, but also significantly improves driver revenue. Compared with drivers' earnings before using the model, the maximum yield after using it can be increased by up to 28%.
Keywords: Ride-on-Demand Service; Markov decision process; Dynamic Pricing; Taxi Services; Route Recommendation
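A toy illustration of the abstract's central idea of folding the dynamic price into the MDP reward when scoring candidate seeking locations; the reward form, parameter names, and numbers below are assumptions for illustration, not the paper's model:

```python
# Hypothetical reward shaping: weight the expected fare by the local dynamic-price
# multiplier and subtract the cruising cost of reaching the candidate cell.
def cell_reward(pickup_prob, expected_fare, price_multiplier, cruise_cost):
    """Expected net reward of driving to a grid cell and seeking a passenger there."""
    return pickup_prob * expected_fare * price_multiplier - cruise_cost

# A cell under surge pricing (1.5x) beats the same cell at the base price (1.0x).
print(cell_reward(0.4, 20.0, 1.5, 3.0))  # 9.0
print(cell_reward(0.4, 20.0, 1.0, 3.0))  # 5.0
```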
3. Rationale for Decision-Making Processes in Enhancement of Community Participation for Sustainable Mangrove Management in Lamu, Kenya
Authors: Jamila Ahmed, Bessy Kathambi, Robert Kibugi. Open Journal of Ecology, 2023, Issue 6, pp. 409-421.
Decision-making is the process of choosing between two or more options so as to take the most appropriate and successful course of action toward sustainable mangrove management. However, the distinctiveness of mangroves as an ecosystem, and thus the attendant socio-economic and governance ramifications, makes decision-making in this setting relatively distinct from other decision-making processes. As a result, the purpose of this research was to evaluate the role that community engagement plays in the decision-making process as it relates to the establishment of governance norms for sustainable mangrove management in Lamu County. A correlational research design was applied, and the researchers employed a mixed-methods approach. The target population was 296 respondents. The research used questionnaires and interviews to collect data, and descriptive statistics were used to inspect and analyze the data gathered. The findings indicated that awareness of governance standards is beneficial during decision-making. In addition, the findings showed that respondents had the impression that the decision-making process was not conducted properly. On the other hand, the participants pointed out positive aspects of the decision-making process and agreed that the participation of both genders was essential for the sustainable management of mangroves. Based on these data, it appears that full community engagement in decision-making is necessary for the sustainable management of mangrove forests.
Keywords: Community Engagement; Sustainability; Decision Making Process; Lamu
4. A Comparative Analysis of Visualization Methods in Architecture: Employing Virtual Reality to Support the Decision-Making Process in the Architecture, Engineering, and Construction Industry
Authors: Ahmed Redha Gheraba, Debajyoti Pati, Clifford B. Fedler, Marcelo Schmidt, Michael S. Molina, Ali Nejat, Muge Mukaddes Darwish. Journal of Civil Engineering and Architecture, 2023, Issue 2, pp. 73-89.
The design process of the built environment relies on the collaborative effort of all parties involved in the project. During the design phase, owners, end users, and their representatives are expected to make the most critical design and budgetary decisions, shaping the essential traits of the project; hence the need to create and integrate mechanisms that support the decision-making process. Design decisions should not be based on assumptions, past experiences, or imagination. An example of the numerous problems that result from uninformed design decisions is "change orders", known as deviations from the original scope of work, which lead to an increase in the overall cost and changes to the construction schedule of the project. The long-term aim of this inquiry is to understand user behavior and establish evidence-based control measures, which are actions and processes that can be implemented in practice to decrease the volume and frequency of change orders. The current study developed a foundation for further examination by proposing potential control measures, such as integrating Virtual Reality (VR), and testing their efficiency. The specific aim was to examine the effect of different visualization methods (i.e., VR vs. construction drawings) on (1) how well the subjects understand the information presented about the future/planned environment; (2) the subjects' perceived confidence in what the future environment will look like; (3) the likelihood of changing the built environment; (4) design review time; and (5) accuracy in reviewing and understanding the design.
Keywords: Virtual reality; construction change orders; architectural visualization; decision making process; construction management; construction technology; interior environmental design
5. Optimal Policies for Quantum Markov Decision Processes (Cited by: 2)
Authors: Ming-Sheng Ying, Yuan Feng, Sheng-Gang Ying. International Journal of Automation and Computing, EI CSCD, 2021, Issue 3, pp. 410-421.
A Markov decision process (MDP) offers a general framework for modelling sequential decision making where outcomes are random. In particular, it serves as a mathematical framework for reinforcement learning. This paper introduces an extension of MDPs, namely quantum MDPs (qMDPs), that can serve as a mathematical model of decision making about quantum systems. We develop dynamic programming algorithms for policy evaluation and for finding optimal policies for qMDPs in the finite-horizon case. The results obtained in this paper provide some useful mathematical tools for reinforcement learning techniques applied to the quantum world.
Keywords: Quantum Markov decision processes; quantum machine learning; reinforcement learning; dynamic programming; decision making
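For contrast with the quantum setting, the classical finite-horizon dynamic programming (backward induction) that the paper's qMDP algorithms generalize can be sketched as follows; this is the textbook classical recursion, not the paper's quantum construction:

```python
# Classical finite-horizon value iteration by backward induction.
# P[s][a] is a dict {s_next: prob}; R[s][a] is the immediate reward.
def finite_horizon_value_iteration(states, actions, P, R, horizon):
    V = {s: 0.0 for s in states}            # terminal values
    policy = []                             # policy[t][s] -> best action at stage t
    for t in reversed(range(horizon)):
        V_new, pi_t = {}, {}
        for s in states:
            q = {a: R[s][a] + sum(p * V[s2] for s2, p in P[s][a].items())
                 for a in actions}
            pi_t[s] = max(q, key=q.get)
            V_new[s] = q[pi_t[s]]
        V = V_new
        policy.insert(0, pi_t)
    return V, policy
```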
6. Convergence of Markov decision processes with constraints and state-action dependent discount factors (Cited by: 2)
Authors: Xiao Wu, Xianping Guo. Science China Mathematics, SCIE CSCD, 2020, Issue 1, pp. 167-182.
This paper is concerned with the convergence of a sequence of discrete-time Markov decision processes (DTMDPs) with constraints, state-action dependent discount factors, and possibly unbounded costs. Using the convex analytic approach under mild conditions, we prove that the optimal values and optimal policies of the original DTMDPs converge to those of the "limit" one. Furthermore, we show that any countable-state DTMDP can be approximated by a sequence of finite-state DTMDPs, which are constructed using the truncation technique. Finally, we illustrate the approximation by solving a controlled queueing system numerically, and give the corresponding error bound of the approximation.
Keywords: discrete-time Markov decision processes; state-action dependent discount factors; unbounded costs; convergence
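The truncation technique mentioned above can be sketched roughly: transitions of a countable-state model on {0, 1, 2, ...} are redirected so that any probability mass leaving {0, ..., N} is sent to the boundary state N. This is only a schematic of the general idea, not the paper's exact construction or its error bound:

```python
# Hypothetical sketch: build the N-state truncation of a countable-state transition kernel.
def truncate_transitions(P, N):
    """P(s, a) returns a dict {s_next: prob} on the countable state space {0, 1, 2, ...}."""
    def P_N(s, a):
        out = {}
        for s_next, prob in P(s, a).items():
            target = s_next if s_next <= N else N   # redirect escaping mass to the boundary
            out[target] = out.get(target, 0.0) + prob
        return out
    return P_N
```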
7. A review on Markov Decision Processes (Cited by: 4)
Authors: J. A. Filar and LIU Ke. Affiliations: Centre for Industrial and Applicable Mathematics, University of South Australia, Australia; Institute of Applied Mathematics, Chinese Academy of Sciences, Beijing 100080, China. Chinese Science Bulletin, SCIE EI CAS, 1999, Issue 7, p. 672.
MARKOV decision processes (MDPs) have been studied by mathematicians, probabilists, operations researchers and engineers since the late 1950s. In an MDP, a stochastic, dynamic system is controlled by a 'policy' selected by a decision-maker/controller, with the goal of maximizing an overall reward function that is an appropriately defined aggregate of immediate rewards, over either a finite or an infinite time horizon. As such, MDPs are a useful paradigm for modeling many processes occurring naturally in management and engineering contexts.
Keywords: Markov decision processes
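The reward aggregate described in this review is, in the infinite-horizon discounted case, characterized by the standard Bellman optimality equation (standard notation, not quoted from the paper):

```latex
V^{*}(s) \;=\; \max_{a \in A(s)} \Big\{ r(s,a) + \beta \sum_{s' \in S} p(s' \mid s,a)\, V^{*}(s') \Big\},
\qquad 0 \le \beta < 1 .
```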
8. First Passage Risk Probability Minimization for Piecewise Deterministic Markov Decision Processes (Cited by: 1)
Authors: Xin WEN, Hai-feng HUO, Xian-ping GUO. Acta Mathematicae Applicatae Sinica, SCIE CSCD, 2022, Issue 3, pp. 549-567.
This paper is an attempt to study the minimization problem of the risk probability of piecewise deterministic Markov decision processes (PDMDPs) with unbounded transition rates and Borel spaces. Different from the expected discounted and average criteria in the existing literature, we consider the risk probability that the total rewards produced by a system do not exceed a prescribed goal during a first passage time to some target set, and aim to find a policy that minimizes the risk probability over the class of all history-dependent policies. Under suitable conditions, we derive the optimality equation (OE) for the probability criterion, prove that the value function of the minimization problem is the unique solution to the OE, and establish the existence of ε(≥0)-optimal policies. Finally, we provide two examples to illustrate our results.
Keywords: piecewise deterministic Markov decision processes; risk probability; first passage time; ε-optimal policy
9. SINGULARLY PERTURBED MARKOV DECISION PROCESSES WITH INCLUSION OF TRANSIENT STATES (Cited by: 1)
Authors: R. H. Liu, Q. Zhang, G. Yin. Journal of Systems Science & Complexity, SCIE EI CSCD, 2001, Issue 2, pp. 199-211.
This paper is concerned with continuous-time Markov decision processes (MDPs) having weak and strong interactions. Using a hierarchical approach, the state space of the underlying Markov chain can be decomposed into several groups of recurrent states and a group of transient states, resulting in a singularly perturbed MDP formulation. Instead of solving the original problem directly, a limit problem that is much simpler to handle is derived. On the basis of the optimal solution of the limit problem, nearly optimal decisions are constructed for the original problem. The asymptotic optimality of the constructed control is obtained; the rate of convergence is ascertained.
Keywords: Markov decision process; dynamic programming; asymptotically optimal control
10. An average-value-at-risk criterion for Markov decision processes with unbounded costs
Authors: Qiuli LIU, Wai-Ki CHING, Junyu ZHANG, Hongchu WANG. Frontiers of Mathematics in China, SCIE CSCD, 2022, Issue 4, pp. 673-687.
We study the Markov decision processes under the average-value-at-risk criterion. The state space and the action space are Borel spaces, the costs are admitted to be unbounded from above, and the discount factors are state-action dependent. Under suitable conditions, we establish the existence of optimal deterministic stationary policies. Furthermore, we apply our main results to a cash-balance model.
Keywords: Markov decision processes; average-value-at-risk (AVaR); state-action dependent discount factors; optimal policy
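As background, the average-value-at-risk of a sample of costs can be computed from the Rockafellar-Uryasev representation; this generic sketch uses a common convention, which may differ from the paper's, and is not the paper's cash-balance model:

```python
# Empirical AVaR (CVaR) at level alpha: roughly the mean of the worst (1 - alpha)
# fraction of cost outcomes, via the Rockafellar-Uryasev formula evaluated at the VaR.
import numpy as np

def avar(costs, alpha):
    costs = np.asarray(costs, dtype=float)
    var = np.quantile(costs, alpha)                    # empirical value-at-risk
    return var + np.mean(np.maximum(costs - var, 0.0)) / (1.0 - alpha)

print(avar([1.0, 2.0, 3.0, 10.0], alpha=0.75))         # 10.0, the mean of the worst 25%
```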
11. Meaningful Update and Repair of Markov Decision Processes for Self-Adaptive Systems
Authors: 杨文华, 潘敏学, 周宇, 黄志球. Journal of Computer Science & Technology, SCIE EI CSCD, 2022, Issue 1, pp. 106-127.
Self-adaptive systems are able to adjust their behaviour in response to environmental condition changes and are widely deployed as Internetwares. Considered as a promising way to handle the ever-growing complexity of software systems, they have seen an increasing level of interest and are covering a variety of applications, e.g., autonomous car systems and adaptive network systems. Many approaches for the construction of self-adaptive systems have been developed, and probabilistic models, such as Markov decision processes (MDPs), are one of the favoured. However, the majority of them do not deal with the problems of the underlying MDP being obsolete under new environments or unsatisfactory to the given properties. This results in the generated policies from such MDP failing to guide the self-adaptive system to run correctly and meet goals. In this article, we propose a systematic approach to updating an obsolete MDP by exploring new states and transitions and removing obsolete ones, and repairing an unsatisfactory MDP by adjusting its structure in a more meaningful way rather than arbitrarily changing the transition probabilities to values not in line with reality. Experimental results show that the MDPs updated and repaired by our approach are more competent in guiding the self-adaptive systems' correct running compared with the original ones.
Keywords: self-adaptive system; Markov decision process; model repair
12. CONVERGENCE OF CONTROLLED MODELS FOR CONTINUOUS-TIME MARKOV DECISION PROCESSES WITH CONSTRAINED AVERAGE CRITERIA
Authors: Wenzhao Zhang, Xianzhu Xiong. Annals of Applied Mathematics, 2019, Issue 4, pp. 449-464.
This paper attempts to study the convergence of optimal values and optimal policies of continuous-time Markov decision processes (CTMDPs for short) under the constrained average criteria. For a given original model M_∞ of CTMDP with denumerable states and a sequence {M_n} of CTMDPs with finite states, we give a new convergence condition to ensure that the optimal values and optimal policies of {M_n} converge to the optimal value and optimal policy of M_∞ as the state space S_n of M_n converges to the state space S_∞ of M_∞, respectively. The transition rates and cost/reward functions of M_∞ are allowed to be unbounded. Our approach can be viewed as a combination of the linear programming and Lagrange multiplier methods.
Keywords: continuous-time Markov decision processes; optimal value; optimal policies; constrained average criteria; occupation measures
13. Recorded recurrent deep reinforcement learning guidance laws for intercepting endoatmospheric maneuvering missiles
Authors: Xiaoqi Qiu, Peng Lai, Changsheng Gao, Wuxing Jing. Defence Technology (防务技术), SCIE EI CAS CSCD, 2024, Issue 1, pp. 457-470.
This work proposes a recorded recurrent twin delayed deep deterministic (RRTD3) policy gradient algorithm to solve the challenge of constructing guidance laws for intercepting endoatmospheric maneuvering missiles with uncertainties and observation noise. The attack-defense engagement scenario is modeled as a partially observable Markov decision process (POMDP). Given the benefits of recurrent neural networks (RNNs) in processing sequence information, an RNN layer is incorporated into the agent's policy network to alleviate the bottleneck of traditional deep reinforcement learning methods while dealing with POMDPs. The measurements from the interceptor's seeker during each guidance cycle are combined into one sequence as the input to the policy network, since the detection frequency of an interceptor is usually higher than its guidance frequency. During training, the hidden states of the RNN layer in the policy network are recorded to overcome the partially observable problem that this RNN layer causes inside the agent. The training curves show that the proposed RRTD3 successfully enhances data efficiency, training speed, and training stability. The test results confirm the advantages of the RRTD3-based guidance laws over some conventional guidance laws.
Keywords: Endoatmospheric interception; Missile guidance; Reinforcement learning; Markov decision process; Recurrent neural networks
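A rough PyTorch sketch of a recurrent actor of the kind described above, where the measurements collected within one guidance cycle form a sequence processed by an RNN layer before the action head; all dimensions and layer choices are illustrative assumptions, not the paper's RRTD3 network:

```python
import torch
import torch.nn as nn

class RecurrentActor(nn.Module):
    """Toy recurrent policy: a GRU over the observation sequence, then a bounded action head."""
    def __init__(self, obs_dim, act_dim, hidden=64):
        super().__init__()
        self.rnn = nn.GRU(obs_dim, hidden, batch_first=True)
        self.head = nn.Sequential(nn.Linear(hidden, act_dim), nn.Tanh())

    def forward(self, obs_seq, h0=None):
        out, h_n = self.rnn(obs_seq, h0)    # obs_seq: (batch, seq_len, obs_dim)
        action = self.head(out[:, -1])      # act on the last hidden state of the cycle
        return action, h_n                  # h_n can be recorded and reused, echoing the paper's idea

actor = RecurrentActor(obs_dim=6, act_dim=3)
a, h = actor(torch.zeros(2, 5, 6))
print(a.shape, h.shape)                     # torch.Size([2, 3]) torch.Size([1, 2, 64])
```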
14. Heterogeneous Network Selection Optimization Algorithm Based on a Markov Decision Model (Cited by: 4)
Authors: Jianli Xie, Wenjuan Gao, Cuiran Li. China Communications, SCIE CSCD, 2020, Issue 2, pp. 40-53.
A network selection optimization algorithm based on the Markov decision process (MDP) is proposed so that mobile terminals can always connect to the best wireless network in a heterogeneous network environment. Considering the different types of service requirements, the MDP model and its reward function are constructed based on the quality of service (QoS) attribute parameters of the mobile users, and the network attribute weights are calculated by using the analytic hierarchy process (AHP). The network handoff decision condition is designed according to the different types of user services and the time-varying characteristics of the network, and the MDP model is solved by using the genetic algorithm and simulated annealing (GA-SA); thus, users can seamlessly switch to the network with the best long-term expected reward value. Simulation results show that the proposed algorithm has good convergence performance, and can guarantee that users with different service types will obtain satisfactory expected total reward values and have low numbers of network handoffs.
Keywords: heterogeneous wireless networks; Markov decision process; reward function; genetic algorithm; simulated annealing
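The AHP weighting step mentioned in the abstract can be illustrated with the standard principal-eigenvector method; the pairwise comparison values below are made up for illustration and are not the paper's:

```python
import numpy as np

def ahp_weights(comparison_matrix):
    """Criteria weights from a pairwise comparison matrix (principal eigenvector, normalized)."""
    vals, vecs = np.linalg.eig(np.asarray(comparison_matrix, dtype=float))
    principal = np.real(vecs[:, np.argmax(np.real(vals))])
    weights = np.abs(principal)
    return weights / weights.sum()

# Example: three QoS attributes (say bandwidth, delay, cost) compared pairwise.
A = [[1,   3,   5],
     [1/3, 1,   2],
     [1/5, 1/2, 1]]
print(ahp_weights(A))   # approximately [0.65, 0.23, 0.12]
```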
15. A dynamical neural network approach for distributionally robust chance-constrained Markov decision process
Authors: Tian Xia, Jia Liu, Zhiping Chen. Science China Mathematics, SCIE, 2024, Issue 6, pp. 1395-1418.
In this paper, we study the distributionally robust joint chance-constrained Markov decision process. Utilizing the logarithmic transformation technique, we derive its deterministic reformulation with bi-convex terms under the moment-based uncertainty set. To cope with the non-convexity and improve the robustness of the solution, we propose a dynamical neural network approach to solve the reformulated optimization problem. Numerical results on a machine replacement problem demonstrate the efficiency of the proposed dynamical neural network approach when compared with the sequential convex approximation approach.
Keywords: Markov decision process; chance constraints; distributionally robust optimization; moment-based ambiguity set; dynamical neural network
16. Opportunistic admission and resource allocation for slicing enhanced IoT networks
Authors: Long Zhang, Bin Cao, Gang Feng. Digital Communications and Networks, SCIE CSCD, 2023, Issue 6, pp. 1465-1476.
Network slicing is envisioned as one of the key techniques to meet the extremely diversified service requirements of the Internet of Things (IoT), as it provides an enhanced user experience and elastic resource configuration. In the context of slicing enhanced IoT networks, both the Service Provider (SP) and Infrastructure Provider (InP) face challenges of ensuring efficient slice construction and high profit in dynamic environments. These challenges arise from randomly generated and departed slice requests from end-users, uncertain resource availability, and multidimensional resource allocation. Admission and resource allocation for distinct demands of slice requests are the key issues in addressing these challenges and should be handled effectively in dynamic environments. To this end, we propose an Opportunistic Admission and Resource allocation (OAR) policy to deal with the issues of random slicing requests, uncertain resource availability, and heterogeneous multi-resources. The key idea of OAR is to allow the SP to decide whether to accept slice requests immediately or defer them according to the load and price of resources. To cope with the random slice requests and uncertain resource availability, we formulated this issue as a Markov Decision Process (MDP) to obtain the optimal admission policy, with the aim of maximizing the system reward. Furthermore, the buyer-seller game theory approach was adopted to realize the optimal resource allocation, while motivating each SP and InP to maximize their rewards. Our numerical results show that the proposed OAR policy can make reasonable decisions effectively and steadily, and outperforms the baseline schemes in terms of the system reward.
Keywords: Slice; IoT; Markov decision process; Game theory; Admission and resource allocation
17. Multi-Agent Deep Reinforcement Learning for Cross-Layer Scheduling in Mobile Ad-Hoc Networks
Authors: Xinxing Zheng, Yu Zhao, Joohyun Lee, Wei Chen. China Communications, SCIE CSCD, 2023, Issue 8, pp. 78-88.
Due to the fading characteristics of wireless channels and the burstiness of data traffic, how to deal with congestion in Ad-hoc networks with effective algorithms is still open and challenging. In this paper, we focus on enabling congestion control to minimize network transmission delays through flexible power control. To effectively solve the congestion problem, we propose a distributed cross-layer scheduling algorithm, which is empowered by graph-based multi-agent deep reinforcement learning. The transmit power is adaptively adjusted in real-time by our algorithm based only on local information (i.e., channel state information and queue length) and local communication (i.e., information exchanged with neighbors). Moreover, the training complexity of the algorithm is low due to the regional cooperation based on the graph attention network. In the evaluation, we show that our algorithm can reduce the transmission delay of data flow under severe signal interference and drastically changing channel states, and demonstrate the adaptability and stability in different topologies. The method is general and can be extended to various types of topologies.
Keywords: Ad-hoc network; cross-layer scheduling; multi-agent deep reinforcement learning; interference elimination; power control; queue scheduling; actor-critic methods; Markov decision process
18. SBFT: A BFT Consensus Mechanism Based on DQN Algorithm for Industrial Internet of Things
Authors: Ningjie Gao, Ru Huo, Shuo Wang, Jiang Liu, Tao Huang, Yunjie Liu. China Communications, SCIE CSCD, 2023, Issue 10, pp. 185-199.
With the development and widespread use of blockchain in recent years, many projects have introduced blockchain technology to solve the growing security issues of the Industrial Internet of Things (IIoT). However, due to the conflict between the operational performance and security of the blockchain system and the compatibility issues with a large number of IIoT devices running together, the mainstream blockchain system cannot be applied to IIoT scenarios. In order to solve these problems, this paper proposes SBFT (Speculative Byzantine Consensus Protocol), a flexible and scalable blockchain consensus mechanism for the Industrial Internet of Things. SBFT has a consensus process based on speculation, improving the throughput and consensus speed of blockchain systems and reducing communication overhead. In order to improve the compatibility and scalability of the blockchain system, we select some nodes to participate in the consensus, and these nodes have better performance in the network. Since multiple properties determine node performance, we abstract the node selection problem as a joint optimization problem and use Dueling Deep Q Learning (DQL) to solve it. Finally, we evaluate the performance of the scheme through simulation, and the simulation results prove the superiority of our scheme.
Keywords: Industrial Internet of Things; Byzantine fault tolerance; speculative consensus mechanism; Markov decision process; deep reinforcement learning
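For illustration, the dueling Q-network family on which the abstract's node-selection agent is based splits a shared feature layer into separate state-value and action-advantage streams; the PyTorch sketch below, with made-up sizes, shows only that architectural idea, not the paper's agent or its joint optimization:

```python
import torch
import torch.nn as nn

class DuelingQNet(nn.Module):
    """Dueling head: Q(s, a) = V(s) + A(s, a) - mean_a A(s, a)."""
    def __init__(self, state_dim, n_actions, hidden=128):
        super().__init__()
        self.feature = nn.Sequential(nn.Linear(state_dim, hidden), nn.ReLU())
        self.value = nn.Linear(hidden, 1)               # V(s)
        self.advantage = nn.Linear(hidden, n_actions)   # A(s, a)

    def forward(self, x):
        h = self.feature(x)
        v, adv = self.value(h), self.advantage(h)
        return v + adv - adv.mean(dim=-1, keepdim=True)

q = DuelingQNet(state_dim=8, n_actions=4)
print(q(torch.zeros(1, 8)).shape)   # torch.Size([1, 4])
```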
19. Analysis of a POMDP Model for an Optimal Maintenance Problem with Multiple Imperfect Repairs
Author: Nobuyuki Tamura. American Journal of Operations Research, 2023, Issue 6, pp. 133-146.
I consider a system whose deterioration follows a discrete-time and discrete-state Markov chain with an absorbing state. When the system is put into practice, I may select operation (wait), imperfect repair, or replacement at each discrete-time point. The true state of the system is not known when it is operated. Instead, the system is monitored after operation and some incomplete information concerned with the deterioration is obtained for decision making. Since there are multiple imperfect repairs, I can select one option from them when the imperfect repair is preferable to operation and replacement. To express this situation, I propose a POMDP model and theoretically investigate the structure of an optimal maintenance policy minimizing a total expected discounted cost for an unbounded horizon. Then two stochastic orders are used for the analysis of our problem.
Keywords: Partially Observable Markov Decision Process; Imperfect Repair; Stochastic Order; Monotone Property; Optimal Maintenance Policy
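The monitoring step described here, where only incomplete information about the deterioration level is observed, is handled in POMDPs through a Bayesian belief update; the sketch below is generic, with placeholder matrices rather than the paper's deterioration model:

```python
import numpy as np

def belief_update(belief, P_a, O_a, observation):
    """belief: prior over states; P_a: transition matrix under the chosen action;
    O_a[s_next, o]: probability of observing o when the next state is s_next."""
    predicted = belief @ P_a                        # predict step
    posterior = predicted * O_a[:, observation]     # correct with the observation likelihood
    return posterior / posterior.sum()

b = np.array([1.0, 0.0, 0.0])                       # system known to start in the best state
P = np.array([[0.7, 0.2, 0.1],                      # placeholder deterioration dynamics
              [0.0, 0.8, 0.2],
              [0.0, 0.0, 1.0]])
O = np.array([[0.9, 0.1],                           # placeholder observation model
              [0.5, 0.5],
              [0.1, 0.9]])
print(belief_update(b, P, O, observation=1))        # belief mass shifts toward worse states
```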
20. A Comparison of PPO, TD3 and SAC Reinforcement Algorithms for Quadruped Walking Gait Generation
Authors: James W. Mock, Suresh S. Muknahallipatna. Journal of Intelligent Learning Systems and Applications, 2023, Issue 1, pp. 36-56.
Deep reinforcement learning (deep RL) has the potential to replace classic robotic controllers. State-of-the-art deep reinforcement algorithms such as Proximal Policy Optimization, Twin Delayed Deep Deterministic Policy Gradient and Soft Actor-Critic, to mention a few, have been investigated for training robots to walk. However, conflicting performance results of these algorithms have been reported in the literature. In this work, we present the performance analysis of the above three state-of-the-art deep reinforcement algorithms for a constant-velocity walking task on a quadruped. The performance is analyzed by simulating the walking task of a quadruped equipped with a range of sensors present on a physical quadruped robot. Simulations of the three algorithms across a range of sensor inputs and with domain randomization are performed. The strengths and weaknesses of each algorithm for the given task are discussed. We also identify a set of sensors that contribute to the best performance of each deep reinforcement algorithm.
Keywords: Reinforcement Learning; Machine Learning; Markov Decision Process; Domain Randomization
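A hedged example of how the three compared algorithms can be trained through a common library (stable-baselines3) on a stand-in continuous-control task; the paper uses its own quadruped simulation and sensor configurations, not this environment or necessarily this library:

```python
import gymnasium as gym
from stable_baselines3 import PPO, TD3, SAC

for Algo in (PPO, TD3, SAC):
    env = gym.make("Pendulum-v1")            # placeholder continuous-action task, not a quadruped
    model = Algo("MlpPolicy", env, verbose=0)
    model.learn(total_timesteps=10_000)      # small budget, demonstration only
    model.save(f"{Algo.__name__.lower()}_pendulum")
```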