Journal Articles
7 articles found
1. Recognition and interfere deceptive behavior based on inverse reinforcement learning and game theory
Authors: ZENG Yunxiu, XU Kai. Journal of Systems Engineering and Electronics (SCIE, EI, CSCD), 2023, Issue 2, pp. 270-288 (19 pages)
In real-time strategy (RTS) games, the ability to recognize other players' goals is important for creating artificial intelligence (AI) players. However, most current goal recognition methods do not account for the deceptive behavior that often occurs in RTS game scenarios, which leads to poor recognition results. To solve this problem, this paper proposes goal recognition for deceptive agents, an extended goal recognition method that applies deductive reasoning (from the general to the specific) to model the deceptive agent's behavioral strategy. First, a general deceptive behavior model is proposed to abstract the features of deception; these features are then used to construct the behavior strategy that best matches the deceiver's historical behavior data via inverse reinforcement learning (IRL). Finally, to interfere with the implementation of the deceptive behavior, a game model is constructed to describe the confrontation scenario and to derive the most effective interference measures.
Keywords: deceptive path planning; inverse reinforcement learning (IRL); game theory; goal recognition
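To make the game-theoretic interference step concrete, here is a minimal sketch of a confrontation model of the kind the abstract describes: a zero-sum matrix game between an observer choosing interference measures (rows) and a deceiver choosing goals (columns), solved for the observer's optimal mixed strategy by linear programming. The payoff values and the LP reduction are illustrative assumptions, not the paper's actual model.

import numpy as np
from scipy.optimize import linprog

# Hypothetical payoffs: rows are the observer's interference measures,
# columns are the deceiver's candidate goals; entries are the observer's
# expected gain for each pairing (values invented for illustration).
payoffs = np.array([
    [ 1.0, -0.5,  0.2],
    [-0.3,  0.8, -0.1],
    [ 0.4,  0.1,  0.6],
])

def solve_zero_sum(A):
    """Max-min mixed strategy for the row player via the standard LP."""
    shift = 1.0 - A.min()              # make all payoffs positive
    B = A + shift
    m, n = B.shape
    # minimize sum(y) s.t. B^T y >= 1, y >= 0; game value = 1 / sum(y)
    res = linprog(c=np.ones(m), A_ub=-B.T, b_ub=-np.ones(n),
                  bounds=[(0, None)] * m, method="highs")
    value = 1.0 / res.x.sum()
    return res.x * value, value - shift    # strategy, unshifted value

strategy, value = solve_zero_sum(payoffs)
print("interference mixed strategy:", strategy.round(3))
print("game value to the observer:", round(value, 3))

In this toy model, the most effective single interference measure is whichever row receives the largest probability mass.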
2. Modified reward function on abstract features in inverse reinforcement learning (Cited: 1)
Authors: Shen-yi CHEN, Hui QIAN, Jia FAN, Zhuo-jun JIN, Miao-liang ZHU. Journal of Zhejiang University-Science C (Computers and Electronics) (SCIE, EI), 2010, Issue 9, pp. 718-723 (6 pages)
We improve inverse reinforcement learning (IRL) by applying dimension reduction methods to automatically extract abstract features from human-demonstrated policies, handling cases where features are unknown or too numerous. The importance rating of each abstract feature is incorporated into the reward function. Simulation is performed on a driving task on a five-lane highway, where the controlled car has the largest fixed speed among all the cars. Performance is on average almost 10.6% better with importance ratings than without them.
Keywords: importance rating; abstract feature; feature extraction; inverse reinforcement learning (IRL); Markov decision process (MDP)
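A minimal sketch of the pipeline this abstract suggests: PCA-style dimension reduction extracts abstract features from demonstrated states, and a per-feature importance rating is folded into a linear reward. Using explained-variance ratios as the ratings, and all data and names here, are illustrative assumptions; the paper's actual rating scheme may differ.

import numpy as np

rng = np.random.default_rng(0)

# Hypothetical demonstration data: raw state features recorded along
# human-demonstrated trajectories (e.g., lane offsets, gaps to cars).
demo_states = rng.normal(size=(500, 12))

# Dimension reduction via PCA: the principal directions play the role
# of "abstract features", and explained-variance ratios stand in for
# the importance ratings (an assumption made for this sketch).
centered = demo_states - demo_states.mean(axis=0)
_, S, Vt = np.linalg.svd(centered, full_matrices=False)
k = 4
components = Vt[:k]                       # abstract feature directions
ratings = (S[:k] ** 2) / (S ** 2).sum()   # importance ratings

def reward(state, weights):
    """Reward as an importance-weighted sum of abstract features."""
    phi = components @ state              # project onto abstract features
    return float(np.dot(ratings * weights, phi))

# The weights would normally come from the IRL inner loop; random here.
print(reward(rng.normal(size=12), rng.normal(size=k)))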
3. Convergence analysis of an incremental approach to online inverse reinforcement learning
Authors: Zhuo-jun JIN, Hui QIAN, Shen-yi CHEN, Miao-liang ZHU. Journal of Zhejiang University-Science C (Computers and Electronics) (SCIE, EI), 2011, Issue 1, pp. 17-24 (8 pages)
Interest has recently increased in inverse reinforcement learning (IRL): the problem of recovering the reward function underlying a Markov decision process (MDP), given the dynamics of the system and the behavior of an expert. This paper deals with an incremental approach to online IRL. First, the convergence of the incremental method for the IRL problem is investigated, and bounds on both the number of mistakes made during learning and the regret are established with a detailed proof. An online algorithm based on incremental error correction is then derived. The key idea is to add an increment to the current reward estimate each time an action mismatch occurs, which drives the estimate toward a target optimal value. The proposed method was tested in a driving simulation experiment and found to efficiently recover an adequate reward function.
Keywords: incremental approach; reward recovery; online learning; inverse reinforcement learning; Markov decision process
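The "increment on action mismatch" rule reads naturally as a perceptron-style update on a linear reward model. The sketch below is one plausible rendering under that assumption; features and greedy_action are hypothetical stand-ins for the paper's feature map and the policy induced by the current reward estimate.

import numpy as np

def online_irl(w0, stream, features, greedy_action, eta=0.1):
    """Incremental error-correcting online IRL (sketch).

    w0            : initial reward-weight estimate (reward = w . phi)
    stream        : iterable of (state, expert_action) pairs
    features      : map (state, action) -> feature vector phi
    greedy_action : action chosen by the policy induced by w
    """
    w, mistakes = np.asarray(w0, dtype=float), 0
    for s, a_expert in stream:
        a_hat = greedy_action(w, s)
        if a_hat != a_expert:                       # action mismatch
            # Nudge the estimate so the expert's action scores higher
            # than the mistaken one; each such correction counts as a
            # mistake in the paper's mistake/regret bounds.
            w = w + eta * (features(s, a_expert) - features(s, a_hat))
            mistakes += 1
    return w, mistakes

# Toy usage: two actions, phi(s, a) one-hot in the action.
phi = lambda s, a: np.array([1.0, 0.0]) if a == 0 else np.array([0.0, 1.0])
pick = lambda w, s: int(np.argmax([w @ phi(s, a) for a in (0, 1)]))
w, k = online_irl([0.0, 0.0], [(0.5, 1)] * 5, phi, pick)
print(w, "after", k, "corrections")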
4. A survey of inverse reinforcement learning techniques
Authors: Shao Zhifei, Er Meng Joo. International Journal of Intelligent Computing and Cybernetics (EI), 2012, Issue 3, pp. 293-311 (19 pages)
Purpose - The purpose of this paper is to provide an overview of the theoretical background and applications of inverse reinforcement learning (IRL). Design/methodology/approach - Reinforcement learning (RL) techniques provide a powerful solution for sequential decision-making problems under uncertainty. RL uses an agent equipped with a reward function to find a policy through interactions with a dynamic environment. However, one major assumption of existing RL algorithms is that the reward function, the most succinct representation of the designer's intention, must be provided beforehand. In practice, the reward function can be very hard to specify and exhausting to tune for large and complex problems; this inspired the development of IRL, an extension of RL that tackles the problem directly by learning the reward function from expert demonstrations. This paper reviews and compares the original IRL algorithms, their close variants, and their recent advances. Findings - This paper can serve as an introductory guide to the fundamental theory and developments of IRL, as well as its applications. Originality/value - This paper surveys the theories and applications of IRL, a recent development of RL that had not previously been surveyed.
Keywords: inverse reinforcement learning; reward function; reinforcement learning; artificial intelligence; learning methods
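As background for readers new to the area (supplied here from general knowledge, not from the paper itself): the "original IRL algorithm" usually refers to Ng and Russell's 2000 characterization for finite MDPs. With a transition matrix P_a for each action a, discount factor gamma, and reward vector R, the policy that always takes action a_1 is optimal if and only if

\[
  \left(\mathbf{P}_{a_1}-\mathbf{P}_{a}\right)
  \left(\mathbf{I}-\gamma\,\mathbf{P}_{a_1}\right)^{-1}\mathbf{R}
  \;\succeq\; \mathbf{0}
  \qquad \text{for all } a \in \mathcal{A}\setminus\{a_1\},
\]

which turns reward recovery into a feasibility problem; a linear program then selects one reward from the feasible set.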
5. Heterogeneous multi-player imitation learning
Authors: Bosen Lian, Wenqian Xue, Frank L. Lewis. Control Theory and Technology (EI, CSCD), 2023, Issue 3, pp. 281-291 (11 pages)
This paper studies imitation learning in nonlinear multi-player game systems with heterogeneous control input dynamics. We propose a model-free, data-driven inverse reinforcement learning (RL) algorithm for a learner to find the cost functions of an N-player Nash expert system, given the expert's states and control inputs. This allows the imitation learning problem to be addressed without prior knowledge of the expert's system dynamics. To achieve this, we first provide a basic model-based algorithm built upon RL and inverse optimal control; it serves as the foundation for our final model-free inverse RL algorithm, implemented via neural network-based value function approximators. Theoretical analysis and simulation examples verify the methods.
Keywords: imitation learning; inverse reinforcement learning; heterogeneous multi-player games; data-driven model-free control
6. AInvR: Adaptive Learning Rewards for Knowledge Graph Reasoning Using Agent Trajectories
Authors: Hao Zhang, Guoming Lu, Ke Qin, Kai Du. Tsinghua Science and Technology (SCIE, EI, CAS, CSCD), 2023, Issue 6, pp. 1101-1114 (14 pages)
Multi-hop reasoning over incomplete knowledge graphs (KGs) offers excellent interpretability with decent performance. Reinforcement learning (RL) based approaches formulate multi-hop reasoning as a typical sequential decision problem. An intractable shortcoming of multi-hop reasoning with RL is that sparse reward signals make performance unstable. Current mainstream methods apply heuristic reward functions to counter this challenge; however, the inaccurate rewards produced by heuristic functions guide the agent to improper inference paths and unrelated object entities. To this end, we propose AInvR, a novel adaptive inverse reinforcement learning (IRL) framework for multi-hop reasoning. (1) To counter missing and spurious paths, we replace heuristic rule rewards with an adaptive rule-reward learning mechanism based on the agent's inference trajectories. (2) To alleviate the impact of object entities over-rewarded by inaccurate reward shaping and rules, we propose an adaptive negative-hit reward learning mechanism based on the agent's sampling strategy. (3) To further explore diverse paths and mitigate the influence of missing facts, we design a reward dropout mechanism that randomly masks and perturbs reward parameters during reward learning. Experimental results on several benchmark knowledge graphs demonstrate that our method is more effective than existing multi-hop approaches.
Keywords: knowledge graph reasoning (KGR); inverse reinforcement learning (IRL); multi-hop reasoning
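Of the three mechanisms, the reward dropout is concrete enough to sketch: randomly mask and perturb the reward parameters at each reward-learning step. The masking rate, noise scale, and parameter layout below are illustrative assumptions, not values from the paper.

import numpy as np

rng = np.random.default_rng(0)

def reward_dropout(theta, drop_p=0.2, noise_scale=0.05):
    """Randomly mask and perturb reward parameters (sketch).

    drop_p and noise_scale are illustrative, not the paper's settings.
    """
    mask = rng.random(theta.shape) >= drop_p       # keep ~80% of params
    noise = rng.normal(scale=noise_scale, size=theta.shape)
    return (theta + noise) * mask                  # perturb, then mask

theta = rng.normal(size=8)     # hypothetical reward parameters
print(reward_dropout(theta).round(3))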
7. Robot learning from demonstration for path planning: A review (Cited: 6)
Authors: XIE ZongWu, ZHANG Qi, JIANG ZaiNan, LIU Hong. Science China (Technological Sciences) (SCIE, EI, CAS, CSCD), 2020, Issue 8, pp. 1325-1334 (10 pages)
Learning from demonstration (LfD) is an appealing method for helping robots learn new skills. Numerous papers have presented LfD methods with good performance in robotics. However, complicated robot tasks that require carefully regulated path planning strategies remain an open problem. Contact or non-contact constraints in specific robot tasks make path planning more difficult, as the interaction between the robot and the environment is time-varying. In this paper, we focus on the path planning of complex robot tasks in the domain of LfD and give a novel perspective for classifying imitation learning and inverse reinforcement learning, based on constraints and obstacle avoidance. Finally, we summarize these methods and present promising directions for robot applications and LfD theory.
Keywords: learning from demonstration; path planning; imitation learning; inverse reinforcement learning; obstacle avoidance