7 articles found
A Multifunction Radar Jamming Decision Method Based on Dyna-Q with Adaptive Planning Steps (Cited by: 2)
1
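Authors: Zhu Bakun (朱霸坤), Zhu Weigang (朱卫纲), Li Wei (李伟), Li Jiaxin (李佳芯), Yang Ying (杨莹). Ordnance Industry Automation (《兵工自动化》), 2022, No. 7, pp. 1-4 (4 pages)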
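To address the slow convergence of jamming decision methods based on reinforcement learning, a Dyna-Q jamming decision algorithm with adaptive planning steps is proposed on the basis of the Dyna-Q algorithm. While keeping the jamming policy effective, it speeds up the convergence of the reinforcement learning algorithm so that the optimal jamming policy is learned sooner. Experimental and simulation results show that the algorithm achieves real-time, effective jamming of multifunction radars, can be extended to other reinforcement learning applications, and has some reference value.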
Keywords: multifunction radar; jamming decision; reinforcement learning; Dyna-Q; adaptive
Dyna-QUF: Dyna-Q based univector field navigation for autonomous mobile robots in unknown environments (Cited by: 1)
2
Authors: VIET Hoang-huu, CHOI Seung-yoon, CHUNG Tae-choong. Journal of Central South University (SCIE, EI, CAS), 2013, No. 5, pp. 1178-1188 (11 pages)
A novel approach was presented to solve the navigation problem of autonomous mobile robots in unknown environments with dense obstacles based on a univector field method. In an obstacle-free environment, a robot is ensured to reach the goal position with the desired posture by following the univector field. Contrariwise, the univector field cannot guarantee that the robot will avoid obstacles in environments. In order to create an intelligent mobile robot being able to perform the obstacle avoidance task while following the univector field, Dyna-Q algorithm is developed to train the robot in learning moving directions to attain a collision-free path for its navigation. Simulations on the computer as well as experiments on the real world prove that the proposed algorithm is efficient for training the robot in reaching the goal position with the desired final orientation.
Keywords: Dyna-Q; mobile robot; reinforcement learning; univector field
Extended Dyna-Q Algorithm for Path Planning of Mobile Robots
3
Authors: Hoang-huu VIET, Sang-hyeok AN, Tae-choong CHUNG. Journal of Measurement Science and Instrumentation (CAS), 2011, No. 3, pp. 283-287 (5 pages)
This paper presents an extended Dyna-Q algorithm to improve efficiency of the standard Dyna-Q algorithm. In the first episodes of the standard Dyna-Q algorithm, the agent travels blindly to find a goal position. To overcome this weakness, our approach is to use a maximum likelihood model of all state-action pairs to choose actions and update Q-values in the first few episodes. Our algorithm is compared with one-step Q-learning algorithm and the standard Dyna-Q algorithm for the path planning problem in maze environments. Experimental results show that the proposed algorithm is more efficient than the one-step Q-learning algorithm as well as the standard Dyna-Q algorithm, especially in the large environment of states.
Keywords: reinforcement learning; Dyna-Q; path planning; mobile robots
An Intelligent Visual Servoing Control Method for Rotorcraft UAVs Based on Dyna-Q Learning (Cited by: 7)
4
Authors: Shi Haobin (史豪斌), Xu Meng (徐梦), Liu Jiayu (刘珈妤), Li Jichao (李继超). Control and Decision (《控制与决策》) (EI, CSCD, PKU Core), 2019, No. 12, pp. 2517-2526 (10 pages)
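Image-based visual servoing control obtains image information through the robot's vision and then forms closed-loop feedback on that information to control the robot's motion. In classical visual servoing the servo gain is, in most cases, assigned manually, which leads to poor robustness and slow convergence. To address this problem, a Dyna-Q-based intelligent visual servoing control method for rotorcraft UAVs is proposed that tunes the servo gain to improve adaptivity. First, an image feature extraction algorithm based on the Freeman chain code extracts the target feature points. Then, image-based visual servoing forms closed-loop control of the feature error. Next, a decoupled visual servoing control model is proposed for the strongly coupled, underactuated dynamics of rotorcraft UAVs. Finally, a reinforcement learning model that uses Dyna-Q learning to tune the servo gain is built, so that after training the UAV can select the servo gain autonomously. Dyna-Q learning extends classical Q-learning by building an environment model to store experience; virtual samples generated by the environment model can serve as learning samples for value-function iteration. Experimental results show that, compared with conventional PID control and classical image-based visual servoing, the proposed method converges faster and is more stable.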
Keywords: visual servoing; Dyna-Q learning; gain tuning; rotorcraft UAV; Freeman chain code; reinforcement learning
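The abstract above describes Dyna-Q as Q-learning plus a learned environment model whose simulated samples also drive value updates. The following is a minimal tabular sketch of that general idea, not the paper's servo-gain controller: the corridor environment, the function names (`dyna_q`, `corridor_step`), and all hyperparameters are hypothetical choices made for illustration.

```python
import random
from collections import defaultdict

def dyna_q(step, actions, start, episodes=30, planning_steps=10,
           alpha=0.5, gamma=0.95, epsilon=0.1, seed=0):
    """Tabular Dyna-Q. `step(state, action)` returns (next_state, reward, done)."""
    rng = random.Random(seed)
    q = defaultdict(float)  # Q[(state, action)], initialised to 0
    model = {}              # learned model: (state, action) -> (next_state, reward, done)

    def greedy(state):
        best = max(q[(state, a)] for a in actions)
        return rng.choice([a for a in actions if q[(state, a)] == best])

    def update(s, a, r, s2, done):
        # One-step Q-learning backup.
        target = r if done else r + gamma * max(q[(s2, b)] for b in actions)
        q[(s, a)] += alpha * (target - q[(s, a)])

    for _ in range(episodes):
        s, done = start, False
        while not done:
            a = rng.choice(actions) if rng.random() < epsilon else greedy(s)
            s2, r, done = step(s, a)
            update(s, a, r, s2, done)        # learn from the real sample
            model[(s, a)] = (s2, r, done)    # store the experience in the model
            for _ in range(planning_steps):  # learn from simulated samples
                (ps, pa), (ps2, pr, pdone) = rng.choice(list(model.items()))
                update(ps, pa, pr, ps2, pdone)
            s = s2
    return q

# Hypothetical toy environment: a 6-cell corridor with the goal in cell 5.
def corridor_step(state, action):
    nxt = min(max(state + action, 0), 5)
    return nxt, (1.0 if nxt == 5 else 0.0), nxt == 5

q = dyna_q(corridor_step, actions=[-1, 1], start=0)
policy = [max([-1, 1], key=lambda a: q[(s, a)]) for s in range(5)]
```

Setting `planning_steps=0` reduces the sketch to plain one-step Q-learning, which makes it easy to compare how many real interactions each variant needs.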
A Reverse Path Planning Approach for Enhanced Performance of Multi-Degree-of-Freedom Industrial Manipulators
5
Authors: Zhiwei Lin, Hui Wang, Tianding Chen, Yingtao Jiang, Jianmei Jiang, Yingpin Chen. Computer Modeling in Engineering & Sciences (SCIE, EI), 2024, No. 5, pp. 1357-1379 (23 pages)
In the domain of autonomous industrial manipulators, precise positioning and appropriate posture selection in path planning are pivotal for tasks involving obstacle avoidance, such as handling, heat sealing, and stacking. While Multi-Degree-of-Freedom (MDOF) manipulators offer kinematic redundancy, aiding in the derivation of optimal inverse kinematic solutions to meet position and posture requisites, their path planning entails intricate multiobjective optimization, encompassing path, posture, and joint motion optimization. Achieving satisfactory results in practical scenarios remains challenging. In response, this study introduces a novel Reverse Path Planning (RPP) methodology tailored for industrial manipulators. The approach commences by conceptualizing the manipulator’s end-effector as an agent within a reinforcement learning (RL) framework, wherein the state space, action set, and reward function are precisely defined to expedite the search for an initial collision-free path. To enhance convergence speed, the Q-learning algorithm in RL is augmented with Dyna-Q. Additionally, we formulate the cylindrical bounding box of the manipulator based on its Denavit-Hartenberg (DH) parameters and propose a swift collision detection technique. Furthermore, the motion performance of the end-effector is refined through a bidirectional search, and joint weighting coefficients are introduced to mitigate motion in high-power joints. The efficacy of the proposed RPP methodology is rigorously examined through extensive simulations conducted on a six-degree-of-freedom (6-DOF) manipulator encountering two distinct obstacle configurations and target positions. Experimental results substantiate that the RPP method adeptly orchestrates the computation of the shortest collision-free path while adhering to specific posture constraints at the target point. Moreover, it minimizes both posture angle deviations and joint motion, showcasing its prowess in enhancing the operational performance of MDOF industrial manipulators.
Keywords: reverse path planning; Dyna-Q; bidirectional search; posture angle; joint motion
A Regional Coordinated Optimization Control Method Based on Multi-Agent Technology
6
Author: Zhang Yuchen (张雨晨). Intelligent City (《智能城市》), 2020, No. 12, pp. 1-3 (3 pages)
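To further ease urban traffic congestion, a regional coordinated optimization control method based on multi-agent technology is designed, which divides the signal control region into an intersection agent layer and a traffic sub-area agent layer. At the intersection agent layer, a Stackelberg game is introduced to build a Stackelberg-Q-based traffic sub-area control algorithm. At the traffic sub-area layer, the Dyna framework is combined with Q-learning to obtain a Dyna-Q-based coordinated control algorithm for traffic sub-areas. Using the Synchro traffic simulation software, the proposed algorithm is compared with an expert-system scheme and a Synchro-based signal timing optimization scheme. Each intersection is evaluated by maximum V/C ratio, delay, and level of service, and overall network control is evaluated by average delay per vehicle and average number of stops within the network region. The proposed algorithm shows better control performance on all of these measures, which supports its effectiveness and soundness.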
Keywords: regional coordinated control algorithm; Stackelberg-Q; Dyna-Q; Synchro simulation
Research on Path Planning of Mobile Robots Based on Dyna-RQ
7
Authors: Ziying Zhang, Xian Li, Yuhua Wang. Proceedings of the International Computer Frontier Conference (《国际计算机前沿大会会议论文集》) (EI), 2023, No. 1, pp. 49-59 (11 pages)
The mobile robot path planning problem is one of the main contents of reinforcement learning research. In traditional reinforcement learning, the agent obtains the cumulative reward value in the process of interacting with the environment and finally converges to the optimal strategy. The Dyna learning framework in reinforcement learning obtains an estimation model in the real environment. The virtual samples generated by the estimation model are updated together with the empirical samples obtained in the real environment to update the value function or strategy function to improve the convergence efficiency. At present, when reinforcement learning is used for path planning tasks, continuous motion cannot be solved in a large-scale continuous environment, and the convergence is poor. In this paper, we use RBFNN to approximate the Q-value table in the Dyna-Q algorithm to solve the drawbacks in traditional algorithms. The experimental results show that the convergence speed of the improved Dyna-RQ algorithm is significantly faster, which improves the efficiency of mobile robot path planning.
Keywords: reinforcement; path planning; Dyna-Q; RBFNN
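The abstract above replaces the Q-value table with an RBF neural network so that Q-values can be estimated over a continuous state space. The sketch below illustrates that general idea under strong simplifying assumptions and is not the Dyna-RQ implementation: the state is one-dimensional, the Gaussian centers and width are fixed by hand, only the linear output weights are trained, and the class name `RBFQ` and its methods are hypothetical.

```python
import math

class RBFQ:
    """Q(s, a) = w_a . phi(s), where phi(s) are Gaussian RBF features of a 1-D state."""

    def __init__(self, centers, width, n_actions):
        self.centers = centers
        self.width = width
        # One weight vector per discrete action, initialised to zero.
        self.w = [[0.0] * len(centers) for _ in range(n_actions)]

    def features(self, s):
        return [math.exp(-((s - c) ** 2) / (2 * self.width ** 2))
                for c in self.centers]

    def value(self, s, a):
        return sum(w * f for w, f in zip(self.w[a], self.features(s)))

    def update(self, s, a, target, lr=0.1):
        # Gradient step on the squared error between target and Q(s, a).
        phi = self.features(s)
        err = target - self.value(s, a)
        self.w[a] = [w + lr * err * f for w, f in zip(self.w[a], phi)]
        return err

# Usage: repeatedly pull Q(0.5, action 0) toward a fixed target of 1.0.
qhat = RBFQ(centers=[0.0, 0.25, 0.5, 0.75, 1.0], width=0.25, n_actions=2)
for _ in range(200):
    qhat.update(0.5, 0, target=1.0)
```

In a Dyna-Q loop, `target` would be the usual backup `r + gamma * max_a Q(s', a)` computed from either a real or a model-generated transition. Because neighbouring states share features, one update also shifts the estimates for nearby states, which a lookup table cannot do.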