Journal articles: 252 articles found in total
1. Deep Imitation Learning for Autonomous Vehicles Based on Convolutional Neural Networks (Cited by: 10)
Authors: Parham M. Kebria, Abbas Khosravi, Syed Moshfeq Salaken, Saeid Nahavandi. 《IEEE/CAA Journal of Automatica Sinica》, EI CSCD, 2020, Issue 1, pp. 82-95 (14 pages)
Providing autonomous systems with an effective quantity and quality of information from a desired task is challenging. In particular, autonomous vehicles must have a reliable vision of their workspace to robustly accomplish driving functions. In machine vision, deep learning techniques, and specifically convolutional neural networks, have been proven to be the state-of-the-art technology in the field. As these networks typically involve millions of parameters and elements, designing an optimal architecture for deep learning structures is a difficult task which is globally under investigation by researchers. This study experimentally evaluates the impact of three major architectural properties of convolutional networks, namely the number of layers, the number of filters, and the filter size, on their performance. Several models with different properties are developed, equally trained, and then applied to an autonomous car in a realistic simulation environment. A new ensemble approach is also proposed to calculate and update weights for the models according to their mean squared error values. Based on the design properties, performance results are reported and compared for further investigation. Surprisingly, the number of filters itself does not largely affect performance efficiency; rather, proper allocation of filters with different kernel sizes through the layers introduces a considerable improvement in performance. The achievements of this study provide researchers with a clear clue and direction in designing optimal network architectures for deep learning purposes.
Keywords: autonomous vehicles, convolutional neural networks, deep learning, imitation learning
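A minimal sketch of the MSE-based ensemble weighting idea mentioned in the abstract; the inverse-MSE rule, variable names, and toy numbers are illustrative assumptions rather than the paper's exact formulation:

```python
import numpy as np

def inverse_mse_weights(mse_per_model):
    """Assign larger weights to models with smaller mean squared error."""
    mse = np.asarray(mse_per_model, dtype=float)
    inv = 1.0 / (mse + 1e-8)          # avoid division by zero
    return inv / inv.sum()             # normalise so the weights sum to 1

def ensemble_predict(predictions, weights):
    """Weighted average of per-model steering predictions (shape: models x samples)."""
    return np.average(np.asarray(predictions), axis=0, weights=weights)

# Toy usage: three CNN variants with different validation MSEs.
w = inverse_mse_weights([0.12, 0.05, 0.30])
steer = ensemble_predict([[0.10, -0.2], [0.12, -0.1], [0.30, 0.0]], w)
print(w, steer)
```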
2. Target-driven visual navigation in indoor scenes using reinforcement learning and imitation learning (Cited by: 7)
Authors: Qiang Fang, Xin Xu, Xitong Wang, Yujun Zeng. 《CAAI Transactions on Intelligence Technology》, SCIE EI, 2022, Issue 2, pp. 167-176 (10 pages)
Here, the challenges of sample efficiency and navigation performance in deep reinforcement learning for visual navigation are addressed, and a deep imitation reinforcement learning approach is proposed. Our contributions are mainly threefold: first, a framework combining imitation learning with deep reinforcement learning is presented, which enables a robot to learn a stable navigation policy faster in the target-driven navigation task. Second, surrounding images are taken as the observation instead of sequential images, which improves navigation performance by providing more information. Moreover, a simple yet efficient template matching method is adopted to determine the stop action, making the system more practical. Simulation experiments in the AI-THOR environment show that the proposed approach outperforms previous end-to-end deep reinforcement learning approaches, which demonstrates the effectiveness and efficiency of our approach.
Keywords: imitation learning, visual
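A small sketch of how a template matching check could trigger the stop action described above; the OpenCV method, the normalised cross-correlation score, and the 0.8 threshold are illustrative assumptions:

```python
import cv2
import numpy as np

def should_stop(observation_bgr, target_template_bgr, threshold=0.8):
    """Return True when the target template is found in the current observation.

    The threshold and the use of normalised cross-correlation are illustrative
    choices; the paper only states that a simple template matching step
    triggers the stop action.
    """
    obs_gray = cv2.cvtColor(observation_bgr, cv2.COLOR_BGR2GRAY)
    tpl_gray = cv2.cvtColor(target_template_bgr, cv2.COLOR_BGR2GRAY)
    scores = cv2.matchTemplate(obs_gray, tpl_gray, cv2.TM_CCOEFF_NORMED)
    return float(scores.max()) >= threshold

# Toy usage with a random image (the template is a crop of the observation).
obs = np.random.randint(0, 255, (120, 160, 3), dtype=np.uint8)
print(should_stop(obs, obs[40:80, 60:100]))
```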
3. NOMA-Based Energy-Efficient Task Scheduling in Vehicular Edge Computing Networks: A Self-Imitation Learning-Based Approach (Cited by: 8)
Authors: Peiran Dong, Zhaolong Ning, Rong Ma, Xiaojie Wang, Xiping Hu, Bin Hu. 《China Communications》, SCIE CSCD, 2020, Issue 11, pp. 1-11 (11 pages)
Mobile Edge Computing (MEC) is promising to alleviate the computation and storage burdens of terminals in wireless networks. The huge energy consumption of MEC servers challenges the establishment of smart cities and limits their service time when powered by rechargeable batteries. In addition, the Orthogonal Multiple Access (OMA) technique cannot utilize limited spectrum resources fully and efficiently. Therefore, Non-Orthogonal Multiple Access (NOMA)-based energy-efficient task scheduling among MEC servers for delay-constrained mobile applications is important, especially in highly dynamic vehicular edge computing networks. The various movement patterns of vehicles lead to unbalanced offloading requirements and different load pressure on MEC servers. Self-Imitation Learning (SIL)-based Deep Reinforcement Learning (DRL) has emerged as a promising machine learning technique to break through obstacles in various research fields, especially in time-varying networks. In this paper, we first introduce related MEC technologies in vehicular networks. Then, we propose an energy-efficient approach for task scheduling in vehicular edge computing networks based on DRL, with the purpose of both guaranteeing the task latency requirement for multiple users and minimizing the total energy consumption of MEC servers. Numerical results demonstrate that the proposed algorithm outperforms other methods.
Keywords: NOMA, energy-efficient scheduling, vehicular edge computing, imitation learning
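A generic sketch of the self-imitation learning (SIL) loss that underlies the scheduling approach; the clipped-advantage form follows the standard SIL formulation and the tensors are illustrative, not the paper's exact scheduling model:

```python
import torch

def self_imitation_loss(log_probs, values, returns, value_coef=0.5):
    """Generic self-imitation learning loss used alongside DRL.

    Only past transitions whose stored return exceeds the current value
    estimate contribute, so the agent imitates its own good scheduling
    decisions. All tensors are 1-D and aligned per transition.
    """
    advantage = torch.clamp(returns - values, min=0.0)       # (R - V)^+
    policy_loss = -(log_probs * advantage.detach()).mean()
    value_loss = 0.5 * (advantage ** 2).mean()
    return policy_loss + value_coef * value_loss

# Toy usage on three stored transitions.
lp = torch.log(torch.tensor([0.3, 0.6, 0.1]))
print(self_imitation_loss(lp, torch.tensor([1.0, 2.0, 0.5]), torch.tensor([1.5, 1.8, 2.0])))
```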
4. Creative Imitation: An Answer to the Fundamental Issue of L2 Learning (Cited by: 4)
Authors: Xiao Zhou. 《Chinese Journal of Applied Linguistics》, 2021, Issue 3, pp. 351-365, 431 (16 pages)
This paper reports on a study of the effects of reading-writing integrated tasks on vocabulary learning and explores the differential roles of creative construction and non-creative construction in promoting lexical learning. Participants were 90 first-year English majors, randomly assigned to two experimental groups (continuation and retelling) and one control group, with 30 students in each group. Results showed that the continuation group generated a substantial amount of creative construction and produced significantly more instances of creative imitation than the retelling group. The continuation group outperformed the retelling group for both receptive and productive vocabulary knowledge gain and retention, but differences were only significant in terms of productive vocabulary retention. Finally, productive vocabulary knowledge retention in the continuation group was significantly and positively correlated with creative imitation (meaning creation coupled with language imitation), but not with linguistic alignment per se. As productive vocabulary knowledge constitutes the learner's ability to use lexical knowledge to express ideas in dynamic contexts, the findings afford evidence that creative imitation could be the answer to the fundamental issue of L2 learning (i.e., mapping static language onto dynamic idea expression). Pedagogical implications as well as future research directions are also discussed.
Keywords: creative imitation, non-creative construction, continuation, retelling, lexical learning
5. Human Skeleton Detection, Modeling and Gesture Imitation Learning for a Social Purpose
Authors: Linda Nanan Vallée, Sao Mai Nguyen, Christophe Lohr, Ioannis Kanellos, Olivier Asseu. 《Engineering(科研)》, 2020, Issue 2, pp. 90-98 (9 pages)
Gesture recognition is topical in computer science and aims at interpreting human gestures via mathematical algorithms. Among its numerous applications are physical rehabilitation and imitation games. In this work, we suggest performing human gesture recognition within the context of a serious imitation game, which aims at improving social interactions with teenagers with autism spectrum disorders. We use an artificial intelligence algorithm to detect the skeleton of the participant, then model the human pose space and describe an imitation learning method using a Gaussian mixture model in the Riemannian manifold.
Keywords: imitation learning, artificial intelligence, gesture recognition, autism spectrum disorders (ASD), Gaussian mixture model (GMM)
6. Learning to Branch in Combinatorial Optimization With Graph Pointer Networks
Authors: Rui Wang, Zhiming Zhou, Kaiwen Li, Tao Zhang, Ling Wang, Xin Xu, Xiangke Liao. 《IEEE/CAA Journal of Automatica Sinica》, SCIE EI CSCD, 2024, Issue 1, pp. 157-169 (13 pages)
Traditional expert-designed branching rules in branch-and-bound (B&B) are static, often failing to adapt to diverse and evolving problem instances. Crafting these rules is labor-intensive and may not scale well with complex problems. Given the frequent need to solve varied combinatorial optimization problems, leveraging statistical learning to auto-tune B&B algorithms for specific problem classes becomes attractive. This paper proposes a graph pointer network model to learn the branching rules. Graph features, global features, and historical features are designated to represent the solver state. The graph neural network processes the graph features, while the pointer mechanism assimilates the global and historical features to finally determine the variable on which to branch. The model is trained to imitate the expert strong branching rule by a tailored top-k Kullback-Leibler divergence loss function. Experiments on a series of benchmark problems demonstrate that the proposed approach significantly outperforms the widely used expert-designed branching rules. It also outperforms state-of-the-art machine-learning-based branch-and-bound methods in terms of solving speed and search tree size on all the test instances. In addition, the model can generalize to unseen instances and scale to larger instances.
Keywords: branch-and-bound (B&B), combinatorial optimization, deep learning, graph neural network, imitation learning
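A hedged sketch of a top-k Kullback-Leibler imitation loss of the kind described above; the masking and normalisation details are assumptions, not the paper's exact loss:

```python
import torch
import torch.nn.functional as F

def topk_kl_loss(pred_logits, expert_scores, k=10):
    """Imitate strong branching by matching the expert's top-k candidates.

    A KL divergence is computed between the expert's score distribution and
    the model's distribution, restricted to the expert's k best variables.
    """
    k = min(k, expert_scores.shape[-1])
    topk = torch.topk(expert_scores, k, dim=-1).indices
    expert_p = F.softmax(expert_scores.gather(-1, topk), dim=-1)
    model_logp = F.log_softmax(pred_logits.gather(-1, topk), dim=-1)
    return F.kl_div(model_logp, expert_p, reduction="batchmean")

# Toy usage: a batch of 2 B&B nodes with 6 branching candidates each.
print(topk_kl_loss(torch.randn(2, 6), torch.randn(2, 6), k=3))
```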
7. Research on Face Recognition Based on One-shot Learning
Authors: 程远航, 余军. 《现代电子技术》, 2021, Issue 19, pp. 76-80 (5 pages)
Because face recognition on heavily annotated face data in special scenarios requires a large number of identity-labeled training samples and cannot accurately extract features from small samples, a face recognition algorithm based on one-shot learning is proposed. The median pixel values of the single-sample face image are selected and assigned, stored in a buffer, and traversed; a Siamese Network model computes shared weights over the traversal results, and these shared weights are used to measure feature similarity between images to obtain the recognition result. Results show that, compared with a convolutional neural network based face recognition method, the proposed method achieves a recognition accuracy of 95.68% and a recognition time of 354.25 s, a better outcome. This demonstrates that the proposed method can complete face recognition tasks more quickly and accurately even with small samples.
Keywords: face recognition, one-shot learning, shared weights, Siamese Network model, image processing, comparative analysis
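A minimal sketch of a weight-sharing Siamese comparison of the kind the abstract describes; the layer sizes and cosine similarity are illustrative assumptions:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SiameseNet(nn.Module):
    """Two face images pass through the same (weight-shared) embedding network;
    similarity of the embeddings gives the one-shot recognition score.
    The layer sizes are illustrative, not those of the cited paper."""

    def __init__(self):
        super().__init__()
        self.embed = nn.Sequential(
            nn.Conv2d(1, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(32, 64),
        )

    def forward(self, img_a, img_b):
        za, zb = self.embed(img_a), self.embed(img_b)    # shared weights
        return F.cosine_similarity(za, zb, dim=1)         # similarity in [-1, 1]

# Toy usage: compare a probe face against a single enrolled sample.
net = SiameseNet()
print(net(torch.randn(1, 1, 64, 64), torch.randn(1, 1, 64, 64)))
```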
8. Imitation Learning Based Real-time Decision-making of Microgrid Economic Dispatch Under Multiple Uncertainties
Authors: Wei Dong, Fan Zhang, Meng Li, Xiaolun Fang, Qiang Yang. 《Journal of Modern Power Systems and Clean Energy》, SCIE EI CSCD, 2024, Issue 4, pp. 1183-1193 (11 pages)
The intermittency of renewable energy generation, variability of load demand, and stochasticity of market prices bring direct challenges to the optimal energy management of microgrids. To cope with these different forms of operation uncertainty, an imitation learning based real-time decision-making solution for microgrid economic dispatch is proposed. In this solution, the optimal dispatch trajectories obtained by solving the optimization problem using historical deterministic operation patterns serve as the expert samples for imitation learning. To improve the generalization performance of imitation learning and the expressive ability for uncertain variables, a hybrid model combining unsupervised and supervised learning is utilized. A denoising autoencoder based unsupervised learning model is adopted to enhance the feature extraction of operation patterns. Furthermore, a long short-term memory network based supervised learning model is used to efficiently characterize the mapping between the input space, composed of the extracted operation patterns and system state variables, and the output space, composed of the optimal dispatch trajectories. Numerical simulation results demonstrate that under various operation uncertainties, the operation cost achieved by the proposed solution is close to the theoretical minimum. Compared with the traditional model predictive control method and the basic clone imitation learning method, the operation cost of the proposed solution is reduced by 6.3% and 2.8%, respectively, over a test period of three months.
Keywords: energy management, imitation learning, data-driven decision, economic dispatch
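A compact sketch of the hybrid pipeline the abstract outlines, a denoising-autoencoder feature extractor feeding an LSTM that imitates optimal dispatch trajectories; all dimensions are illustrative assumptions:

```python
import torch
import torch.nn as nn

class DispatchImitator(nn.Module):
    """Denoising-autoencoder feature extractor followed by an LSTM that maps
    operation patterns and system states to dispatch set-points. Dimensions
    and the single-layer LSTM are illustrative assumptions."""

    def __init__(self, n_pattern=24, n_state=6, n_feat=16, n_action=4):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(n_pattern, 32), nn.ReLU(),
                                     nn.Linear(32, n_feat))
        self.decoder = nn.Sequential(nn.Linear(n_feat, 32), nn.ReLU(),
                                     nn.Linear(32, n_pattern))
        self.lstm = nn.LSTM(n_feat + n_state, 64, batch_first=True)
        self.head = nn.Linear(64, n_action)

    def reconstruct(self, noisy_pattern):
        """Unsupervised stage: denoise corrupted operation patterns."""
        return self.decoder(self.encoder(noisy_pattern))

    def forward(self, pattern_seq, state_seq):
        feat = self.encoder(pattern_seq)                  # (B, T, n_feat)
        out, _ = self.lstm(torch.cat([feat, state_seq], dim=-1))
        return self.head(out)                             # dispatch trajectory

# Toy usage: batch of 2 days, 48 half-hourly steps.
model = DispatchImitator()
print(model(torch.randn(2, 48, 24), torch.randn(2, 48, 6)).shape)
```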
9. Joint Entity and Event Extraction with Generative Adversarial Imitation Learning (Cited by: 11)
Authors: Tongtao Zhang, Heng Ji, Avirup Sil. 《Data Intelligence》, 2019, Issue 2, pp. 99-120 (22 pages)
We propose a new framework for entity and event extraction based on generative adversarial imitation learning, an inverse reinforcement learning method using a generative adversarial network (GAN). We assume that instances and labels yield varying extents of difficulty and that the gains and penalties (rewards) are expected to be diverse. We utilize discriminators to estimate proper rewards according to the difference between the labels committed by the ground truth (expert) and the extractor (agent). Our experiments demonstrate that the proposed framework outperforms state-of-the-art methods.
Keywords: information extraction, event extraction, imitation learning, generative adversarial network
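A generic sketch of the discriminator-derived reward used in GAIL-style training; the feature dimension and the -log(1 - D) reward form are common GAIL choices assumed here, not necessarily the paper's:

```python
import torch
import torch.nn as nn

class ExtractionDiscriminator(nn.Module):
    """Scores (instance, label) pairs; its output is turned into a reward for
    the extractor agent, as in generative adversarial imitation learning."""

    def __init__(self, n_features=128):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(n_features, 64), nn.ReLU(),
                                 nn.Linear(64, 1), nn.Sigmoid())

    def forward(self, pair_features):
        return self.net(pair_features).squeeze(-1)        # P(pair comes from expert)

    def reward(self, pair_features):
        d = self.forward(pair_features)
        return -torch.log(1.0 - d + 1e-8)                 # higher when expert-like

# Toy usage: rewards for a batch of 5 candidate label assignments.
disc = ExtractionDiscriminator()
print(disc.reward(torch.randn(5, 128)))
```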
10. Reinforcement learning building control approach harnessing imitation learning (Cited by: 3)
Authors: Sourav Dey, Thibault Marzullo, Xiangyu Zhang, Gregor Henze. 《Energy and AI》, 2023, Issue 4, pp. 60-72 (13 pages)
Reinforcement learning (RL) has shown significant success in sequential decision making in fields like autonomous vehicles, robotics, marketing, and gaming. This success has attracted attention to the RL control approach for building energy systems, which are becoming complicated due to the need to optimize for multiple, potentially conflicting goals like occupant comfort, energy use, and grid interactivity. However, for real-world applications, RL has several drawbacks, such as requiring large amounts of training data and time and exhibiting unstable control behavior during early exploration, making it infeasible to apply directly to building control tasks. To address these issues, an imitation learning approach is utilized herein, where the RL agent starts with a policy transferred from accepted rule-based and heuristic policies. This approach is successful in reducing the training time, preventing unstable early exploration behavior, and improving upon an accepted rule-based policy, all of which make RL a more practical control approach for real-world applications in the domain of building controls.
Keywords: reinforcement learning, building controls, imitation learning, artificial intelligence
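A small sketch of warm-starting an RL policy by behaviour-cloning an accepted rule-based controller, as the abstract describes; the rule, state variable, and regressor are illustrative assumptions:

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

def rule_based_setpoint(outdoor_temp):
    """Stand-in for an accepted rule-based policy: a simple supply-setpoint
    reset schedule (purely illustrative; the paper's rules are not specified here)."""
    return np.clip(22.0 + 0.2 * (15.0 - outdoor_temp), 20.0, 26.0)

# 1) Collect (state, action) pairs from the rule-based controller.
states = np.random.uniform(-10, 35, size=(2000, 1))           # outdoor temperature
actions = rule_based_setpoint(states[:, 0])

# 2) Behaviour-clone the rules into the policy network used to initialise RL.
policy = MLPRegressor(hidden_layer_sizes=(32, 32), max_iter=500)
policy.fit(states, actions)

# The RL agent would now start exploration from this pre-trained policy
# instead of from random weights, avoiding unstable early behaviour.
print(policy.predict([[0.0], [30.0]]))
```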
11. GACS: Generative Adversarial Imitation Learning Based on Control Sharing (Cited by: 1)
Authors: Huaiwei Si, Guozhen Tan, Dongyu Li, Yanfei Peng. 《Journal of Systems Science and Information》, CSCD, 2023, Issue 1, pp. 78-93 (16 pages)
Generative adversarial imitation learning (GAIL) directly imitates the behavior of experts from human demonstrations instead of designing explicit reward signals as in reinforcement learning. Meanwhile, GAIL overcomes the defects of traditional imitation learning by using a generative adversarial network framework and shows excellent performance in many fields. However, GAIL acts directly on immediate rewards, a feature that is only reflected in the value function after a period of accumulation. Thus, when faced with complex practical problems, the learning efficiency of GAIL is often extremely low and the policy may be slow to learn. One way to solve this problem is to directly guide the action (policy) in the agent's learning process, such as the control sharing (CS) method. This paper combines reinforcement learning and imitation learning and proposes a novel GAIL framework called generative adversarial imitation learning based on control sharing policy (GACS). GACS learns model constraints from expert samples and uses adversarial networks to guide learning directly. The actions are produced by the adversarial networks and are used to optimize the policy and effectively improve learning efficiency. Experiments in an autonomous driving environment and the real-time strategy game Breakout show that GACS has better generalization capabilities, imitates the behavior of experts more efficiently, and can learn better policies relative to other frameworks.
Keywords: generative adversarial imitation learning, reinforcement learning, control sharing, deep reinforcement learning
12. Heterogeneous multi-player imitation learning
Authors: Bosen Lian, Wenqian Xue, Frank L. Lewis. 《Control Theory and Technology》, EI CSCD, 2023, Issue 3, pp. 281-291 (11 pages)
This paper studies imitation learning in nonlinear multi-player game systems with heterogeneous control input dynamics. We propose a model-free, data-driven inverse reinforcement learning (RL) algorithm for a learner to find the cost functions of an N-player Nash expert system given the expert's states and control inputs. This allows us to address the imitation learning problem without prior knowledge of the expert's system dynamics. To achieve this, we provide a basic model-based algorithm built upon RL and inverse optimal control. This serves as the foundation for our final model-free inverse RL algorithm, which is implemented via neural network-based value function approximators. Theoretical analysis and simulation examples verify the methods.
Keywords: imitation learning, inverse reinforcement learning, heterogeneous multi-player games, data-driven model-free control
13. Learning the optimal state-feedback via supervised imitation learning
Authors: Dharmesh Tailor, Dario Izzo. 《Astrodynamics》, CSCD, 2019, Issue 4, pp. 361-374 (14 pages)
Imitation learning is a control design paradigm that seeks to learn a control policy reproducing demonstrations from expert agents. By substituting expert demonstrations with optimal behaviours, the same paradigm leads to the design of control policies closely approximating the optimal state-feedback. This approach requires training a machine learning algorithm (in our case, deep neural networks) directly on state-control pairs originating from optimal trajectories. We have shown in previous work that, when restricted to low-dimensional state and control spaces, this approach is very successful in several deterministic, non-linear problems in continuous time. In this work, we refine our previous studies using as a test case a simple quadcopter model with quadratic and time-optimal objective functions. We describe in detail the best learning pipeline we have developed, which is able to approximate the state-feedback map via deep neural networks to a very high accuracy. We introduce the use of the softplus activation function in the hidden units of neural networks, showing that it results in a smoother control profile whilst retaining the benefits of rectifiers. We show how to evaluate the optimality of the trained state-feedback and find that already with two layers the objective function reached and its optimal value differ by less than one percent. We later also consider an additional metric linked to the system's asymptotic behaviour: the time taken to converge to the policy's fixed point. With respect to these metrics, we show that improvements in the mean absolute error do not necessarily correspond to better policies.
Keywords: optimal control, deep learning, imitation learning, G&CNET
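A minimal sketch of supervised imitation of optimal state-control pairs with softplus hidden units, as advocated above; the network widths, state and control dimensions, and training data are illustrative stand-ins:

```python
import torch
import torch.nn as nn

# A small state-feedback network with softplus hidden units for smoother
# control profiles. The 13-dimensional quadcopter state, 4 controls, and
# layer widths are illustrative assumptions.
policy = nn.Sequential(
    nn.Linear(13, 64), nn.Softplus(),
    nn.Linear(64, 64), nn.Softplus(),
    nn.Linear(64, 4),
)

# Supervised imitation of optimal trajectories: regress controls on states.
opt = torch.optim.Adam(policy.parameters(), lr=1e-3)
states, controls = torch.randn(256, 13), torch.randn(256, 4)   # stand-in data
for _ in range(10):
    loss = nn.functional.mse_loss(policy(states), controls)
    opt.zero_grad()
    loss.backward()
    opt.step()
print(loss.item())
```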
14. A Deep Reinforcement Learning Stock Trading Strategy Considering Behavior Cloning (Cited by: 2)
Authors: 杨兴雨, 陈亮威, 郑萧腾, 张永. 《系统管理学报》, CSSCI CSCD 北大核心, 2024, Issue 1, pp. 150-161 (12 pages)
To improve the returns of stock investment and reduce risk, the behavior cloning idea from imitation learning is introduced into a deep reinforcement learning framework to design a stock trading strategy. In the strategy design, the dueling DQN deep reinforcement learning algorithm is combined with behavior cloning so that the agent imitates the decisions of a pre-constructed investment expert while exploring autonomously. Numerical experiments on stocks from different industries show that the designed trading strategy outperforms the comparison strategies on return and risk indicators such as annualized return, Sharpe ratio, and Calmar ratio. The results indicate that combining imitation learning with deep reinforcement learning gives the agent both exploration and imitation capabilities, thereby improving the generalization ability of the model and the applicability of the strategy.
Keywords: stock trading strategy, deep reinforcement learning, imitation learning, behavior cloning, dueling deep Q-learning network
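A hedged sketch of combining a dueling Q-network with a behaviour-cloning term on expert actions, in the spirit of the abstract; the architecture sizes and the cross-entropy imitation term are assumptions, not the paper's exact loss:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DuelingQNet(nn.Module):
    """Dueling Q-network for discrete trading actions (sell / hold / buy).
    Sizes are illustrative; the imitation term below is a generic way of
    combining behaviour cloning with Q-learning."""

    def __init__(self, n_state=20, n_action=3):
        super().__init__()
        self.trunk = nn.Sequential(nn.Linear(n_state, 64), nn.ReLU())
        self.value = nn.Linear(64, 1)
        self.advantage = nn.Linear(64, n_action)

    def forward(self, s):
        h = self.trunk(s)
        a = self.advantage(h)
        return self.value(h) + a - a.mean(dim=1, keepdim=True)

def combined_loss(q_net, states, td_targets, actions, expert_actions, bc_weight=0.5):
    q = q_net(states)
    td_loss = F.mse_loss(q.gather(1, actions.unsqueeze(1)).squeeze(1), td_targets)
    bc_loss = F.cross_entropy(q, expert_actions)      # imitate the constructed expert
    return td_loss + bc_weight * bc_loss

net = DuelingQNet()
print(combined_loss(net, torch.randn(8, 20), torch.randn(8),
                    torch.randint(0, 3, (8,)), torch.randint(0, 3, (8,))))
```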
15. An Autonomous Driving Decision Model Based on Improved Behavior Cloning and DDPG (Cited by: 1)
Authors: 李伟东, 黄振柱, 何精武, 马草原, 葛程. 《计算机工程与应用》, CSCD 北大核心, 2024, Issue 14, pp. 86-95 (10 pages)
The key to autonomous driving is that the decision layer issues accurate commands based on the input from the perception stage. Reinforcement learning and imitation learning are better suited to complex scenarios than traditional rule-based methods. However, imitation learning represented by behavior cloning suffers from compounding errors, so a prioritized experience replay algorithm is used to improve behavior cloning and enhance the model's ability to fit the demonstration dataset. The original DDPG (deep deterministic policy gradient) algorithm suffers from low exploration efficiency, so experience pool separation and random network distillation (RND) are used to improve DDPG and increase its training efficiency. The improved algorithms are trained jointly to reduce useless exploration in the early stage of DDPG training. Validation on the TORCS (The Open Racing Car Simulator) platform shows that, within the same number of training iterations, the method learns more stable lane keeping, speed keeping, and obstacle avoidance.
Keywords: autonomous driving, reinforcement learning, imitation learning, decision-making algorithm, TORCS
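A minimal sketch of the random network distillation (RND) exploration bonus mentioned above; the network sizes and state dimension are illustrative assumptions:

```python
import torch
import torch.nn as nn

class RND(nn.Module):
    """Random network distillation: the prediction error of a trainable network
    against a fixed random target network serves as an exploration bonus for DDPG."""

    def __init__(self, n_state=29, n_feat=32):
        super().__init__()
        self.target = nn.Sequential(nn.Linear(n_state, 64), nn.ReLU(),
                                    nn.Linear(64, n_feat))
        self.predictor = nn.Sequential(nn.Linear(n_state, 64), nn.ReLU(),
                                       nn.Linear(64, n_feat))
        for p in self.target.parameters():        # target stays fixed
            p.requires_grad_(False)

    def intrinsic_reward(self, state):
        err = (self.predictor(state) - self.target(state)).pow(2).mean(dim=-1)
        return err.detach()                        # novelty bonus per state

    def predictor_loss(self, state):
        return (self.predictor(state) - self.target(state).detach()).pow(2).mean()

rnd = RND()
s = torch.randn(4, 29)
print(rnd.intrinsic_reward(s), rnd.predictor_loss(s))
```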
16. A Learning-from-Demonstration Method for Robot Actions and Strategies Based on the Primitive Library Concept
Authors: 李铁军, 刘家奇, 刘今越, 贾晓辉. 《计算机工程与应用》, CSCD 北大核心, 2024, Issue 8, pp. 90-98 (9 pages)
To address demonstration-data optimization and the storage and retrieval of actions and task strategies in robot learning from demonstration, a learning-from-demonstration method based on the idea of a primitive library is proposed. For action learning, an expert drags the manipulator to execute actions and obtain demonstration data; a Gaussian mixture model and Gaussian mixture regression improve data quality, and the dynamic movement primitives algorithm converts the data into basis-function weights. For strategy learning, task steps are created as action primitives, the obtained weights are added to the primitives, and primitive cards containing the task execution strategy are constructed; the primitives are assembled into a primitive library for storage. During task execution, primitives are retrieved from the library in order; a YOLOv5 object detection network and an AlexNet image classification network detect target information, actions are matched, and new actions with the characteristics of the original actions are generalized. The method learns actions from demonstration and stores strategies, and combines suitable actions according to actual targets to complete tasks. In a rebar-tying experiment, five action primitives were created and ten actions were learned from expert demonstrations; the robot successfully completed tying tasks at rebar intersections in horizontal and vertical planes using the action primitive library, demonstrating its effectiveness.
Keywords: learning from demonstration, trajectory imitation learning, task strategy learning, dynamic movement primitives, movement primitive library
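A compact sketch of fitting dynamic movement primitive (DMP) basis-function weights to a demonstrated trajectory, the step the abstract describes after GMM/GMR smoothing; the gains and number of basis functions are textbook defaults, not the paper's settings:

```python
import numpy as np

def dmp_weights(y, dt=0.01, n_basis=20, alpha=25.0, beta=6.25, alpha_x=3.0):
    """Fit the basis-function weights of a 1-D dynamic movement primitive to a
    demonstrated trajectory via locally weighted regression."""
    yd = np.gradient(y, dt)
    ydd = np.gradient(yd, dt)
    g, y0 = y[-1], y[0]
    t = np.arange(len(y)) * dt
    x = np.exp(-alpha_x * t / t[-1])                      # canonical system
    f_target = ydd - alpha * (beta * (g - y) - yd)        # desired forcing term
    centers = np.exp(-alpha_x * np.linspace(0, 1, n_basis))
    widths = 1.0 / (np.diff(centers, append=centers[-1]) ** 2 + 1e-6)
    psi = np.exp(-widths * (x[:, None] - centers) ** 2)   # RBF activations
    scale = x * (g - y0)
    # One weight per basis function, fitted by weighted least squares.
    num = (psi * (scale * f_target)[:, None]).sum(axis=0)
    den = (psi * (scale ** 2)[:, None]).sum(axis=0) + 1e-9
    return num / den

# Toy usage: weights for a smooth reach demonstrated over one second.
demo = np.sin(np.linspace(0, np.pi / 2, 100))
print(dmp_weights(demo).shape)
```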
17. Research on Learnable Wargaming Agents Driven by Operation Plans
Authors: 孙怡峰, 李智, 吴疆, 王玉宾. 《系统仿真学报》, CAS CSCD 北大核心, 2024, Issue 7, pp. 1525-1535 (11 pages)
To enable agents to handle the complex combat scenarios and objectives in wargaming, a learnable wargaming agent architecture driven by operation plans is proposed. The agent's "attachment" and "loose-coupling" characteristics with respect to the wargaming system are analyzed to derive the agent's learnability requirements; in the agent framework design, the operation plan is used to narrow the agent's learning scope. A finite state machine corresponds to the combat-phase knowledge in the operation plan, the agent's decision space is determined by the operation-plan framework, and a learnable deep neural network is designed to explore the key decision space; the network is trained in a prior-knowledge imitation learning mode and a deep reinforcement learning mode. This architecture can iteratively explore optimal multi-piece deployment and cooperation problems that are difficult for humans to work out fully.
Keywords: wargaming, intelligent agent, operation plan, deep neural network, reinforcement learning, imitation learning
18. An End-to-End Autonomous Driving Model Based on Multimodal Intermediate Representations
Authors: 孔慧芳, 刘润武, 胡杰. 《现代制造工程》, CSCD 北大核心, 2024, Issue 3, pp. 70-78 (9 pages)
Accurate understanding of the driving environment is one of the prerequisites for autonomous driving. To improve the scene-understanding ability of autonomous vehicles, an end-to-end autonomous driving model based on multimodal intermediate representations of semantic segmentation, horizontal disparity, and angle encoding is proposed. The model uses deep learning to build a perception-planning network. The perception network takes RGB and depth images as input and generates the multimodal intermediate representations, describing the spatial distribution of the road environment and surrounding obstacles; the planning network uses the intermediate representations for road-environment feature extraction and waypoint prediction. Training and performance testing on the CARLA simulation platform show that the model achieves scene understanding of urban road environments and effectively reduces collisions; compared with a baseline model using a single-modality intermediate representation, its driving performance metric improves by 31.47%.
Keywords: autonomous driving, scene understanding, imitation learning, trajectory planning
19. Research on Decision-Making at Unsignalized Intersections Based on Deep Reinforcement Learning
Authors: 傅明建, 郭福强. 《计算机工程》, CAS CSCD 北大核心, 2024, Issue 5, pp. 91-99 (9 pages)
Unsignalized left-turn intersections are among the most dangerous scenarios in autonomous driving, and achieving efficient and safe left-turn decisions is a major challenge in the field. Deep reinforcement learning (DRL) algorithms have broad application prospects for autonomous driving decision-making, but in driving scenarios they suffer from low sample efficiency and difficult reward function design. A DRL algorithm based on expert priors (CBAM-BC SAC) is proposed to address these problems. First, expert prior knowledge is obtained from the SMARTS simulation platform; then, the channel-spatial attention mechanism (CBAM) is used to improve the behavior cloning (BC) method and pre-train a policy that imitates the expert on the basis of the expert prior knowledge; finally, the imitation policy guides the learning process of the DRL algorithm, which is validated on left-turn decisions at unsignalized intersections. Experimental results show that the expert-prior DRL algorithm outperforms conventional DRL: it removes the effort of hand-designing reward functions and significantly improves sample efficiency, yielding better performance. In the unsignalized left-turn scenario, the CBAM-BC SAC algorithm improves the average success rate by 14.2 and 2.2 percentage points over the conventional DRL algorithm (SAC) and the DRL algorithm based on conventional behavior cloning (BC SAC), respectively.
Keywords: deep reinforcement learning, autonomous driving, imitation learning, behavior cloning, driving decision-making
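A sketch of a CBAM (channel-spatial attention) block of the kind used to strengthen the behaviour-cloning network; the reduction ratio and kernel size follow common CBAM defaults and are assumptions here:

```python
import torch
import torch.nn as nn

class CBAM(nn.Module):
    """Channel attention followed by spatial attention over a feature map."""

    def __init__(self, channels, reduction=16, kernel_size=7):
        super().__init__()
        self.mlp = nn.Sequential(nn.Linear(channels, channels // reduction),
                                 nn.ReLU(),
                                 nn.Linear(channels // reduction, channels))
        self.spatial = nn.Conv2d(2, 1, kernel_size, padding=kernel_size // 2)

    def forward(self, x):
        b, c, _, _ = x.shape
        # Channel attention from average- and max-pooled descriptors.
        avg = self.mlp(x.mean(dim=(2, 3)))
        mx = self.mlp(x.amax(dim=(2, 3)))
        x = x * torch.sigmoid(avg + mx).view(b, c, 1, 1)
        # Spatial attention from channel-wise average and max maps.
        s = torch.cat([x.mean(dim=1, keepdim=True),
                       x.amax(dim=1, keepdim=True)], dim=1)
        return x * torch.sigmoid(self.spatial(s))

# Toy usage on a feature map from the driving-scene encoder.
print(CBAM(32).forward(torch.randn(2, 32, 16, 16)).shape)
```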
20. A Manipulator Path Planning Algorithm Based on Path Imitation and SAC Reinforcement Learning (Cited by: 1)
Authors: 宋紫阳, 李军怀, 王怀军, 苏鑫, 于蕾. 《计算机应用》, CSCD 北大核心, 2024, Issue 2, pp. 439-444 (6 pages)
During the training of manipulator path planning algorithms, the huge action and state spaces lead to sparse rewards and low training efficiency, and it is difficult to evaluate state and action values over the massive numbers of states and actions. To address these problems, a manipulator path planning algorithm based on SAC (Soft Actor-Critic) reinforcement learning is proposed. The demonstrated path is incorporated into the reward function so that the manipulator imitates the demonstration during reinforcement learning, improving learning efficiency, and the SAC algorithm makes training faster and more stable. Ten paths were planned with the proposed algorithm and with the deep deterministic policy gradient (DDPG) algorithm; the average distances between the planned paths and the reference path were 0.8 cm and 1.9 cm, respectively. Experimental results show that the path imitation mechanism improves training efficiency and that the proposed algorithm explores the environment better than DDPG, yielding more reasonable planned paths.
Keywords: imitation learning, reinforcement learning, SAC algorithm, path planning, reward function
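A minimal sketch of a reward that folds the demonstrated path into the objective, as the abstract describes; the distance terms and weights are illustrative assumptions, not the paper's reward function:

```python
import numpy as np

def path_imitation_reward(ee_position, demo_path, goal, w_demo=1.0, w_goal=1.0):
    """Reward mixing goal progress with imitation of a demonstrated path:
    the closer the end-effector stays to the nearest demonstration waypoint,
    the smaller the penalty."""
    demo_dist = np.linalg.norm(demo_path - ee_position, axis=1).min()
    goal_dist = np.linalg.norm(goal - ee_position)
    return -w_goal * goal_dist - w_demo * demo_dist

# Toy usage: a straight-line demonstration in 3-D workspace coordinates (metres).
demo = np.linspace([0.2, 0.0, 0.3], [0.5, 0.2, 0.3], num=50)
print(path_imitation_reward(np.array([0.3, 0.05, 0.32]), demo,
                            goal=np.array([0.5, 0.2, 0.3])))
```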