Providing autonomous systems with an effective quantity and quality of information for a desired task is challenging. In particular, autonomous vehicles must have a reliable view of their workspace to accomplish driving functions robustly. In machine vision, deep learning techniques, and specifically convolutional neural networks, have proven to be the state of the art. As these networks typically involve millions of parameters and elements, designing an optimal architecture for deep learning models is a difficult task under active investigation by researchers. This study experimentally evaluates the impact of three major architectural properties of convolutional networks, namely the number of layers, the number of filters, and the filter size, on their performance. Several models with different properties are developed, trained identically, and then applied to an autonomous car in a realistic simulation environment. A new ensemble approach is also proposed that computes and updates a weight for each model based on its mean squared error. Performance results are reported and compared across the design properties for further investigation. Surprisingly, the number of filters by itself has little effect on performance; rather, a proper allocation of filters with different kernel sizes across the layers yields a considerable improvement. The findings of this study give researchers clear direction in designing optimal network architectures for deep learning.
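The abstract does not spell out the weight-update rule of the ensemble; a minimal sketch, assuming each model's weight is inversely proportional to its mean squared error (function names are illustrative, not from the paper):

```python
import numpy as np

def ensemble_weights(mse_values):
    """Weight each model in inverse proportion to its MSE, normalised to sum to 1."""
    inv = 1.0 / np.asarray(mse_values, dtype=float)
    return inv / inv.sum()

def ensemble_predict(model_outputs, mse_values):
    """Combine per-model predictions (e.g. steering angles) with MSE-based weights."""
    return float(np.dot(ensemble_weights(mse_values), model_outputs))
```

Under this scheme a model with half the error of another receives twice the weight, and re-estimating the MSE values online lets the weights track each model's recent performance.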
Here, the challenges of sample efficiency and navigation performance in deep reinforcement learning for visual navigation are addressed, and a deep imitation reinforcement learning approach is proposed. Our contributions are threefold. First, a framework combining imitation learning with deep reinforcement learning is presented, which enables a robot to learn a stable navigation policy faster in the target-driven navigation task. Second, surrounding images are taken as the observation instead of sequential images, which provides more information and improves navigation performance. Third, a simple yet efficient template matching method is adopted to determine the stop action, making the system more practical. Simulation experiments in the AI-THOR environment show that the proposed approach outperforms previous end-to-end deep reinforcement learning approaches, demonstrating its effectiveness and efficiency.
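The abstract does not detail the matcher used for the stop decision; a minimal sketch of template matching over greyscale arrays using plain normalised cross-correlation (the threshold and all names are assumptions, not the paper's implementation):

```python
import numpy as np

def ncc(patch, template):
    """Normalised cross-correlation between an image patch and a template."""
    p = patch - patch.mean()
    t = template - template.mean()
    denom = np.linalg.norm(p) * np.linalg.norm(t)
    return float((p * t).sum() / denom) if denom > 0 else 0.0

def should_stop(observation, target_template, threshold=0.9):
    """Slide the template over the observation; stop when any window matches."""
    h, w = target_template.shape
    H, W = observation.shape
    best = max(
        ncc(observation[i:i + h, j:j + w], target_template)
        for i in range(H - h + 1)
        for j in range(W - w + 1)
    )
    return best >= threshold
```

A score near 1 means the target's appearance is present in the current view, which is when the agent should issue the stop action.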
Mobile Edge Computing (MEC) is promising for alleviating the computation and storage burdens of terminals in wireless networks. The huge energy consumption of MEC servers challenges the establishment of smart cities and limits the service time of servers powered by rechargeable batteries. In addition, the Orthogonal Multiple Access (OMA) technique cannot utilize limited spectrum resources fully and efficiently. Therefore, Non-Orthogonal Multiple Access (NOMA)-based energy-efficient task scheduling among MEC servers for delay-constrained mobile applications is important, especially in highly dynamic vehicular edge computing networks. The varied movement patterns of vehicles lead to unbalanced offloading requirements and different load pressures on MEC servers. Self-Imitation Learning (SIL)-based Deep Reinforcement Learning (DRL) has emerged as a promising machine learning technique for breaking through obstacles in various research fields, especially in time-varying networks. In this paper, we first introduce related MEC technologies in vehicular networks. We then propose an energy-efficient DRL-based approach for task scheduling in vehicular edge computing networks, with the purpose of both guaranteeing the task latency requirements of multiple users and minimizing the total energy consumption of MEC servers. Numerical results demonstrate that the proposed algorithm outperforms other methods.
This paper reports on a study that examined the effects of reading-writing integrated tasks on vocabulary learning and explored the differential roles of creative construction and non-creative construction in promoting lexical learning. Participants were 90 first-year English majors, randomly assigned to two experimental groups (continuation and retelling) and one control group, with 30 students in each group. Results showed that the continuation group generated a substantial amount of creative construction and produced significantly more instances of creative imitation than the retelling group. The continuation group outperformed the retelling group in both receptive and productive vocabulary knowledge gain and retention, but the differences were significant only for productive vocabulary retention. Finally, productive vocabulary knowledge retention in the continuation group was significantly and positively correlated with creative imitation (meaning creation coupled with language imitation), but not with linguistic alignment per se. As productive vocabulary knowledge constitutes the learner's ability to use lexical knowledge to express ideas in dynamic contexts, the findings afford evidence that creative imitation could be the answer to the fundamental issue of L2 learning (i.e., mapping static language onto dynamic idea expression). Pedagogical implications and future research directions are also discussed.
Gesture recognition is a topical problem in computer science that aims at interpreting human gestures via mathematical algorithms. Among its numerous applications are physical rehabilitation and imitation games. In this work, we perform human gesture recognition within the context of a serious imitation game intended to improve social interactions for teenagers with autism spectrum disorders. We use an artificial intelligence algorithm to detect the skeleton of the participant, then model the human pose space and describe an imitation learning method using a Gaussian mixture model on the Riemannian manifold.
Traditional expert-designed branching rules in branch-and-bound (B&B) are static, often failing to adapt to diverse and evolving problem instances. Crafting these rules is labor-intensive and may not scale well with complex problems. Given the frequent need to solve varied combinatorial optimization problems, leveraging statistical learning to auto-tune B&B algorithms for specific problem classes becomes attractive. This paper proposes a graph pointer network model to learn branching rules. Graph features, global features, and historical features are designated to represent the solver state. The graph neural network processes the graph features, while the pointer mechanism assimilates the global and historical features to determine the variable on which to branch. The model is trained to imitate the expert strong-branching rule via a tailored top-k Kullback-Leibler divergence loss function. Experiments on a series of benchmark problems demonstrate that the proposed approach significantly outperforms widely used expert-designed branching rules. It also outperforms state-of-the-art machine-learning-based branch-and-bound methods in solving speed and search tree size on all test instances. In addition, the model generalizes to unseen instances and scales to larger instances.
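The exact form of the tailored loss is not given in the abstract; a sketch of a top-k Kullback-Leibler loss, assuming the comparison is restricted to the expert's top-k candidate variables before taking softmax distributions (this restriction is an assumption for illustration):

```python
import numpy as np

def softmax(x):
    """Numerically stable softmax."""
    e = np.exp(x - x.max())
    return e / e.sum()

def topk_kl_loss(expert_scores, model_logits, k=3):
    """KL divergence between expert and model distributions, restricted to
    the expert's top-k candidate variables."""
    expert_scores = np.asarray(expert_scores, dtype=float)
    model_logits = np.asarray(model_logits, dtype=float)
    idx = np.argsort(expert_scores)[-k:]      # expert's top-k variables
    p = softmax(expert_scores[idx])           # expert target distribution
    q = softmax(model_logits[idx])            # model distribution
    return float(np.sum(p * np.log(p / q)))
```

The loss is zero only when the model reproduces the expert's relative preference over the top-k candidates, which is what matters for picking the branching variable.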
The intermittency of renewable energy generation, the variability of load demand, and the stochasticity of market prices pose direct challenges to the optimal energy management of microgrids. To cope with these forms of operational uncertainty, an imitation learning based real-time decision-making solution for microgrid economic dispatch is proposed. In this solution, the optimal dispatch trajectories obtained by solving the optimization problem on historical deterministic operation patterns serve as the expert samples for imitation learning. To improve the generalization performance of imitation learning and the expressive ability for uncertain variables, a hybrid model combining unsupervised and supervised learning is employed. A denoising autoencoder based unsupervised learning model enhances the feature extraction of operation patterns. A long short-term memory network based supervised learning model then efficiently characterizes the mapping from the input space, composed of the extracted operation patterns and system state variables, to the output space, composed of the optimal dispatch trajectories. Numerical simulation results demonstrate that, under various operational uncertainties, the operation cost achieved by the proposed solution is close to the theoretical minimum. Compared with the traditional model predictive control method and a basic clone imitation learning method, the operation cost of the proposed solution is reduced by 6.3% and 2.8%, respectively, over a test period of three months.
We propose a new framework for entity and event extraction based on generative adversarial imitation learning, an inverse reinforcement learning method using a generative adversarial network (GAN). We assume that instances and labels present varying degrees of difficulty, so the gains and penalties (rewards) are expected to be diverse. We utilize discriminators to estimate proper rewards according to the difference between the labels committed by the ground truth (expert) and the extractor (agent). Our experiments demonstrate that the proposed framework outperforms state-of-the-art methods.
Reinforcement learning (RL) has shown significant success in sequential decision making in fields like autonomous vehicles, robotics, marketing, and gaming. This success has drawn attention to RL as a control approach for building energy systems, which are becoming complicated by the need to optimize for multiple, potentially conflicting goals such as occupant comfort, energy use, and grid interactivity. However, for real-world applications RL has several drawbacks: it requires large amounts of training data and time, and its control behavior is unstable during the early exploration process, making direct application to building control tasks infeasible. To address these issues, an imitation learning approach is utilized herein in which the RL agent starts with a policy transferred from accepted rule-based and heuristic policies. This approach reduces training time, prevents unstable early exploration behavior, and improves upon the accepted rule-based policy, all of which make RL a more practical control approach for real-world applications in the domain of building controls.
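The warm-start idea can be sketched as behaviour cloning of a rule-based controller. Everything below is illustrative: the rule, the logistic-regression policy, and all names are assumptions standing in for the paper's unspecified rule-based policies and transfer mechanism.

```python
import numpy as np

def rule_based_policy(state):
    """Stand-in for an accepted rule-based controller (hypothetical rule:
    switch equipment on when the first feature exceeds 0.5)."""
    return 1 if state[0] > 0.5 else 0

def pretrain_from_rules(states, lr=0.5, epochs=500):
    """Behaviour-clone the rule with logistic regression (bias included) so the
    RL agent starts from a sensible policy instead of random exploration."""
    X = np.hstack([np.asarray(states, dtype=float), np.ones((len(states), 1))])
    y = np.array([rule_based_policy(s) for s in states], dtype=float)
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-X @ w))   # predicted action probabilities
        w -= lr * X.T @ (p - y) / len(y)   # gradient step on the log loss
    return w
```

The cloned weights then initialise the RL policy, which continues to improve with environment interaction.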
Generative adversarial imitation learning (GAIL) directly imitates the behavior of experts from human demonstrations instead of designing explicit reward signals as in reinforcement learning. Meanwhile, GAIL overcomes the defects of traditional imitation learning by using a generative adversarial network framework and shows excellent performance in many fields. However, GAIL acts directly on immediate rewards, a feature that is reflected in the value function only after a period of accumulation. Thus, when faced with complex practical problems, the learning efficiency of GAIL is often extremely low and the policy may be slow to learn. One way to solve this problem is to directly guide the action (policy) in the agent's learning process, as in the control sharing (CS) method. This paper combines reinforcement learning and imitation learning and proposes a novel GAIL framework called generative adversarial imitation learning based on control sharing policy (GACS). GACS learns model constraints from expert samples and uses adversarial networks to guide learning directly. The actions produced by the adversarial networks are used to optimize the policy, effectively improving learning efficiency. Experiments in an autonomous driving environment and the real-time strategy game Breakout show that GACS has better generalization capability, imitates expert behavior more efficiently, and learns better policies than other frameworks.
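GACS's exact reward shaping is not given in the abstract; as background, the standard GAIL surrogate reward that the framework builds on, derived from the discriminator output D(s, a) in (0, 1), can be sketched as:

```python
import math

def gail_reward(d_output, eps=1e-8):
    """Surrogate reward -log(1 - D(s, a)): larger when the discriminator
    believes the state-action pair came from the expert."""
    return -math.log(1.0 - d_output + eps)
```

Because the reward grows without bound as D approaches 1, the policy is pushed toward state-action pairs the discriminator cannot distinguish from expert behavior.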
This paper studies imitation learning in nonlinear multi-player game systems with heterogeneous control input dynamics. We propose a model-free, data-driven inverse reinforcement learning (RL) algorithm for a learner to find the cost functions of an N-player Nash expert system given the expert's states and control inputs. This allows us to address the imitation learning problem without prior knowledge of the expert's system dynamics. To achieve this, we provide a basic model-based algorithm built upon RL and inverse optimal control. This serves as the foundation for our final model-free inverse RL algorithm, which is implemented via neural network-based value function approximators. Theoretical analysis and simulation examples verify the methods.
Imitation learning is a control design paradigm that seeks to learn a control policy reproducing demonstrations from expert agents. By substituting optimal behaviours for expert demonstrations, the same paradigm leads to the design of control policies closely approximating the optimal state feedback. This approach requires training a machine learning algorithm (in our case, deep neural networks) directly on state-control pairs originating from optimal trajectories. We have shown in previous work that, when restricted to low-dimensional state and control spaces, this approach is very successful in several deterministic, nonlinear problems in continuous time. In this work, we refine our previous studies using as a test case a simple quadcopter model with quadratic and time-optimal objective functions. We describe in detail the best learning pipeline we have developed, which is able to approximate the state-feedback map via deep neural networks to very high accuracy. We introduce the use of the softplus activation function in the hidden units of the neural networks, showing that it results in a smoother control profile whilst retaining the benefits of rectifiers. We show how to evaluate the optimality of the trained state feedback and find that, already with two layers, the objective function reached and its optimal value differ by less than one percent. We also consider an additional metric linked to the system's asymptotic behaviour: the time taken to converge to the policy's fixed point. With respect to these metrics, we show that improvements in the mean absolute error do not necessarily correspond to better policies.
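The softplus unit mentioned above is log(1 + e^x); a minimal sketch contrasting it with the rectifier it replaces:

```python
import math

def relu(x):
    """Rectifier: max(0, x); piecewise linear, with a kink at zero."""
    return max(0.0, x)

def softplus(x):
    """Smooth approximation of the rectifier, log(1 + exp(x)),
    written in a numerically stable form."""
    return math.log1p(math.exp(-abs(x))) + max(0.0, x)
```

Softplus is differentiable everywhere and converges to the rectifier for large |x|, which is why it can smooth the control profile while retaining the benefits of rectified units.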
The key to autonomous driving is for the decision layer to issue accurate commands based on the inputs from the perception stage. Reinforcement learning and imitation learning are better suited to complex scenarios than traditional rule-based methods. However, imitation learning as represented by behaviour cloning suffers from compounding errors, so prioritized experience replay is used to improve behaviour cloning and strengthen the model's fit to the demonstration dataset. The original DDPG (deep deterministic policy gradient) algorithm suffers from low exploration efficiency, so experience-pool separation and random network distillation (RND) are used to improve DDPG and raise its training efficiency. The improved algorithms are then trained jointly, reducing useless exploration in the early stage of DDPG training. Validation on the TORCS (The Open Racing Car Simulator) platform shows that, within the same number of training episodes, the method learns more stable lane keeping, speed keeping, and obstacle avoidance.
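Random network distillation, mentioned above, rewards novel states with the error of a trained predictor against a fixed, randomly initialised target network; a minimal sketch with single linear layers standing in for the paper's networks (all shapes and names are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
W_target = rng.normal(size=(8, 4))   # fixed random target network (never trained)
W_pred = np.zeros((8, 4))            # predictor, trained to match the target

def rnd_bonus(state):
    """Intrinsic reward: squared prediction error against the fixed target.
    Frequently visited states let the predictor catch up, so the bonus decays."""
    err = W_target @ state - W_pred @ state
    return float(np.sum(err ** 2))

def train_predictor(state, lr=0.01):
    """One gradient step shrinking the prediction error on this state."""
    global W_pred
    err = W_pred @ state - W_target @ state
    W_pred -= lr * np.outer(err, state)
```

Adding the bonus to the environment reward steers DDPG's early exploration toward states the predictor has not yet learned, which is how RND raises exploration efficiency.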
Funding (visual navigation study): National Natural Science Foundation of China, Grant Nos. 61703418 and 61825305.
Funding (vehicular edge computing study): supported in part by the National Natural Science Foundation of China under Grants 61971084 and 62001073; in part by the National Natural Science Foundation of Chongqing under Grant cstc2019jcyj-msxmX0208; and in part by the open research fund of the National Mobile Communications Research Laboratory, Southeast University, under Grant 2020D05.
Funding (branch-and-bound study): supported by the Open Project of Xiangjiang Laboratory (22XJ02003); the Scientific Project of the National University of Defense Technology (NUDT) (ZK21-07, 23-ZZCX-JDZ-28); the National Science Fund for Outstanding Young Scholars (62122093); and the National Natural Science Foundation of China (72071205).
Funding (microgrid dispatch study): supported in part by the National Natural Science Foundation of China (No. 52177119).
Funding (building controls study): this work was authored in part by the National Renewable Energy Laboratory, United States, operated by Alliance for Sustainable Energy, LLC, for the U.S. Department of Energy (DOE) under Contract No. DE-AC36-08GO28308.
Funding (GACS study): supported in part by the National Natural Science Foundation of China (U1808206).