Abstract
Deep reinforcement learning (RL) algorithms have been successfully applied to a range of challenging tasks. However, these methods often struggle with temporal credit assignment under sparse rewards, a lack of effective exploration, and insufficient exploration experience. Evolutionary algorithms are a class of black-box optimization techniques inspired by natural evolution. This paper proposes an improved chaotic genetic algorithm and a quantum genetic algorithm, each combined with a reinforcement learning algorithm. The method first creates a population of actor networks for evolutionary computation, updates network parameters by gradient descent, and evolves the networks in the population until the algorithm converges. The fitness measure aggregates the return of each RL episode, which to some extent addresses temporal credit assignment under sparse rewards; using the population to generate diverse experience for training the RL agent improves robustness. Comparative and ablation experiments in both discrete and continuous reinforcement learning environments show that the proposed algorithms converge to higher reward values and converge faster.
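The population-based part of the idea can be illustrated with a minimal sketch. It is not the paper's implementation: the toy chain environment, the threshold policy, the logistic-map mutation, and all names and constants (CHAIN_LENGTH, MAX_STEPS, episode_return, logistic_map, evolve) are assumptions made for illustration. The gradient-based RL learner and the quantum genetic variant are omitted. What the sketch does show is fitness defined as the total return of an episode, so that a reward arriving only at the end of the episode still ranks whole policies.

```python
import random

CHAIN_LENGTH = 5   # hypothetical toy environment: states 0..4, goal at 5
MAX_STEPS = 10     # hypothetical episode step limit


def episode_return(theta):
    """Fitness: total reward of one episode on a sparse-reward chain.

    The agent starts in state 0 and is rewarded only on reaching the goal.
    theta[s] > 0 means "move right" in state s, otherwise "move left".
    """
    state, total = 0, 0.0
    for _ in range(MAX_STEPS):
        state = state + 1 if theta[state] > 0 else max(state - 1, 0)
        if state == CHAIN_LENGTH:
            total += 1.0  # the only non-zero reward in the episode
            break
    return total


def logistic_map(x, r=3.99):
    """One step of the logistic map; for x in (0, 1) the result stays in (0, 1)."""
    return r * x * (1.0 - x)


def evolve(pop_size=20, generations=30, scale=1.0, seed=0):
    """Evolve policies with elitism and logistic-map-driven mutation (assumed scheme)."""
    rng = random.Random(seed)
    population = [[rng.uniform(-1.0, 1.0) for _ in range(CHAIN_LENGTH)]
                  for _ in range(pop_size)]
    x = 0.37  # state of the chaotic sequence (arbitrary start in (0, 1))
    history = []  # best fitness per generation
    for _ in range(generations):
        ranked = sorted(population, key=episode_return, reverse=True)
        history.append(episode_return(ranked[0]))
        elites = ranked[: pop_size // 4]  # elites survive unchanged
        children = []
        while len(elites) + len(children) < pop_size:
            parent = rng.choice(elites)
            child = []
            for gene in parent:
                x = logistic_map(x)
                # Map the chaotic value from (0, 1) to a perturbation in (-scale, scale).
                child.append(gene + scale * (2.0 * x - 1.0))
            children.append(child)
        population = elites + children
    best = max(population, key=episode_return)
    return best, history


if __name__ == "__main__":
    best, history = evolve()
    print("best fitness per generation:", history)
```

Because the elites are carried over unchanged and the toy environment is deterministic, the best fitness per generation never decreases. In the combined method described in the abstract, the experience generated by such a population would additionally be used to train a gradient-based RL agent; that step is left out here.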
Source
Computer Science and Application (《计算机科学与应用》)
2024, No. 10, pp. 102-109 (8 pages)