Reinforcement Learning Algorithm with an Actor Network Population Evolved by Improved Genetic Algorithms

Evolving Actor Network Population Algorithm by Improved Genetic in Reinforcement Learning


Abstract: Deep reinforcement learning algorithms have been applied successfully to a range of challenging tasks. However, these methods commonly suffer from difficult temporal credit assignment under sparse rewards, a lack of effective exploration, and insufficient exploration experience. Evolutionary algorithms are a class of black-box optimization techniques inspired by natural evolution. This paper proposes improved chaotic and quantum genetic algorithms, each combined with a reinforcement learning algorithm. The method first creates a population of actor networks for evolutionary computation, then uses gradient descent to update the network parameters and evolves the networks in the population until the algorithm converges. The fitness measure integrates the episode return from reinforcement learning, which to some extent addresses temporal credit assignment under sparse rewards; using the population to generate diverse experience for training the RL agent improves robustness. Comparative and ablation experiments in discrete and continuous reinforcement learning environments show that the proposed algorithm converges to higher reward values and converges faster.
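The abstract gives no implementation details, so the following is only a minimal sketch of the general idea it describes: a population of actors is ranked by episode return and evolved with a chaotic mutation. Everything concrete here is an assumption made for illustration and is not taken from the paper: the toy sparse-reward task, the two-parameter linear "actor", the logistic map as the chaotic sequence, and the names `logistic_map`, `episode_return`, and `evolve`. The gradient-descent update and the quantum genetic variant mentioned in the abstract are omitted.

```python
import random


def logistic_map(x, r=4.0):
    """One step of the logistic map, used here as an assumed chaotic sequence."""
    return r * x * (1.0 - x)


def episode_return(params, steps=20, goal=6):
    """Hypothetical sparse-reward task: walk on a line starting at 0.

    The only reward is 1.0 if the agent stands on `goal` after the last step.
    """
    w, b = params
    pos = 0
    for _ in range(steps):
        # Tiny linear "actor": step right, step left, or stay.
        score = w * (goal - pos) + b
        if score > 0.5:
            pos += 1
        elif score < -0.5:
            pos -= 1
    return 1.0 if pos == goal else 0.0


def evolve(pop_size=20, generations=15, elite=4, seed=0):
    """Evolve actor parameters; fitness is the episode return."""
    rng = random.Random(seed)
    population = [[rng.uniform(-1, 1), rng.uniform(-1, 1)]
                  for _ in range(pop_size)]
    chaos = 0.37  # arbitrary starting point of the chaotic sequence
    best_history = []
    for _ in range(generations):
        ranked = sorted(population, key=episode_return, reverse=True)
        best_history.append(episode_return(ranked[0]))
        elites = ranked[:elite]  # elites survive unchanged
        children = []
        while len(children) < pop_size - elite:
            mother, father = rng.sample(elites, 2)
            # Uniform crossover: each gene comes from one of the two parents.
            child = [rng.choice(pair) for pair in zip(mother, father)]
            for i in range(len(child)):
                # Chaotic mutation: perturbation drawn from the logistic map.
                chaos = logistic_map(chaos)
                child[i] += 0.2 * (2.0 * chaos - 1.0)
            children.append(child)
        population = elites + children
    best = max(population, key=episode_return)
    return best, best_history


if __name__ == "__main__":
    best, history = evolve()
    print("best parameters:", best)
    print("best return per generation:", history)
```

Because the elites are copied into the next generation unchanged, the best return per generation cannot decrease in this sketch. Ranking whole episodes by their return, rather than assigning credit to individual steps, is the property the abstract credits with easing temporal credit assignment under sparse rewards.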

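Authors: Zhang Shengtao (张圣涛), Zhao Jia (赵佳), Chen Chuqi (陈楚琪)

Affiliation: School of Science, Hebei University of Technology (河北工业大学理学院)

Source: Computer Science and Application (《计算机科学与应用》), 2024, No. 10, pp. 102-109 (8 pages)

Keywords: genetic algorithm; reinforcement learning; actor network; sparse reward

Classification: TP3 [Automation and Computer Technology: Computer Science and Technology]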





