Derivative-free reinforcement learning: a review (Cited by: 4)


Abstract: Reinforcement learning is about learning agent models that make the best sequential decisions in unknown environments. In an unknown environment, the agent needs to explore the environment while exploiting the collected information, which usually poses a sophisticated problem. Derivative-free optimization, meanwhile, is capable of solving sophisticated problems. It commonly uses a sampling-and-updating framework to iteratively improve the solution, in which exploration and exploitation must also be well balanced. Derivative-free optimization therefore addresses a core issue similar to that of reinforcement learning, and it has been introduced into reinforcement learning approaches under the names of learning classifier systems and neuroevolution/evolutionary reinforcement learning. Although such methods have been developed for decades, derivative-free reinforcement learning has recently been attracting increasing attention. However, a recent survey on this topic is still lacking. In this article, we summarize the methods of derivative-free reinforcement learning to date and organize them by aspects including parameter updating, model selection, exploration, and parallel/distributed methods. Moreover, we discuss some current limitations and possible future directions, hoping that this article can bring more attention to this topic and serve as a catalyst for developing novel and efficient approaches.
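The sampling-and-updating framework mentioned in the abstract can be illustrated with a minimal sketch. The code below is not taken from the article: it uses a simple cross-entropy-style search chosen purely for illustration, and every name in it (`sample_and_update`, `toy_return`, the population and elite sizes, the decay factor) is a hypothetical assumption. The objective is a toy stand-in for an episode return, so no real environment is involved.

```python
import random


def sample_and_update(objective, dim, iterations=50, population=20,
                      elite=5, sigma=1.0, seed=0):
    """Toy sampling-and-updating loop (cross-entropy-style, for illustration only)."""
    rng = random.Random(seed)
    mean = [0.0] * dim
    best_x, best_f = list(mean), objective(mean)
    for _ in range(iterations):
        # Sampling step: draw candidates around the current mean (exploration).
        candidates = [[m + rng.gauss(0.0, sigma) for m in mean]
                      for _ in range(population)]
        # Rank candidates by objective value only; no gradient is computed.
        scored = sorted(candidates, key=objective, reverse=True)
        top = scored[:elite]
        # Updating step: move the mean toward the best samples (exploitation).
        mean = [sum(x[i] for x in top) / elite for i in range(dim)]
        # Shrink the sampling radius to shift from exploration to exploitation.
        sigma *= 0.95
        f = objective(scored[0])
        if f > best_f:
            best_x, best_f = scored[0], f
    return best_x, best_f


def toy_return(theta):
    """Hypothetical stand-in for an episode return; maximal (0) at theta = (1, -2)."""
    return -((theta[0] - 1.0) ** 2 + (theta[1] + 2.0) ** 2)


best_theta, best_value = sample_and_update(toy_return, dim=2)
print(best_theta, best_value)
```

In a reinforcement learning setting, `toy_return` would be replaced by a rollout that returns the cumulative reward of a policy parameterized by `theta`; the fixed decay of `sigma` is one simple, assumed way to trade exploration for exploitation over time.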

Authors: Hong QIAN, Yang YU

Affiliation: National Key Laboratory for Novel Software Technology

Source: Frontiers of Computer Science (SCIE, EI, CSCD), 2021, Issue 6, pp. 75-93, 19 pages in total. Chinese journal title: 中国计算机科学前沿（英文版）

Funding: This work was supported by the Program A for Outstanding PhD Candidate of Nanjing University, the National Science Foundation of China (61876077), the Jiangsu Science Foundation (BK20170013), and the Collaborative Innovation Center of Novel Software Technology and Industrialization.

Keywords: reinforcement learning; derivative-free optimization; neuroevolution reinforcement learning; neural architecture search

Classification code: TP181 [Automation and Computer Technology: Control Theory and Control Engineering]

