Feature-Based Aggregation and Deep Reinforcement Learning: A Survey and Some New Implementations (cited by: 13)

Abstract: In this paper we discuss policy iteration methods for approximate solution of a finite-state discounted Markov decision problem, with a focus on feature-based aggregation methods and their connection with deep reinforcement learning schemes. We introduce features of the states of the original problem, and we formulate a smaller "aggregate" Markov decision problem, whose states relate to the features. We discuss properties and possible implementations of this type of aggregation, including a new approach to approximate policy iteration. In this approach the policy improvement operation combines feature-based aggregation with feature construction using deep neural networks or other calculations. We argue that the cost function of a policy may be approximated much more accurately by the nonlinear function of the features provided by aggregation than by the linear function of the features provided by neural network-based reinforcement learning, thereby potentially leading to more effective policy improvement.

Author: Dimitri P. Bertsekas

Affiliation: Department of Electrical Engineering and Computer Science

Source: IEEE/CAA Journal of Automatica Sinica (自动化学报（英文版）), indexed in EI and CSCD, 2019, No. 1, pp. 1-31 (31 pages)

Keywords: reinforcement learning; dynamic programming; Markovian decision problems; aggregation; feature-based architectures; policy iteration; deep neural networks; rollout algorithms

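Classification: TP18 [Automation and Computer Technology: Control Theory and Control Engineering]; O225 [Science: Operations Research and Cybernetics]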

