Review of Research on Reinforcement Learning in Few-Shot Scenes (小样本场景下的强化学习研究综述)
Abstract  Based on the background of the few-shot problem, this paper divides few-shot scenarios into two types: the first pursues more specialized performance, while the second pursues more general performance. In the process of knowledge generalization, different scenarios show a clear preference for the kind of knowledge carrier they require. Accordingly, few-shot learning methods are divided, from the perspective of the knowledge carrier, into methods that use procedural knowledge and methods that use declarative knowledge, and few-shot reinforcement learning algorithms are then discussed under this classification. Finally, possible directions for future development are proposed from both theoretical and application perspectives, in the hope of providing a reference for subsequent research.
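The abstract's central distinction is between procedural knowledge carriers (for example, a meta-learned policy initialization that is adapted by further training) and declarative knowledge carriers (for example, explicitly stored value estimates that are reused directly). The sketch below is not from the paper; it is a minimal toy illustration of the two carriers on an assumed family of Bernoulli-bandit tasks, using a Reptile-style meta-learned initialization for the procedural case and accumulated per-arm value estimates for the declarative case. All names (sample_task, adapt, value_estimates) and hyperparameters are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
N_ARMS = 5

def sample_task():
    """A task is a Bernoulli bandit defined by its arm success probabilities."""
    return rng.uniform(0.1, 0.9, size=N_ARMS)

def softmax(x):
    z = np.exp(x - x.max())
    return z / z.sum()

def adapt(theta, task, steps=20, lr=0.5):
    """Few-shot adaptation: REINFORCE updates of softmax arm preferences."""
    theta = theta.copy()
    for _ in range(steps):
        p = softmax(theta)
        a = rng.choice(N_ARMS, p=p)
        r = float(rng.random() < task[a])   # Bernoulli reward
        grad = -p
        grad[a] += 1.0                      # d log pi(a) / d theta
        theta += lr * r * grad
    return theta

# Procedural carrier: meta-learn a policy initialization (Reptile-style outer loop).
theta_meta = np.zeros(N_ARMS)
for _ in range(200):
    task = sample_task()
    theta_meta += 0.1 * (adapt(theta_meta, task) - theta_meta)

# Declarative carrier: accumulate explicit, inspectable per-arm value estimates.
counts, returns = np.zeros(N_ARMS), np.zeros(N_ARMS)
for _ in range(200):
    task = sample_task()
    for _ in range(20):
        a = rng.integers(N_ARMS)
        returns[a] += float(rng.random() < task[a])
        counts[a] += 1
value_estimates = returns / np.maximum(counts, 1)

# On a new few-shot task, both carriers give the agent a head start over learning from scratch.
new_task = sample_task()
theta_scratch = adapt(np.zeros(N_ARMS), new_task, steps=10)
theta_proc = adapt(theta_meta, new_task, steps=10)                       # procedural transfer
theta_decl = adapt(np.log(value_estimates + 1e-6), new_task, steps=10)   # declarative prior
print("from scratch:", softmax(theta_scratch) @ new_task)
print("procedural  :", softmax(theta_proc) @ new_task)
print("declarative :", softmax(theta_decl) @ new_task)
```

In this toy setting the procedural knowledge lives implicitly in the adapted parameters, while the declarative knowledge (value_estimates) remains explicit and inspectable, which mirrors the distinction the review draws between the two kinds of carrier.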
Authors  Wang Zhechao (王哲超), Fu Qiming (傅启明), Chen Jianping (陈建平), Hu Fuyuan (胡伏原), Lu You (陆悠), Wu Hongjie (吴宏杰) (School of Electronic and Information Engineering, Suzhou University of Science and Technology, Suzhou 215009, China; Jiangsu Provincial Key Laboratory of Building Intelligence and Energy Saving, Suzhou University of Science and Technology, Suzhou 215009, China; Suzhou Key Laboratory of Mobile Networking and Applied Technologies, Suzhou University of Science and Technology, Suzhou 215009, China)
Source  Journal of Nanjing Normal University (Engineering and Technology Edition) (CAS), 2022, No. 1, pp. 86-92 (7 pages)
Funding  National Key R&D Program of China (2020YFC2006602); National Natural Science Foundation of China (62072324, 61876217, 61876121, 61772357, 62073231, 61902272); Key R&D Program of Jiangsu Province (BE2017663)
Keywords  reinforcement learning; few-shot learning; meta-learning; transfer learning; lifelong learning; knowledge generalization
