Abstract
Meta-learning has been widely applied to few-shot reinforcement learning problems, where the goal is an agent that can learn quickly in a new task. However, these algorithms often ignore some isolated tasks in pursuit of average performance, which may result in negative adaptation on those tasks, and they usually need sufficient learning in a stationary task distribution. In this paper, we propose a hierarchical framework of double meta-learning that consists of three processes: classification, meta-learning, and re-adaptation. First, in the classification process, we use the learned parameters of each task to partition the tasks into several subsets, each regarded as a category of tasks, which separates out the isolated tasks. Second, in the meta-learning process, we learn a category parameter for every subset via meta-learning. Simultaneously, based on the gradient of each category parameter in each subset, we apply meta-learning again to learn a new meta-parameter related to the whole task set, which can be used as an initial parameter for a new task. Finally, in the re-adaptation process, we adapt the parameter of the new task in two steps, first from the meta-parameter and then from the appropriate category parameter. Experimentally, we demonstrate that our algorithm prevents the agent from negative adaptation without losing average performance over the whole task set. Additionally, our algorithm adapts more rapidly during re-adaptation. Moreover, we show that our algorithm performs well with fewer samples when the agent is exposed to an online meta-learning setting.
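The three processes described above can be illustrated with a toy sketch. It is not the authors' implementation, and everything in it is an assumption made for illustration: each task is a scalar quadratic loss 0.5 * (theta - target)^2, the classifier is a plain k-means over learned task parameters, both meta-learning levels use a first-order (Reptile-style) update instead of the gradient-based scheme of the paper, and the names adapt, reptile, and classify are hypothetical.

```python
def loss(theta, target):
    """Toy task loss (assumption): 0.5 * (theta - target)^2."""
    return 0.5 * (theta - target) ** 2

def adapt(theta, target, lr=0.5, steps=1):
    """Plain gradient steps on the toy loss, starting from theta."""
    for _ in range(steps):
        theta -= lr * (theta - target)
    return theta

def reptile(targets, init=0.0, meta_lr=0.5, iters=50):
    """First-order (Reptile-style) meta-update, swapped in for simplicity:
    move the initialization toward the mean of the adapted parameters."""
    theta = init
    for _ in range(iters):
        adapted = [adapt(theta, t) for t in targets]
        theta += meta_lr * (sum(adapted) / len(adapted) - theta)
    return theta

def classify(params, k, iters=20):
    """k-means over learned task parameters with farthest-point initialization."""
    centers = [params[0]]
    while len(centers) < k:
        centers.append(max(params, key=lambda p: min(abs(p - c) for c in centers)))
    labels = [0] * len(params)
    for _ in range(iters):
        labels = [min(range(k), key=lambda j: abs(p - centers[j])) for p in params]
        for j in range(k):
            members = [p for p, l in zip(params, labels) if l == j]
            if members:  # keep the old center if a cluster is empty
                centers[j] = sum(members) / len(members)
    return labels

# Two groups of similar tasks plus one isolated task (hypothetical data).
targets = [0.0, 0.2, 0.4, 5.0, 5.2, 5.4, -6.0]

# 1) Classification: learn per-task parameters, then group tasks by them.
task_params = [adapt(0.0, t, steps=3) for t in targets]
labels = classify(task_params, k=3)

# 2) Meta-learning: one category parameter per subset, then a meta-parameter
#    learned on top of the category parameters (a simplification).
category_params = {
    c: reptile([t for t, l in zip(targets, labels) if l == c])
    for c in sorted(set(labels))
}
meta_param = reptile(list(category_params.values()))

# 3) Re-adaptation on a new task: adapt from the meta-parameter first,
#    then restart from the nearest category parameter and adapt again.
new_target = -6.5
stage_one = adapt(meta_param, new_target)
nearest = min(category_params.values(), key=lambda c: abs(c - stage_one))
two_stage = adapt(nearest, new_target)

# Baseline with the same number of gradient steps, using only the meta-parameter.
meta_only = adapt(meta_param, new_target, steps=2)
print(loss(two_stage, new_target), loss(meta_only, new_target))
```

In this toy setting the isolated task ends up in its own category, so a new task that resembles it is re-adapted from that category parameter rather than from an initialization dominated by the other tasks. The sketch only shows the control flow; it makes no claim about the reinforcement learning results reported in the paper.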
Funding
This work was financially supported by the National Key R&D Program of China (2020YFC2006602);
the National Natural Science Foundation of China (Grant Nos. 62072324, 61876217, 61876121, 61772357);
the University Natural Science Foundation of Jiangsu Province (No. 21KJA520005);
the Primary Research and Development Plan of Jiangsu Province (BE2020026);
the Natural Science Foundation of Jiangsu Province (BK20190942);
and the Postgraduate Research & Practice Innovation Program of Jiangsu Province (No. KYCX21_3020).