
Curricular Robust Reinforcement Learning via GAN-Based Perturbation Through Continuously Scheduled Task Sequence

Abstract: Reinforcement learning (RL), one of the three branches of machine learning, aims for autonomous learning and is now greatly driving the development of artificial intelligence, especially in autonomous distributed systems such as cooperative Boston Dynamics robots. However, robust RL remains a challenging reliability problem due to the gap between laboratory simulation and the real world. Existing efforts approach this problem by, for example, applying random environmental perturbations during the learning process. However, one cannot guarantee training with a positive perturbation, as bad perturbations might cause RL to fail. In this work, we treat robust RL as a multi-task RL problem and propose a curricular robust RL approach. We first present a generative adversarial network (GAN) based task generation model that iteratively outputs new tasks at an appropriate level of difficulty for the current policy. With these progressive tasks, we realize curriculum learning and finally obtain a robust policy. Extensive experiments in multiple environments demonstrate that our method improves training stability and is robust to differences between training and test conditions.
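The core idea in the abstract, generating perturbation tasks at an "appropriate level of difficulty" for the current policy and training on them in a curriculum, can be illustrated with a toy sketch. The names and the simple difficulty filter below are illustrative assumptions, not the paper's actual GAN architecture: a sampler stands in for the generator, and a candidate task is accepted only when the policy's estimated success rate on it falls inside an intermediate band.

```python
import random

def estimate_success(skill, difficulty):
    """Toy model: success probability falls as difficulty exceeds skill."""
    return max(0.0, min(1.0, 1.0 - (difficulty - skill)))

def propose_tasks(skill, n_candidates=200, band=(0.3, 0.7), seed=0):
    """Keep only candidate perturbations of appropriate difficulty.

    The uniform sampler is a stand-in for the GAN generator; the band
    filter plays the role of its difficulty-conditioned objective.
    """
    rng = random.Random(seed)
    tasks = []
    for _ in range(n_candidates):
        difficulty = rng.uniform(0.0, 2.0)
        if band[0] <= estimate_success(skill, difficulty) <= band[1]:
            tasks.append(difficulty)
    return tasks

def curriculum(rounds=5, skill=0.0, gain=0.2):
    """Alternate task generation and (toy) policy improvement."""
    history = []
    for _ in range(rounds):
        tasks = propose_tasks(skill)
        if tasks:
            # "Training" on appropriately hard tasks raises the policy's
            # skill toward the mean difficulty of the accepted tasks.
            skill += gain * (sum(tasks) / len(tasks) - skill)
        history.append((len(tasks), skill))
    return history
```

Because accepted tasks always sit slightly above the current skill level, the toy loop produces a monotonically rising skill curve, mirroring the progressive task sequence the abstract describes.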
Source: Tsinghua Science and Technology (SCIE, EI, CAS, CSCD), 2023, Issue 1, pp. 27-38 (12 pages). Journal of Tsinghua University (Science and Technology), English edition.
Funding: supported by the National Natural Science Foundation of China (Nos. 61972025, 61802389, 61672092, U1811264, and 61966009) and the National Key R&D Program of China (Nos. 2020YFB1005604 and 2020YFB2103802).