A New Reward System Based on Human Demonstrations for Hard Exploration Games

Abstract: The main idea of reinforcement learning is to evaluate the chosen action according to the current reward. Following this concept, many algorithms have achieved good performance on classic Atari 2600 games. The main challenge arises when the reward is sparse or missing. Such hard exploration environments include the games Montezuma’s Revenge, Pitfall, and Private Eye. Approaches built to deal with such challenges are very demanding. This work introduces a different reward system that enables a simple classical algorithm to learn fast and achieve high performance in hard exploration environments. Moreover, we made some simple adjustments to several hyperparameters, such as the number of actions and the sampling ratio, that helped improve performance. We include the extra reward within the human demonstrations. We then used Prioritized Double Deep Q-Networks (Prioritized DDQN) to learn from these demonstrations. Our approach enabled Prioritized DDQN, with a short learning time, to finish the first level of Montezuma’s Revenge and to perform well in both Pitfall and Private Eye. We used the same games to compare our results with several baselines, such as Rainbow and the Deep Q-learning from Demonstrations (DQfD) algorithm. The results showed that the new reward system enabled Prioritized DDQN to outperform the baselines in the hard exploration games with a short learning time.
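
The abstract names Prioritized DDQN but does not spell out its update. As a rough illustration only, the sketch below shows the standard Double DQN target and proportional prioritized sampling from the general literature, not the authors' implementation. The function names, the default values of `gamma`, `alpha`, and `eps`, and the array layout are all assumptions made for this example. The `rewards` array would hold whatever reward is stored with each transition, including any extra reward attached to demonstration data; how the paper computes that extra reward is not described in the abstract and is not modeled here.

```python
import numpy as np


def double_dqn_targets(rewards, dones, q_online_next, q_target_next, gamma=0.99):
    """Standard Double DQN targets (hypothetical helper, not from the paper).

    The online network selects the next action; the target network scores it.
    q_online_next and q_target_next have shape (batch, num_actions).
    """
    rewards = np.asarray(rewards, dtype=np.float64)
    dones = np.asarray(dones, dtype=np.float64)
    q_online_next = np.asarray(q_online_next, dtype=np.float64)
    q_target_next = np.asarray(q_target_next, dtype=np.float64)

    # Action selection uses the online network's estimates.
    best_actions = np.argmax(q_online_next, axis=1)
    # Action evaluation uses the target network's estimates.
    next_values = q_target_next[np.arange(len(rewards)), best_actions]
    # Terminal transitions (done == 1) do not bootstrap from the next state.
    return rewards + gamma * (1.0 - dones) * next_values


def priority_probabilities(td_errors, alpha=0.6, eps=1e-6):
    """Proportional prioritization: P(i) = p_i^alpha / sum_k p_k^alpha,
    with p_i = |td_error_i| + eps (assumed defaults, not the paper's values)."""
    priorities = (np.abs(np.asarray(td_errors, dtype=np.float64)) + eps) ** alpha
    return priorities / priorities.sum()
```

In a training loop, the probabilities from `priority_probabilities` could be passed as the `p` argument of `numpy.random.Generator.choice` to draw a minibatch, and the targets from `double_dqn_targets` would serve as regression targets for the online network.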

Authors: Wadhah Zeyad Tareq, Mehmet Fatih Amasyali

Affiliation: Faculty of Electrical and Electronics Engineering

Source: Computers, Materials & Continua (SCIE, EI), 2022, Issue 2, pp. 2401-2414 (14 pages)

Keywords: deep reinforcement learning; human demonstrations; prioritized double deep Q-networks; Atari

Classification code: H31 [Language and Linguistics: English]

