Abstract: After 60 years of implementation, the policy of regional ethnic autonomy has provided basic political support for promoting the common development and prosperity of all ethnic groups.
Funding: Joint Funds of the National Natural Science Foundation of China, Grant/Award Number: U21A20518; National Natural Science Foundation of China, Grant/Award Numbers: 62106279, 61903372.
Abstract: Policy evaluation (PE) is a critical sub-problem in reinforcement learning: it estimates the value function for a given policy and can be used for policy improvement. However, current PE methods still have limitations, such as low sample efficiency and local convergence, especially on complex tasks. In this study, a novel PE algorithm called Least-Squares Truncated Temporal-Difference learning (LST2D) is proposed. In LST2D, an adaptive truncation mechanism is designed that takes advantage of both the fast convergence of Least-Squares Temporal Difference learning and the asymptotic convergence of Temporal Difference learning (TD). Two feature pre-training methods are then used to improve the approximation ability of LST2D. Furthermore, an Actor-Critic algorithm based on LST2D and pre-trained feature representations (ACLPF) is proposed, in which LST2D is integrated into the critic network to improve learning-prediction efficiency. Comprehensive simulation studies were conducted on four robotic tasks, and the results illustrate the effectiveness of LST2D. The proposed ACLPF algorithm outperformed DQN, ACER and PPO in terms of sample efficiency and stability, which demonstrates that LST2D can be applied to online learning control problems by incorporating it into the actor-critic architecture.
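The two building blocks named in this abstract can be illustrated with a minimal sketch. It is not LST2D itself: the abstract does not specify the adaptive truncation mechanism, so none is attempted here. The sketch only contrasts plain TD(0) with batch least-squares TD on a linear value function. The five-state random-walk task, the one-hot features, the step size, the ridge term and all function names are assumptions chosen for illustration.

```python
import random

GAMMA = 1.0
N_STATES = 5  # non-terminal states 0..4; stepping off either end terminates


def features(state):
    """One-hot feature vector; the terminal state (None) maps to all zeros."""
    phi = [0.0] * N_STATES
    if state is not None:
        phi[state] = 1.0
    return phi


def sample_episode(rng):
    """Return (state, reward, next_state) transitions of an unbiased random walk."""
    state, transitions = N_STATES // 2, []
    while state is not None:
        nxt = state + rng.choice((-1, 1))
        reward = 1.0 if nxt == N_STATES else 0.0  # reward only for exiting right
        nxt = nxt if 0 <= nxt < N_STATES else None
        transitions.append((state, reward, nxt))
        state = nxt
    return transitions


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def td0(episodes, alpha=0.05):
    """Incremental TD(0): one stochastic update per transition."""
    theta = [0.0] * N_STATES
    for episode in episodes:
        for s, r, s_next in episode:
            phi, phi_next = features(s), features(s_next)
            delta = r + GAMMA * dot(theta, phi_next) - dot(theta, phi)
            theta = [t + alpha * delta * p for t, p in zip(theta, phi)]
    return theta


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x


def lstd(episodes, ridge=1e-6):
    """Batch LSTD: accumulate A = sum phi (phi - gamma phi')^T, b = sum phi r."""
    A = [[ridge if i == j else 0.0 for j in range(N_STATES)] for i in range(N_STATES)]
    b = [0.0] * N_STATES
    for episode in episodes:
        for s, r, s_next in episode:
            phi, phi_next = features(s), features(s_next)
            diff = [p - GAMMA * q for p, q in zip(phi, phi_next)]
            for i in range(N_STATES):
                b[i] += phi[i] * r
                for j in range(N_STATES):
                    A[i][j] += phi[i] * diff[j]
    return solve(A, b)


rng = random.Random(0)
episodes = [sample_episode(rng) for _ in range(500)]
true_values = [(i + 1) / (N_STATES + 1) for i in range(N_STATES)]
td_estimate = td0(episodes)
lstd_estimate = lstd(episodes)
```

Both estimators consume the same transitions. LSTD solves for the weights in one linear solve, while TD(0) approaches them through many small steps, which is the efficiency gap the abstract refers to.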
Funding: The National Key R&D Program of China under contract No. 2017YFC1405101; the Scientific Research Foundation of the Third Institute of Oceanography under contract No. 2016025; the China-Indonesia Maritime Cooperation Fund Project "China-Indonesia Bitung Ecological Station Establishment".
Abstract: Two field observations were conducted around the Lembeh Strait, in September 2015 and September 2016. The evidence indicates that seawater around the Lembeh Strait consists of North Pacific Tropical Water (NPTW), North Pacific Intermediate Water (NPIW), North Pacific Tropical Intermediate Water (NPTIW) and Antarctic Intermediate Water (AAIW). Water mass properties around the Lembeh Strait differ between north and south. NPTIW is found only in the southern Lembeh Strait. A water mass with a salinity of 34.6 is detected only at 200–240 m, between NPTW and NPTIW, in the southern Lembeh Strait; it results from mixing between the saltier water transported from the South Pacific Ocean and the lighter water from the North Pacific Ocean and the Sulawesi Sea. Analysis of the mixed layer depth indicates that there is an onshore surface current in the northern Lembeh Strait and that the surface current in the Lembeh Strait is southward. These marked differences in water masses demonstrate that little water exchange has occurred between the north and south of the Lembeh Strait. In 2015, the positive wind stress curl covering the northern Lembeh Strait induced shoaling of the thermocline and deepening of NPIW, which shows that a north-south difference in the air-sea system can induce north-south differences in seawater properties.