Funding: Supported by the Open Bidding Projects of the Office of Land and Resources of the Guangxi Zhuang Autonomous Region (GXZC2015-G3-0575-GTZB).
Abstract: The comprehensive carrying capacity of urban land reflects the resource level, economic scale, social development and environmental pressure that urban land carries. An assessment indicator system for urban land comprehensive carrying capacity was constructed from four aspects (resources, economy, society and environment), and principal component analysis and cluster analysis were used to evaluate the urban land comprehensive carrying capacity of Guangxi and its 14 cities in 2005-2014. Its spatial and temporal characteristics and driving forces were analyzed, with the aim of providing references for improving urban land comprehensive carrying capacity. The results showed that the overall urban land comprehensive carrying capacity in Guangxi increased in 2005-2014, and that there were significant differences among the cities in Guangxi over this period: Liuzhou, Guilin and Nanning belonged to the regions with the highest carrying capacity, Beihai, Yulin and Wutong belonged to the regions with high carrying capacity, and the carrying capacities of the other cities varied over time. The degree of economic development was an important factor influencing urban land comprehensive carrying capacity, but could not directly represent its level.
Funding: Joint Funds of the National Natural Science Foundation of China, Grant/Award Number: U21A20518; National Natural Science Foundation of China, Grant/Award Numbers: 62106279, 61903372.
Abstract: Policy evaluation (PE) is a critical sub-problem in reinforcement learning: it estimates the value function for a given policy and can be used for policy improvement. However, current PE methods still have limitations, such as low sample efficiency and local convergence, especially on complex tasks. In this study, a novel PE algorithm called Least-Squares Truncated Temporal-Difference learning (LST2D) is proposed. In LST2D, an adaptive truncation mechanism is designed that takes advantage of both the fast convergence of Least-Squares Temporal Difference learning and the asymptotic convergence of Temporal Difference (TD) learning. Then, two feature pre-training methods are utilised to improve the approximation ability of LST2D. Furthermore, an Actor-Critic algorithm based on LST2D and pre-trained feature representations (ACLPF) is proposed, in which LST2D is integrated into the critic network to improve learning-prediction efficiency. Comprehensive simulation studies were conducted on four robotic tasks, and the results illustrate the effectiveness of LST2D. The proposed ACLPF algorithm outperformed DQN, ACER and PPO in terms of sample efficiency and stability, which demonstrates that LST2D can be applied to online learning control problems by incorporating it into the actor-critic architecture.
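
For readers unfamiliar with the least-squares side of this abstract, the following is a minimal sketch of plain batch LSTD(0) with linear features. It is not the LST2D algorithm itself: the adaptive truncation and feature pre-training described above are not reproduced here. The two-state chain, the one-hot features and all function names are assumptions made for illustration.

```python
def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting.

    Assumes the matrix is non-singular; a singular matrix raises ZeroDivisionError.
    """
    n = len(b)
    # Augmented matrix so that row operations act on a and b together.
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def lstd(transitions, features, n_features, gamma):
    """Batch LSTD(0): accumulate A and b over transitions, then solve A theta = b.

    transitions: iterable of (state, reward, next_state); next_state None means terminal.
    features: function mapping a state to a list of n_features floats.
    """
    a = [[0.0] * n_features for _ in range(n_features)]
    b = [0.0] * n_features
    for s, r, s_next in transitions:
        phi = features(s)
        # Terminal states contribute a zero feature vector (value 0).
        phi_next = features(s_next) if s_next is not None else [0.0] * n_features
        for i in range(n_features):
            b[i] += phi[i] * r
            for j in range(n_features):
                a[i][j] += phi[i] * (phi[j] - gamma * phi_next[j])
    return solve_linear(a, b)


def one_hot(state, n=2):
    """Hypothetical tabular features: one indicator per state."""
    return [1.0 if i == state else 0.0 for i in range(n)]


# Toy chain (illustrative): state 0 -> state 1 with reward 0, state 1 -> terminal with reward 1.
theta = lstd([(0, 0.0, 1), (1, 1.0, None)], one_hot, 2, gamma=0.9)
```

With one-hot features the weights are the state values, so `theta` comes out close to `[0.9, 1.0]`: the last state is worth its reward of 1, and the first state is worth that value discounted once. LSTD obtains this from a single pass over the data, whereas incremental TD would need repeated updates with a step size.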
Abstract: Aim: To find a more efficient learning method based on temporal difference learning for delayed reinforcement learning tasks. Methods: A Q-learning algorithm based on truncated TD(λ), with adaptive schemes for selecting the λ value and aimed at absorbing Markov decision processes, was presented and implemented. Results and Conclusion: Simulations on shortest-path search problems show that using an adaptive λ in Q-learning based on TTD(λ) can speed up its convergence.
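
The truncated λ-return that underlies truncated TD(λ) can be sketched as below. This is a generic textbook form under stated assumptions, not the paper's adaptive-λ Q-learning scheme; the function names and the convention that `values` has one more entry than `rewards` are choices made for this illustration.

```python
def n_step_return(rewards, values, t, n, gamma):
    """n-step return from time t: discounted rewards plus a bootstrapped value.

    rewards[k] is the reward for the transition from step k to step k + 1.
    values[k] is the current value estimate of the state at step k, so values
    has len(rewards) + 1 entries (the terminal entry is normally 0).
    """
    end = min(t + n, len(rewards))  # never look past the end of the episode
    g = sum(gamma ** (k - t) * rewards[k] for k in range(t, end))
    return g + gamma ** (end - t) * values[end]


def truncated_lambda_return(rewards, values, t, m, gamma, lam):
    """Lambda-return truncated after m steps.

    The n-step returns for n < m get weight (1 - lam) * lam**(n - 1); the
    remaining weight lam**(m - 1) is placed on the m-step return, so the
    weights sum to 1.
    """
    g = 0.0
    for n in range(1, m):
        g += (1.0 - lam) * lam ** (n - 1) * n_step_return(rewards, values, t, n, gamma)
    g += lam ** (m - 1) * n_step_return(rewards, values, t, m, gamma)
    return g
```

Setting `lam=0` recovers the one-step TD target and `lam=1` recovers the plain m-step return, which is why the choice of λ trades off bias from bootstrapping against variance from long reward sums.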
Funding: Supported by the National Natural Science Foundation of China (Grant Nos. 41175035 and 41005005); the National Basic Research Program of China (Grant No. 2009CB421502); and a project funded by the Priority Academic Program Development of Jiangsu Higher Education Institutions (PAPD).
Abstract: The aim of this study is to calculate the low-level atmospheric motion vectors (AMVs) in clear areas with FY-2E IR2 window (11.59-12.79 μm) channel imagery, where the traditional cloud motion wind technique fails. A new tracer selection procedure, which we call the temporal difference technique, is demonstrated in this paper. This technique makes it possible to infer low-level wind by tracking features in the moisture pattern that appear as brightness temperature (TB) differences between consecutive sequences of 30-min-interval FY-2E IR2 images over cloud-free regions. The TB difference corresponding to a 10% change in water vapor density is computed with the Moderate Resolution Atmospheric Transmission (MODTRAN4) radiative transfer model. The total contribution from each of the 10 layers is analyzed under four typical atmospheric conditions: tropical, midlatitude summer, U.S. standard, and midlatitude winter. The peak level of the water vapor weighting function for the four typical atmospheres is assigned as a specific height to the TB "wind". This technique is valid over cloud-free ocean areas. The proposed algorithm exhibits encouraging statistical results in terms of vector difference (VD), speed bias (BIAS), mean vector difference (MVD), standard deviation (SD), and root-mean-square error (RMSE), when compared with the wind field of NCEP reanalysis data and rawinsonde observations.
Funding: Supported by the National Natural Science Foundation (NNSF) of China (61603196, 61503079, 61520106009, 61533008); the Natural Science Foundation of Jiangsu Province of China (BK20150851); China Postdoctoral Science Foundation (2015M581842); Jiangsu Postdoctoral Science Foundation (1601259C); Nanjing University of Posts and Telecommunications Science Foundation (NUPTSF) (NY215011); Priority Academic Program Development of Jiangsu Higher Education Institutions; the open fund of Key Laboratory of Measurement and Control of Complex Systems of Engineering, Ministry of Education (MCCSE2015B02); and the Research Innovation Program for College Graduates of Jiangsu Province (CXLX1309).
Abstract: The iterated prisoner's dilemma (IPD) is an ideal model for analyzing interactions between agents in complex networks. It has attracted wide interest in the development of novel strategies since the success of tit-for-tat in Axelrod's tournament. This paper studies a new adaptive strategy for the IPD in different complex networks, in which agents can learn and adapt their strategies through reinforcement learning. A temporal difference learning method is applied in designing the adaptive strategy to optimize the agents' decision-making process. Previous studies indicated that mutual cooperation rarely emerges in the IPD. Therefore, three examples based on a square lattice network and a scale-free network are provided to show two features of the adaptive strategy. First, mutual cooperation can be achieved by a group of adaptive agents in a scale-free network, and once evolution has converged to mutual cooperation, it is unlikely to shift. Second, the adaptive strategy can earn a better payoff than other strategies in the square lattice network. The analytical properties are discussed to verify the evolutionary stability of the adaptive strategy.
Abstract: Aim: To investigate the model-free multi-step average-reward reinforcement learning algorithm. Methods: By combining the R-learning algorithm with the temporal difference learning (TD(λ) learning) algorithm for average-reward problems, a novel incremental algorithm, called R(λ) learning, was proposed. Results and Conclusion: The proposed algorithm is a natural extension of Q(λ) learning, the multi-step discounted-reward reinforcement learning algorithm, to the average-reward case. Simulation results show that R(λ) learning with intermediate λ values yields a significant performance improvement over simple R-learning.
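
As background for the "simple R learning" baseline mentioned above, here is a minimal sketch of one tabular R-learning update in a commonly cited textbook form. It does not include the eligibility traces that R(λ) adds, and the exact update used in the paper may differ; the function name, the dictionary layout of `q`, and the step sizes `alpha` and `beta` are assumptions made for illustration.

```python
def r_learning_step(q, rho, s, a, r, s_next, alpha, beta):
    """Apply one tabular R-learning update to q in place and return the new rho.

    q: dict mapping each state to a list of action values.
    rho: current estimate of the average reward per step.
    alpha, beta: step sizes for the action values and for rho (assumed names).
    """
    # Average-adjusted TD error: the reward is measured relative to rho,
    # and there is no discount factor.
    delta = r - rho + max(q[s_next]) - q[s][a]
    q[s][a] += alpha * delta
    # The average-reward estimate is only adjusted when the action taken
    # is greedy with respect to the updated values.
    if q[s][a] == max(q[s]):
        rho += beta * (r - rho + max(q[s_next]) - max(q[s]))
    return rho
```

The key difference from discounted Q-learning is that rewards enter as `r - rho`, so the learned values are relative to the long-run average reward rather than a discounted sum.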
Abstract: Linear regression analysis of the trends in annual, seasonal and monthly precipitation at 72 meteorological stations in Hubei Province from 1961 to 1995 reveals that: 1) annual precipitation increased by 61.0 mm/10a in the eastern part of Hubei (with 112°E as the dividing line) and decreased by 34.9 mm/10a in the western part; 2) precipitation in winter and summer (January, February, March, June and July) increased in almost the whole province, usually with a non-uniform precipitation distribution from south to north. Precipitation in spring, autumn and winter (April, September, November and December) decreased in most areas, usually with a non-uniform precipitation distribution from east to west. March and December were transition periods between the two spatial distribution patterns mentioned above; 3) the eastern part of Hubei has become one of the centers of increasing precipitation in China. The results were consistent with the trend of more frequent flood and drought events in Hubei Province, which differ more across spatial and temporal scales.
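
A trend expressed in mm/10a is the slope of an ordinary least-squares fit of precipitation against year, multiplied by ten. The sketch below shows that calculation; the function name is an assumption, and the series used in the usage note is synthetic, constructed only to illustrate the unit conversion and not taken from the station records.

```python
def trend_per_decade(years, values):
    """Ordinary least-squares slope of values against years, scaled to 10 years.

    If values are annual precipitation totals in mm, the result is in mm/10a.
    """
    n = len(years)
    mean_x = sum(years) / n
    mean_y = sum(values) / n
    # Slope = covariance(x, y) / variance(x); the 1/n factors cancel.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(years, values))
    sxx = sum((x - mean_x) ** 2 for x in years)
    return 10.0 * sxy / sxx
```

For example, a synthetic series that rises by exactly 6.1 mm each year over 1961-1995 gives a trend of about 61 mm/10a from `trend_per_decade(list(range(1961, 1996)), series)`. A real analysis would also test the slope for statistical significance, which this sketch omits.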
Abstract: The last few decades have seen a phenomenal increase in the quality, diversity and pervasiveness of computer games. The worldwide computer games market is estimated to be worth around USD 21bn annually, and is predicted to continue to grow rapidly. This paper reviews some of the recent developments in applying computational intelligence (CI) methods to games, points out some of the potential pitfalls, and suggests some fruitful directions for future research.
Funding: Supported by the "Hundred Talents Program" of CAS; National Key Research Program of China (2009CB421308).
Abstract: Spatial and temporal change patterns of air temperature (T), precipitation (P), relative humidity (RH), lower vapor pressure (VP), potential evapotranspiration (PET) and drought situation at 690 meteorological stations across China were evaluated in this study to understand the effects of warming on regional drought and hydrological processes. Here, the drought extent is expressed by the aridity index (AI), which is the ratio of precipitation to reference crop evapotranspiration (ETo) calculated by the FAO Penman-Monteith equation, taking into account air temperature, atmospheric humidity, solar radiation, and wind. Our results indicate that the patterns of climate change differ between 1961-2008 and 1981-2008. Little precipitation change occurred in China and ETo decreased from 1961 to 2008. However, the warming trend has intensified and the area with significantly increasing precipitation has shrunk since the early 1980s, and ETo has increased in most areas of China from 1981 to 2008 and decreased from 1961 to 2008. The areas affected by drought have shifted from North China and Northeast China to East China and South China since 1981. It is speculated that the increasing warming intensity after 1981 possibly strengthened the power of potential evapotranspiration and resulted in drought in most areas of Northeast China, North China, eastern Southwest China, and especially in East China and South China.
Abstract: Modeling and simulation of time series prediction based on a dynamic neural network (NN) are studied. A prediction model for non-linear and time-varying systems is proposed based on a dynamic Jordan NN. To address the intrinsic defect of the back-propagation (BP) algorithm, which cannot update network weights incrementally, a hybrid algorithm combining the temporal difference (TD) method with the BP algorithm is put forward to train the Jordan NN. The proposed method is applied to real-time, multi-step prediction of the ash content of clean coal in jigging production. A practical example is also given, and its results indicate that the method performs better than others and offers a useful reference for the prediction of nonlinear time series.
Abstract: Key challenges for 5G and Beyond networks relate to the requirements for exceptionally low latency, high reliability, and extremely high data rates. The Ultra-Reliable Low Latency Communication (URLLC) use case is the trickiest to support, and current research is focused on physical or MAC layer solutions, while proposals focused on the network layer using Machine Learning (ML) and Artificial Intelligence (AI) algorithms running on base stations and User Equipment (UE) or Internet of Things (IoT) devices are in early stages. In this paper, we describe the operation rationale of the most recent relevant ML algorithms and techniques, and we propose and validate ML algorithms running on both cells (base stations/gNBs) and UEs or IoT devices to handle URLLC service control. One ML algorithm runs on base stations to evaluate latency demands and offload traffic when needed, while another lightweight algorithm runs on UEs and IoT devices to rank cells by URLLC service quality in real time and indicate the best cell for a UE or IoT device to camp on. We show that the interplay of these algorithms leads to good service control and eventually optimal load allocation, under slow load mobility.
Funding: Supported by the National Key Research and Development Program of China (Grant Nos. 2016YFC0300803 and 2018YFC0308602); National Natural Science Foundation of China (Nos. 41422604 and 41306169); the China Ocean Mineral Resources R&D Association (No. DY135-E2-4); and the Ocean Park Conservation Foundation of Hong Kong (MM02-1516).
Abstract: The underwater soundscape is an important ecological element affecting numerous aquatic animals, in particular dolphins, which must identify salient cues from ambient ocean noise. In this study, temporal variations in the soundscape of Jiaotou Bay, where a population of Indo-Pacific humpback dolphins (Sousa chinensis) has recently been sighted regularly, were monitored from February 2016 to January 2017. An autonomous acoustic recorder was deployed in shallow waters, and 1/3-octave band sound pressure levels (SPLs) were calculated with central frequencies ranging from 25 Hz to 40 kHz, then grouped into 3 subdivided bands via cluster analysis. SPLs in each major band showed significant differences on diel, fishing-related period, seasonal, and tidal phase scales. Anthropogenic noise generated by passing ships and underwater explosions was recorded in the study area. Fish and dolphin acoustic activities both exhibited diel and seasonal variations, but no tidal cycle patterns. A significant negative relationship between anthropogenic sound detection rates and dolphin detection rates was observed, and fish detection rates showed no effect on dolphin detection rates, indicating anthropogenic activity avoidance and no forced foraging in dolphins in the study area. The results provide fundamental insight into the acoustic dynamics of an important Indo-Pacific humpback dolphin habitat within a coastal area affected by a rapid increase in human activity, and demonstrate the need to protect animal habitat from anthropogenic noise.
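
The 1/3-octave bands from 25 Hz to 40 kHz referred to above can be enumerated with the standard base-10 convention, in which the exact center frequencies are 1000 · 10^(n/10) Hz. The sketch below is a generic illustration of that convention and of the decibel formula for SPL, not the study's processing pipeline; the function names are assumptions, and the 1 µPa reference pressure is the usual underwater convention rather than a value stated in the abstract.

```python
import math


def third_octave_centers(n_min=-16, n_max=16):
    """Exact base-10 1/3-octave center frequencies in Hz.

    n = -16 gives about 25.1 Hz (nominal 25 Hz) and n = 16 gives about
    39.8 kHz (nominal 40 kHz), so the defaults yield 33 bands.
    """
    return [1000.0 * 10.0 ** (n / 10.0) for n in range(n_min, n_max + 1)]


def band_edges(fc):
    """Lower and upper edge frequencies of the 1/3-octave band centered at fc."""
    factor = 10.0 ** (1.0 / 20.0)
    return fc / factor, fc * factor


def spl_db(p_rms_pa, p_ref_pa=1e-6):
    """Sound pressure level in dB for an RMS pressure given in pascals.

    The default reference of 1 micropascal is the underwater convention
    (assumed here); airborne acoustics uses 20 micropascals instead.
    """
    return 20.0 * math.log10(p_rms_pa / p_ref_pa)
```

Each band's upper edge coincides with the next band's lower edge, so the 33 bands tile the range without gaps. In practice the RMS pressure per band would come from band-pass filtering or from summing a power spectrum between `band_edges(fc)`, which this sketch leaves out.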
Funding: The corresponding author Weinan Zhang was supported by the "New Generation of AI 2030" Major Project (2018AAA0100900) and the National Natural Science Foundation of China (Grant Nos. 62076161, 61772333, 61632017).
Abstract: In reinforcement learning, policy evaluation aims to predict the long-term values of a state under a certain policy. Since high-dimensional representations are becoming more and more common in reinforcement learning, reducing the computational cost has become a significant problem for policy evaluation. Many recent works focus on adopting matrix sketching methods to accelerate least-square temporal difference (TD) algorithms and quasi-Newton temporal difference algorithms. Among these sketching methods, the truncated incremental SVD shows better performance because it is stable and efficient. However, the convergence properties of the incremental SVD are still open. In this paper, we first show that conventional incremental SVD algorithms could have enormous approximation errors in the worst case. Then we propose a variant of incremental SVD with better theoretical guarantees by shrinking the singular values periodically. Moreover, we employ our improved incremental SVD to accelerate least-square TD and quasi-Newton TD algorithms. The experimental results verify the correctness and effectiveness of our methods.