Abstract: Aim: To find a more efficient learning method based on temporal difference learning for delayed reinforcement learning tasks. Methods: A Q-learning algorithm based on truncated TD(λ) (TTD(λ)), with adaptive schemes for selecting the value of λ and aimed at absorbing Markov decision processes, was presented and implemented in software. Results and Conclusion: Simulations on shortest-path search problems show that using an adaptive λ in TTD(λ)-based Q-learning can speed up its convergence.
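To make the idea in the abstract above concrete, here is a minimal sketch of tabular Q-learning driven by truncated λ-returns on a small shortest-path graph. It is not the authors' algorithm: λ is held fixed (the abstract does not specify the adaptive schemes), updates are applied at the end of each episode rather than online, and the graph, the function names `truncated_lambda_return` and `train`, and all parameter values are assumptions chosen for illustration.

```python
import random


def truncated_lambda_return(rewards, next_values, gamma, lam):
    """Truncated lambda-return for the first step of a window.

    rewards[k]     : reward received at step k of the window
    next_values[k] : max_a Q(s_{k+1}, a), or 0.0 if s_{k+1} is absorbing
    """
    # Start from the one-step return of the last step in the window ...
    z = rewards[-1] + gamma * next_values[-1]
    # ... then fold in the earlier steps, going backwards in time.
    for r, v in zip(reversed(rewards[:-1]), reversed(next_values[:-1])):
        z = r + gamma * ((1.0 - lam) * v + lam * z)
    return z


def train(graph, start, goal, episodes=200, m=4, gamma=1.0, lam=0.5,
          alpha=0.2, epsilon=0.1, max_steps=100, seed=0):
    """Q-learning with truncated lambda-returns (fixed lam, window length m).

    graph maps each node to {neighbour: edge_cost}; an action is the
    neighbour to move to, and the reward is the negative edge cost.
    """
    rng = random.Random(seed)
    q = {s: {a: 0.0 for a in graph[s]} for s in graph}

    def value(s):
        # The goal is absorbing, so its value is fixed at zero.
        return 0.0 if s == goal else max(q[s].values())

    for _ in range(episodes):
        s, episode = start, []
        for _ in range(max_steps):
            if s == goal:
                break
            actions = list(graph[s])
            if rng.random() < epsilon:
                a = rng.choice(actions)           # explore
            else:
                a = max(actions, key=q[s].get)    # exploit
            episode.append((s, a, -graph[s][a], a))
            s = a
        # Update every visited (state, action) from its m-step window.
        for t, (s_t, a_t, _, _) in enumerate(episode):
            window = episode[t:t + m]
            rewards = [r for _, _, r, _ in window]
            next_values = [value(s2) for _, _, _, s2 in window]
            z = truncated_lambda_return(rewards, next_values, gamma, lam)
            q[s_t][a_t] += alpha * (z - q[s_t][a_t])
    return q


# Hypothetical four-node graph; "D" is the absorbing goal.
graph = {"A": {"B": 1, "C": 4}, "B": {"C": 1, "D": 5}, "C": {"D": 1}, "D": {}}
q = train(graph, start="A", goal="D")
```

Setting `lam=0.0` reduces the target to the ordinary one-step Q-learning target, while `lam=1.0` gives the plain m-step return; an adaptive scheme would vary `lam` between these extremes during learning.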
Foundation item: Supported by the National Natural Science Foundation of China under Grant No. 61100005.
Abstract: Hydrodynamic and physical motion simulation tests with a large-scale self-propelled model under actual wave conditions are an important means of studying the environmental adaptability of ships. During navigation tests of the self-propelled model, the complex environment, including port facilities, navigation facilities, and nearby ships, must be considered carefully, because in such a dense environment the impact of sea waves and winds on the model is particularly significant. To improve the safety of the self-propelled model, this paper applies Q-learning, a reinforcement learning method, combined with chaotic ideas to the model's collision avoidance, with the aim of making local path planning more reliable. Simulation and sea test results show that the algorithm performs well for collision avoidance of the self-navigating model under the interference of sea winds and waves, and that it has good adaptability.
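Abstract: With the deepening of China's electricity market reform, the retail side has gradually opened up, and many electricity retail companies have been established and have entered the retail electricity business. How to construct an optimal electricity purchasing portfolio strategy across multiple markets is an important concern for electricity retail companies. To develop such a strategy, the uncertainties in load demand and market electricity prices must be properly taken into account so that risk can be managed. Against this background, daily vectors describing the load demand and electricity price within one day are first constructed at hourly resolution, and interval numbers are used to describe the fluctuation ranges of load and price. In addition, the effects of time-of-use pricing, implemented by the retail company on the customer side, on load adjustment and on the company's market share are simulated. Then, with the objective of maximizing the retail company's total daily profit, an enhanced interval linear programming (EILP) model for purchasing portfolio optimization is established and solved analytically. Finally, actual load and electricity price data from the PJM (Pennsylvania, New Jersey, Maryland) electricity market in the United States are used to illustrate the proposed method.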