Abstract: This paper considers the optimal replacement problem for a repairable system consisting of one component and a single repairman, under the assumption that the system after repair is not 'as good as new'. Using the geometric process, we consider a replacement policy T based on the age of the system. The problem is to determine the optimal replacement policy T* that maximizes the long-run expected benefit per unit time. An explicit expression for the long-run expected benefit per unit time is derived, and under certain conditions the existence and uniqueness of the optimal policy T* are proved. Finally, we prove that the policy T* is better than the policy T* in .
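For context, a minimal sketch of where the explicit expression typically comes from (the concrete form in the paper depends on its reward and cost assumptions, which the abstract does not state): under an age-replacement policy T, replacement instants are renewal points, so the renewal-reward theorem gives the long-run expected benefit per unit time as a ratio of per-cycle quantities. In a geometric-process model, the successive operating times {X_n} after each repair form a geometric process, meaning {a^{n-1} X_n} is a renewal process for some ratio a > 0.

B(T) = \frac{\mathbb{E}[\text{benefit earned in one renewal cycle}]}{\mathbb{E}[\text{length of one renewal cycle}]},
\qquad
\mathbb{E}[X_n] = \frac{\lambda}{a^{\,n-1}}, \quad \lambda = \mathbb{E}[X_1].

For a deteriorating system one takes a > 1, so expected operating times shrink after each repair; the notation B(T), lambda, and a above is illustrative rather than taken from the paper.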
Abstract: Aim: To investigate a model-free, multi-step, average-reward reinforcement learning algorithm. Methods: By combining the R-learning algorithm with temporal-difference learning (TD(λ) learning) for average-reward problems, a novel incremental algorithm, called R(λ)-learning, was proposed. Results and Conclusion: The proposed algorithm is a natural extension of Q(λ)-learning, the multi-step discounted-reward reinforcement learning algorithm, to the average-reward case. Simulation results show that R(λ)-learning with intermediate λ values yields a significant performance improvement over simple R-learning.
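A minimal sketch of one plausible form of such an algorithm, not the paper's exact method: it combines Schwartz's R-learning update (an average-reward estimate rho takes the place of discounting) with Watkins-style eligibility traces, which is the same trace mechanism Q(λ)-learning uses. The Gym-style environment interface (env.reset() returning a state, env.step(a) returning (next_state, reward)) and the parameter names alpha, beta, lam, epsilon are assumptions for illustration.

import numpy as np

def r_lambda_learning(env, n_states, n_actions, n_steps=100_000,
                      alpha=0.1, beta=0.01, lam=0.7, epsilon=0.1, seed=0):
    """Hypothetical sketch of R(lambda)-learning: R-learning with
    Watkins-style eligibility traces for the average-reward setting."""
    rng = np.random.default_rng(seed)
    R = np.zeros((n_states, n_actions))   # relative action values
    e = np.zeros((n_states, n_actions))   # eligibility traces
    rho = 0.0                             # average-reward estimate
    s = env.reset()                       # assumed interface
    for _ in range(n_steps):
        greedy = rng.random() >= epsilon
        a = int(np.argmax(R[s])) if greedy else int(rng.integers(n_actions))
        s2, r = env.step(a)               # assumed: (next_state, reward)
        # Average-reward TD error: rho replaces the discount factor.
        delta = r - rho + np.max(R[s2]) - R[s, a]
        e[s, a] += 1.0                    # accumulating trace
        R += alpha * delta * e            # multi-step credit via traces
        if greedy:
            rho += beta * delta           # update rho on greedy steps only
            e *= lam                      # decay traces while acting greedily
        else:
            e[:] = 0.0                    # exploratory action cuts the trace
        s = s2
    return R, rho

Setting lam = 0 recovers one-step R-learning, while intermediate lam values propagate each TD error backward over recent state-action pairs, which is the multi-step credit assignment the abstract credits for the performance improvement.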