Aim To investigate model-free, multi-step average-reward reinforcement learning algorithms. Methods By combining R-learning with temporal-difference learning (TD(λ)) for average-reward problems, a novel incremental algorithm called R(λ)-learning was proposed. Results and Conclusion The proposed algorithm is a natural extension of Q(λ)-learning, a multi-step discounted-reward reinforcement learning algorithm, to the average-reward case. Simulation results show that R(λ)-learning with intermediate λ values performs significantly better than simple R-learning.
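To make the combination concrete, here is a minimal tabular sketch of how R-learning's update can be paired with TD(λ)-style eligibility traces. It is not the authors' algorithm: the accumulating traces, the Watkins-style cutting of traces after exploratory actions, the step sizes `alpha` and `beta`, and the names `r_lambda_learning` and `two_state_step` are all assumptions made for illustration.

```python
import random


def r_lambda_learning(n_states, n_actions, step, steps=20000, alpha=0.1,
                      beta=0.01, lam=0.7, epsilon=0.1, seed=0):
    """Hypothetical sketch: tabular R-learning extended with eligibility traces.

    `step(state, action)` is an assumed environment callback returning
    (reward, next_state). Cutting traces after exploratory actions
    (Watkins-style) is an assumption, not necessarily the paper's choice.
    """
    rng = random.Random(seed)
    R = [[0.0] * n_actions for _ in range(n_states)]  # relative action values
    e = [[0.0] * n_actions for _ in range(n_states)]  # eligibility traces
    rho = 0.0                                         # average-reward estimate
    s = 0
    for _ in range(steps):
        greedy = max(range(n_actions), key=lambda b: R[s][b])
        a = rng.randrange(n_actions) if rng.random() < epsilon else greedy
        r, s_next = step(s, a)
        max_next, max_curr = max(R[s_next]), max(R[s])
        # Average-reward TD error: subtracting rho replaces discounting.
        delta = r - rho + max_next - R[s][a]
        if a == greedy:
            # As in R-learning, rho is updated only on greedy steps.
            rho += beta * (r - rho + max_next - max_curr)
        else:
            # Exploratory action: earlier pairs no longer follow the greedy policy.
            for row in e:
                row[:] = [0.0] * n_actions
        e[s][a] += 1.0  # accumulating trace for the current pair
        for i in range(n_states):
            for j in range(n_actions):
                R[i][j] += alpha * delta * e[i][j]
                e[i][j] *= lam  # decay by lambda only (no discount factor)
        s = s_next
    return R, rho


def two_state_step(s, a):
    """Toy deterministic MDP: staying in state 1 (action 0) pays 1; all else pays 0."""
    if s == 0:
        return 0.0, (1 if a == 1 else 0)
    return (1.0, 1) if a == 0 else (0.0, 0)


R, rho = r_lambda_learning(2, 2, two_state_step)
```

With `lam=0` each trace is cleared right after its update, so the loop reduces to one-step R-learning; intermediate values such as the 0.7 used here spread each TD error back over recently visited state-action pairs, which is presumably the multi-step effect the abstract credits for the improvement. In the toy MDP the optimal average reward is 1, reached by moving to state 1 and staying there.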
This paper investigated how to learn optimal action policies in cooperative multi-agent systems when the agents' rewards are random variables, and proposed a general two-stage learning algorithm for cooperative multi-agent decision processes. The algorithm first estimates the average immediate rewards and then treats these learned rewards as the agents' immediate action rewards when learning the optimal action policies. The learning algorithm is proved to find the optimal policies in stochastic environments. An extension of the algorithm to stochastic Markov decision processes is also discussed.
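A minimal sketch of the two-stage idea for the simplest case, a single-state cooperative (common-payoff) game between two agents. Assumptions: the agents can observe joint actions, the rewards are the well-known climbing-game payoffs plus Gaussian noise, stage 2 is a plain argmax over the averaged rewards shared by all agents, and the names `estimate_average_rewards`, `learn_policies`, and `noisy_climbing_game` are hypothetical. The paper's general algorithm and its extension to Markov decision processes are not reproduced here.

```python
import random
from itertools import product


def estimate_average_rewards(sample_reward, actions_per_agent, n_samples, rng):
    """Stage 1 (sketch): estimate the mean of each joint action's random reward."""
    avg = {}
    for joint in product(*(range(n) for n in actions_per_agent)):
        mean = 0.0
        for k in range(1, n_samples + 1):
            mean += (sample_reward(joint, rng) - mean) / k  # incremental sample mean
        avg[joint] = mean
    return avg


def learn_policies(avg):
    """Stage 2 (sketch): treat averaged rewards as deterministic immediate rewards."""
    # Sorting gives every agent the same tie-breaking order, so their choices coordinate.
    best = max(sorted(avg), key=lambda joint: avg[joint])
    return list(best)  # agent i plays best[i]


# Climbing-game mean payoffs (rows: agent 0's action, columns: agent 1's action).
MEAN_REWARD = [[11, -30, 0],
               [-30, 7, 6],
               [0, 0, 5]]


def noisy_climbing_game(joint, rng):
    """Assumed reward model: mean payoff plus zero-mean Gaussian noise."""
    a, b = joint
    return MEAN_REWARD[a][b] + rng.gauss(0.0, 5.0)


rng = random.Random(1)
avg = estimate_average_rewards(noisy_climbing_game, (3, 3), 200, rng)
policy = learn_policies(avg)
```

Separating the stages means stage 2 never sees the reward noise directly: once the averages are accurate enough to rank the joint actions correctly, any method for deterministic rewards can be applied to them.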
Objective To summarize the clinical experience with surgical treatment of 2 cases of blunt cardiac trauma and to review the relevant literature. Methods A 6-year-old girl was diagnosed with a muscular ventricular septal defect and a left ventricular aneurysm 2 days after an automobile accident and underwent ventricular septal defect repair 2 weeks after the injury. A 9-year-old boy was diagnosed with severe mitral regurgitation resulting from rupture of the posterior papillary muscle 9 days after an automobile accident and underwent mitral valvuloplasty 2 weeks after the injury. Results Heart function of the first patient was New York Heart Association (NYHA) class …, and echocardiography showed no residual septal defect and a reduction in the size of the left ventricular aneurysm. Heart function of the second patient was NYHA class …, and echocardiography showed mild mitral regurgitation. Conclusion Blunt traumatic heart disease results from compression of the heart between the sternum and the spine, from myocardial contusion, or from both. A more aggressive strategy of earlier surgical treatment, before heart function deteriorates, is advocated. Earlier surgical correction of the anatomic deformity achieves a good result, and long-term follow-up is necessary.