Funding: Supported by the National Basic Research Program of China (2013CB329603), the National Natural Science Foundation of China (61375058, 71231002), the China Mobile Research Fund (MCM 20130351), the Ministry of Education of China, and the Special Co-Construction Project of Beijing Municipal Commission of Education.
Abstract: Options are a promising way to discover hierarchical structure in reinforcement learning (RL) and thereby accelerate learning. The key to option discovery is how an agent can autonomously find useful subgoals among the trajectories it has traversed. Analyzing the agent's actions along these trajectories yields useful heuristics: the agent not only passes through subgoals more frequently, but its effective actions are also restricted at subgoals. Consequently, subgoals can be regarded as the states along the paths that best match this action-restricted property. For the grid-world environment, the concept of a unique-direction value (UDV), which reflects the action-restricted property, is introduced to find these states. The UDV approach is used to form options autonomously, both offline and online. Experiments show that the approach finds subgoals correctly, and that Q-learning with options found in either the offline or the online process accelerates learning significantly.
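The abstract does not give the formula for the unique-direction value, so the following is only a minimal sketch of the two heuristics it names (subgoals are visited often, and the actions taken there are restricted). The scoring rule, the function names `action_restriction_scores` and `pick_subgoal`, and the toy trajectories are all assumptions made for illustration, not the authors' UDV definition.

```python
from collections import Counter, defaultdict

def action_restriction_scores(trajectories):
    """Hypothetical stand-in score, NOT the paper's UDV:
    (fraction of trajectories visiting a state) x (share of its most common action)."""
    visits = Counter()              # number of trajectories passing through each state
    actions = defaultdict(Counter)  # per-state counts of the actions taken there
    for traj in trajectories:
        for state in {s for s, _ in traj}:   # count each state once per trajectory
            visits[state] += 1
        for state, action in traj:
            actions[state][action] += 1
    n = len(trajectories)
    scores = {}
    for state, counts in actions.items():
        dominant_share = max(counts.values()) / sum(counts.values())
        scores[state] = (visits[state] / n) * dominant_share
    return scores

def pick_subgoal(trajectories):
    """Return the state with the highest assumed score."""
    scores = action_restriction_scores(trajectories)
    return max(scores, key=scores.get)

# Toy two-room grid (assumed): cell (2, 1) is the doorway, y grows downward.
toy = [
    [((1, 0), "D"), ((1, 1), "R"), ((2, 1), "R"), ((3, 1), "U")],
    [((1, 2), "U"), ((1, 1), "R"), ((2, 1), "R"), ((3, 1), "D")],
    [((1, 1), "U"), ((1, 0), "D"), ((1, 1), "R"), ((2, 1), "R"), ((3, 1), "R")],
]
scores = action_restriction_scores(toy)
subgoal = pick_subgoal(toy)
```

In this toy data every trajectory crosses the doorway and always moves right there, so it receives the highest score; the cell just past the doorway is visited equally often but its actions vary, which lowers its score. A state selected this way could then serve as the termination state of an option used by Q-learning.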
Funding: Sponsored by the National Natural Science Foundation of China (No. 50874133).
Abstract: Liquid metal flow in round-strand continuous casting under intermittently reversing electromagnetic stirring was measured with an ultrasonic Doppler velocimeter in a physical simulation system, in order to investigate how the time interval (t_i) of the periodically reversed magnetic field affects the spatial and temporal flow. The results show that, under electromagnetic stirring with a direction-reversed magnetic field, the velocity and rotation direction of the metal flow change periodically as the direction of the magnetic field changes. Both the experimental results and the mathematical model calculations indicate that when t_i is nearly equal to the time required for the metal flow to accelerate from rest to its maximum velocity and then decelerate to zero again, the rate of dynamic pressure, which represents the washing effect of the liquid metal flow, reaches a critical value. On this basis, the rate of dynamic pressure is proposed as a criterion for optimizing the electromagnetic stirring process.