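Abstract: Temporal abstraction is an important research topic in hierarchical reinforcement learning (HRL). It allows HRL agents to learn policies at different time scales and can effectively address the sparse-reward problem that deep reinforcement learning struggles with. Learning good temporal-abstraction policies end to end has long been a challenge in HRL research. The Option-Critic (OC) framework builds on the Option framework and uses policy gradient theory to address this problem effectively. During policy learning, however, the OC framework suffers from a degeneration problem in which the action distributions of the intra-option policies become very similar. This degeneration hurts the experimental performance of the OC framework and makes the options less interpretable. To solve this problem, mutual information is introduced as an intrinsic reward, and an Option-Critic algorithm with mutual information optimization (Option-Critic Algorithm with Mutual Information Optimization, MIOOC) is proposed. MIOOC is combined with the Proximal Policy Option-Critic (PPOC) algorithm and ensures diversity among the lower-level policies. To verify its effectiveness, MIOOC is compared with several common reinforcement learning methods in continuous environments. The results show that MIOOC learns faster, performs better, and yields more distinguishable intra-option policies.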
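The abstract does not say how the mutual information is estimated or turned into an intrinsic reward. As an illustration only, the sketch below assumes a DIAYN-style variational lower bound: a discriminator q(o | s, a) tries to recognise which option produced a state-action pair, and the bonus log q(o | s, a) - log p(o), with a uniform prior p(o) = 1/K, is added to the environment reward with a hypothetical weight `beta`. The function names, the uniform prior, and `beta` are assumptions, not details of MIOOC.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over the discriminator's option logits."""
    m = max(logits)  # subtract the max before exp() to avoid overflow
    exps = [math.exp(v - m) for v in logits]
    z = sum(exps)
    return [e / z for e in exps]


def mi_intrinsic_reward(disc_probs: Sequence[float], option: int, eps: float = 1e-8) -> float:
    """Variational MI bonus log q(o|s,a) - log p(o), assuming a uniform prior p(o) = 1/K.

    Hypothetical helper: disc_probs[k] is the discriminator's probability that option k
    produced the current (state, action) pair.
    """
    num_options = len(disc_probs)
    q = max(disc_probs[option], eps)  # clamp so log() never sees 0
    return math.log(q) + math.log(num_options)  # -log(1/K) == +log(K)


def shaped_reward(extrinsic: float, disc_logits: Sequence[float], option: int,
                  beta: float = 0.1) -> float:
    """Environment reward plus a beta-scaled MI bonus (beta is a hypothetical coefficient)."""
    return extrinsic + beta * mi_intrinsic_reward(softmax(disc_logits), option)
```

The discriminator can only be confident when the intra-option policies choose different actions, so this kind of bonus directly targets the degeneration described in the abstract, where all options share similar action distributions: a confidently recognised option earns a positive bonus, and an indistinguishable one earns roughly zero.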
Funding: National Natural Science Foundation of China (No. 51380445); Natural Science Foundation of Shan’xi Province, China (No. 2013JQ7033); Startup Foundation for Talents of Xi’an University of Architecture and Technology (No. DB 09077).
Abstract: Acid rain can deteriorate the performance of reinforced concrete structures. Taking into account the characteristics of acid rain in China, the properties of steel fiber reinforced concrete subjected to acid rain were studied. The effects of steel fiber content and acid rain pH on the mass loss, erosion depth, neutralization depth, and splitting tensile strength of the tested concrete were investigated, and mercury intrusion porosimetry (MIP) was used to analyze how steel fiber influences the acid rain resistance of the concrete matrix. The results show that the corrosion of steel fiber reinforced concrete subjected to acid rain results from the combined action of H⁺ and SO₄²⁻ in the acid rain, and that steel fiber improves the acid rain resistance of the tested concrete by refining the pore structure and strengthening the tie effect within the concrete matrix. Among the mix proportions tested, the optimum steel fiber content is 1.5%. When the pH of the simulated solution is 3 or 4, the mass loss and splitting tensile strength of the tested concrete first decrease and then increase with corrosion time, whereas at pH 2 they decrease continuously. Owing to the tie effect of steel fiber, spalling of the concrete matrix is significantly reduced, and the erosion and neutralization depths are smaller than those of conventional concrete.
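Abstract: In recent years, deep reinforcement learning has achieved remarkable results in control tasks, but its limited exploration ability makes it difficult to solve complex tasks quickly and stably. Hierarchical reinforcement learning, an important branch of deep reinforcement learning, is mainly used to tackle large-scale problems, yet it still suffers from unsuitable prior-knowledge settings and an inability to balance exploration and exploitation effectively. To address these problems, a maximum entropy hierarchical reinforcement learning algorithm with advantage-weighted mutual information maximization (Maximum Entropy Hierarchical Reinforcement Learning with Advantage-weighted Mutual Information Maximization, HRL-AMIM) is proposed. The algorithm uses advantage-weighted importance sampling and mutual information maximization to solve the sample clustering problem caused by the policy, and adds an intrinsic reward to emphasize option diversity. This reward is also incorporated into the maximum entropy reinforcement learning objective, giving the policy stronger exploration and better stability. In addition, an option-count annealing method is adopted, which not only reduces the influence of prior knowledge on performance but also balances exploration and exploitation, resulting in higher sample efficiency and faster learning. HRL-AMIM was applied to MuJoCo tasks, and the experiments show that, compared with conventional deep reinforcement learning algorithms and hierarchical reinforcement learning algorithms of the same type, HRL-AMIM has considerable advantages in both performance and stability. Ablation experiments and hyperparameter sensitivity experiments further verify the robustness and effectiveness of the algorithm.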
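The abstract does not give the exact form of the advantage weighting or of the mutual-information term. The sketch below assumes one plausible reading: each sample is weighted in proportion to exp(A / temperature), self-normalised over the batch, and the weights are applied to a variational bound log q(o | s, a) - log p(o) with a uniform option prior, so that option discrimination is learned mainly from high-advantage samples. `advantage_weights`, `weighted_mi_objective`, and `temperature` are hypothetical names, and the option-count annealing schedule is not modelled.

```python
import math
from typing import List, Sequence


def advantage_weights(advantages: Sequence[float], temperature: float = 1.0) -> List[float]:
    """Self-normalised weights w_i proportional to exp(A_i / temperature) (assumed form)."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [a / temperature for a in advantages]
    m = max(scaled)  # subtract the max before exp() for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    z = sum(exps)
    return [e / z for e in exps]


def weighted_mi_objective(advantages: Sequence[float], log_q_option: Sequence[float],
                          num_options: int, temperature: float = 1.0) -> float:
    """Advantage-weighted estimate of E[log q(o|s,a) - log p(o)], uniform prior p(o) = 1/K.

    log_q_option[i] is the (hypothetical) discriminator's log-probability of the option
    that generated sample i; the result would be maximised w.r.t. the discriminator.
    """
    weights = advantage_weights(advantages, temperature)
    log_prior = -math.log(num_options)
    return sum(w * (lq - log_prior) for w, lq in zip(weights, log_q_option))
```

A lower temperature concentrates the weight on the few best samples, while a higher one approaches an unweighted batch average.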
Funding: Project supported by the National Natural Science Foundation of China (Grant Nos. 12204288, 11934004, and 12203106).
Abstract: The low-energy mutual neutralization (MN) reactions Na⁺ + H⁻ → Na(nl) + H have been studied with the full quantum-mechanical molecular-orbital close-coupling (QMOCC) method over a wide energy range of 10⁻³ to 10³ eV/u. Total and state-selective cross sections have been calculated and compared with the available theoretical and experimental data, and state-selective rate coefficients have been obtained for temperatures of 100 to 10000 K. In the present work, all the necessary highly excited states are included, and the effects of rotational coupling and of 10 active electrons are taken into account. At energies below 10 eV/u, Na(4s) is the dominant exit state, contributing approximately 78% of the branching fraction, which is in the best agreement with the experimental data. For energies above 10 eV/u, the total MN cross section is larger than those obtained in other theoretical calculations and decreases slowly because the main exit states change: above 100 eV/u, Na(3p) becomes the dominant exit state, while Na(4s) becomes the third most important. The datasets presented in this paper, including the potential energy curves, the radial and rotational couplings, and the total and state-selective cross sections, are openly available at https://doi.org/10.57760/sciencedb.j00113.00112.
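The abstract does not describe how the rate coefficients were obtained from the cross sections. The standard route is a Maxwellian average, k(T) = sqrt(8 / (π μ)) (k_B T)^(-3/2) ∫ σ(E) E exp(-E / k_B T) dE over centre-of-mass energy E, with reduced mass μ. The sketch below evaluates that integral numerically. It assumes σ is supplied as a function of centre-of-mass energy in eV (the paper quotes energies in eV/u, so a unit conversion would be needed before using its tabulated data); `rate_coefficient`, `reduced_mass_amu`, and the integration grid are illustrative choices, not the authors' code.

```python
import math
from typing import Callable

K_B = 1.380649e-23       # Boltzmann constant in J/K (exact in SI)
EV = 1.602176634e-19     # electron-volt in J (exact in SI)
AMU = 1.66053906660e-27  # atomic mass unit in kg (CODATA 2018)


def reduced_mass_amu(m1: float, m2: float) -> float:
    """Two-body reduced mass, in the same mass unit as the inputs."""
    return m1 * m2 / (m1 + m2)


def rate_coefficient(sigma_cm2: Callable[[float], float], temperature: float,
                     mu_amu: float, x_max: float = 50.0, n: int = 20000) -> float:
    """Maxwellian-averaged rate coefficient k(T) in cm^3/s.

    sigma_cm2(E) returns the cross section in cm^2 at centre-of-mass energy E in eV.
    Substituting x = E / (k_B T) gives
        k(T) = sqrt(8 k_B T / (pi mu)) * integral_0^inf sigma(x k_B T) x exp(-x) dx,
    evaluated with the midpoint rule on [0, x_max], so E = 0 is never sampled
    (MN cross sections grow roughly like 1/E at threshold).
    """
    kt_joule = K_B * temperature
    kt_ev = kt_joule / EV
    mean_speed = math.sqrt(8.0 * kt_joule / (math.pi * mu_amu * AMU)) * 100.0  # m/s -> cm/s
    h = x_max / n
    total = 0.0
    for i in range(n):
        x = (i + 0.5) * h  # midpoint of the i-th sub-interval
        total += sigma_cm2(x * kt_ev) * x * math.exp(-x)
    return mean_speed * total * h
```

For a constant cross section σ₀ the integral equals 1 and the result reduces to σ₀ times the mean relative speed, which is a convenient sanity check. A call for Na⁺ + H⁻ might look like `rate_coefficient(my_sigma, 1000.0, reduced_mass_amu(22.99, 1.008))`, where `my_sigma` is a hypothetical interpolator over a state-selective cross-section table.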