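Abstract: An adaptive fuzzy Actor-Critic learning method based on a fuzzy RBF network is proposed. A single fuzzy RBF neural network approximates both the Actor's action function and the Critic's value function, addressing the "curse of dimensionality" that readily arises when generalizing over the state space. The fuzzy RBF network adaptively learns its structure and parameters as the environment state and the characteristics of the controlled plant change, which makes the network structure more compact. As a result, the fuzzy Actor-Critic learning offers good generalization, a simple control structure, and high learning efficiency. Simulation results on the Mountain Car task verify the effectiveness of the proposed method.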
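The shared-network idea can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' algorithm. Normalized Gaussian RBF units play the role of fuzzy rules. A new unit is added whenever no existing unit fires above a hypothetical `novelty` threshold, which is one common way to adapt structure. Both the actor's mean action and the critic's value are linear in the same firing strengths and are updated from a single TD error. The class and parameter names are illustrative.

```python
import math
import random


class FuzzyRBFActorCritic:
    """Hypothetical sketch: a normalized Gaussian RBF layer (fuzzy rule
    firing strengths) shared by an actor head and a critic head."""

    def __init__(self, width=0.5, novelty=0.3, alpha_actor=0.05,
                 alpha_critic=0.1, gamma=0.95, sigma=0.2):
        self.centers = []       # one centre per fuzzy rule / RBF unit
        self.w_actor = []       # actor weight per rule (mean action)
        self.w_critic = []      # critic weight per rule (state value)
        self.width = width
        self.novelty = novelty  # grow a rule if no unit fires above this
        self.alpha_a, self.alpha_c = alpha_actor, alpha_critic
        self.gamma, self.sigma = gamma, sigma

    def _raw(self, s):
        # Gaussian membership of state s in each rule.
        return [math.exp(-sum((x - c) ** 2 for x, c in zip(s, ctr))
                         / (2.0 * self.width ** 2)) for ctr in self.centers]

    def grow(self, s):
        # Structure adaptation (assumed rule): add a unit centred at s
        # when no existing unit covers s well enough.
        raw = self._raw(s)
        if not raw or max(raw) < self.novelty:
            self.centers.append(list(s))
            self.w_actor.append(0.0)
            self.w_critic.append(0.0)

    def features(self, s):
        raw = self._raw(s)
        total = sum(raw)
        return [r / total for r in raw]  # normalized firing strengths

    def value(self, s):
        return sum(w * f for w, f in zip(self.w_critic, self.features(s)))

    def act(self, s):
        self.grow(s)
        mean = sum(w * f for w, f in zip(self.w_actor, self.features(s)))
        return mean + random.gauss(0.0, self.sigma), mean  # Gaussian exploration

    def update(self, s, action, mean, reward, s_next, done):
        self.grow(s_next)
        phi = self.features(s)
        target = reward if done else reward + self.gamma * self.value(s_next)
        delta = target - self.value(s)  # one TD error drives both heads
        for i, f in enumerate(phi):
            self.w_critic[i] += self.alpha_c * delta * f
            # (action - mean) * f is the Gaussian log-likelihood gradient
            # up to the constant 1 / sigma**2, folded into alpha_a.
            self.w_actor[i] += self.alpha_a * delta * (action - mean) * f
        return delta
```

In a Mountain Car loop, one would call `act` on each observation, step the environment with the sampled action, and pass the transition to `update`. The exploration scale `sigma` and the learning rates are untuned placeholders.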
Funding: Supported by the National Natural Science Foundation of China (52222215, 52272420, 52072051).
Abstract: Parking in a small parking space with limited room is a difficult task and often leads to deviations between the final parking posture and the target posture. These deviations can cause partial occupancy of adjacent parking spaces, which poses a safety threat to the vehicles parked there. However, previous studies have not addressed this issue. In this paper, we evaluate the impact of the parking deviation of existing vehicles next to the target parking lot (PDEVNTPL) on automatic ego vehicle (AEV) parking in terms of parking safety, comfort, accuracy, and efficiency. A segmented parking training framework (SPTF) based on soft actor-critic (SAC) is proposed to improve parking performance. In the proposed method, the SAC algorithm incorporates policy entropy into the objective function, enabling the AEV to learn parking strategies based on a more comprehensive understanding of the environment. Additionally, the SPTF simplifies complex parking tasks by segmenting them, which maintains the high performance of deep reinforcement learning (DRL). The experimental results reveal that the PDEVNTPL has a detrimental influence on AEV parking in terms of safety, accuracy, and comfort, causing reductions of more than 27%, 54%, and 26%, respectively. However, the SAC-based SPTF effectively mitigates this impact, increasing the parking success rate from 71% to 93% and reducing the heading angle deviation from 2.25 degrees to 0.43 degrees.
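The entropy term that SAC adds to the objective can be illustrated with a discrete-action sketch. The parking task above uses continuous controls and a separately trained actor. Here, as a simplifying assumption, the policy is the Boltzmann (soft-greedy) policy over a finite action set, which makes the entropy bonus and the clipped double-Q target easy to inspect. The function names and the temperature `alpha` are illustrative.

```python
import math


def soft_policy(q_values, alpha):
    """Boltzmann policy pi(a|s) proportional to exp(Q(s,a) / alpha).
    Over a finite action set it maximises E_pi[Q] + alpha * H(pi)."""
    m = max(q_values)  # subtract the max for numerical stability
    exps = [math.exp((q - m) / alpha) for q in q_values]
    z = sum(exps)
    return [e / z for e in exps]


def soft_value(q_values, policy, alpha):
    """Entropy-augmented value:
    V(s) = sum_a pi(a|s) * (Q(s,a) - alpha * log pi(a|s))."""
    return sum(p * (q - alpha * math.log(p))
               for q, p in zip(q_values, policy) if p > 0.0)


def soft_td_target(reward, done, next_q1, next_q2, alpha, gamma=0.99):
    """SAC-style critic target with clipped double Q:
    y = r + gamma * (1 - done) * V_soft(s'), using min(Q1, Q2) per action."""
    q_min = [min(a, b) for a, b in zip(next_q1, next_q2)]
    pi = soft_policy(q_min, alpha)
    return reward + gamma * (1.0 - done) * soft_value(q_min, pi, alpha)
```

When all actions look equally good, the soft value exceeds the plain value by `alpha * log(|A|)`. That is the entropy bonus that keeps the policy from collapsing prematurely onto one action.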
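Abstract: Actor-Critic is a reinforcement learning method that learns a policy from samples collected through online trial-and-error interaction with the environment, and it is an effective means of solving sequential perception and decision-making problems. However, collecting samples through this active, online-interaction paradigm raises cost and safety problems in some complex real-world environments. Offline reinforcement learning is a data-driven reinforcement learning paradigm that learns policies from static sample datasets without exploratory interaction with the environment. It offers a feasible solution for real-world deployment in robotics, autonomous driving, healthcare, and similar domains, and has become a research hotspot in recent years. Current offline reinforcement learning methods face the challenge of distribution shift between the learned policy and the behavior policy. To address this challenge, policy constraints or value-function regularization are typically used to restrict access to actions outside the dataset distribution (Out-Of-Distribution, OOD). This makes learning overly conservative and hinders both the generalization of the value-function network and the performance improvement of the learned policy. To this end, this paper uses uncertainty estimation and OOD sampling to balance the generalization and conservatism of value-function learning, and proposes an Offline Deterministic Actor-Critic based on Uncertainty Estimation (ODACUE). First, for deterministic policies, an uncertainty-estimation operator for the Q-value function is defined, and it is proved theoretically that the Q-value function learned with this operator is a pessimistic estimate of the optimal Q-value function. Then, the uncertainty-estimation operator is applied within the deterministic Actor-Critic framework, and the Critic's learning objective is constructed from a convex combination of uncertainty-estimation operators. Finally, experimental results on D4RL benchmark dataset tasks show that, compared with the baseline algorithms, ODACUE improves overall performance on 11 dataset tasks of different quality levels by at least 9.56% and at most 64.92%. In addition, parameter analysis and ablation experiments further verify the stability and generalization ability of ODACUE.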
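The abstract does not give the exact form of the uncertainty-estimation operator. The sketch below therefore assumes a common ensemble-based lower-confidence-bound form. The pessimistic Q estimate is the ensemble mean minus `beta` times the ensemble standard deviation, evaluated at the deterministic policy's next action. The Critic target is a convex combination of two such targets with different pessimism levels. All names, and the choice of exactly two operators, are hypothetical.

```python
import statistics


def pessimistic_q(ensemble_q, beta):
    """Lower-confidence estimate mean - beta * std over an ensemble of
    Q values for one (s, a) pair; a larger beta is more conservative."""
    mu = statistics.fmean(ensemble_q)
    sd = statistics.pstdev(ensemble_q)
    return mu - beta * sd


def uncertainty_target(reward, done, next_ensemble_q, beta, gamma=0.99):
    """Assumed uncertainty-penalised Bellman target for a deterministic
    policy: next_ensemble_q holds Q_k(s', pi(s')) for each member k."""
    return reward + gamma * (1.0 - done) * pessimistic_q(next_ensemble_q, beta)


def combined_critic_target(reward, done, next_ensemble_q, beta_low, beta_high,
                           lam, gamma=0.99):
    """Convex combination lam * T_low + (1 - lam) * T_high of two targets
    with different pessimism levels (hypothetical form, 0 <= lam <= 1)."""
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    t_low = uncertainty_target(reward, done, next_ensemble_q, beta_low, gamma)
    t_high = uncertainty_target(reward, done, next_ensemble_q, beta_high, gamma)
    return lam * t_low + (1.0 - lam) * t_high
```

Each operator subtracts a non-negative penalty from the ensemble mean, so any convex combination stays between the least and most pessimistic targets. The weight `lam` therefore trades generalization against conservatism directly.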
Funding: Supported by the Sichuan Science and Technology Program (grant number 2022YFG0123).
Abstract: In this study, a novel residential virtual power plant (RVPP) scheduling method that leverages a gated recurrent unit (GRU)-integrated deep reinforcement learning (DRL) algorithm is proposed. In the proposed scheme, the GRU-integrated DRL algorithm guides the RVPP to participate effectively in both the day-ahead and real-time markets, lowering electricity purchase costs and consumption risks for end users. The Lagrangian relaxation technique is introduced to transform the constrained Markov decision process (CMDP) into an unconstrained optimization problem, which guarantees that the constraints are strictly satisfied without manually specifying penalty coefficients. Furthermore, to enhance the scalability of the constrained soft actor-critic (CSAC)-based RVPP scheduling approach, a fully distributed scheduling architecture is designed to enable plug-and-play integration of residential distributed energy resources (RDER). Case studies on the constructed RVPP scenario validate the performance of the proposed method in enhancing the responsiveness of the RDER to electricity tariffs, balancing power-grid supply and demand, and ensuring customer comfort.
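Lagrangian relaxation of a CMDP is usually implemented as primal-dual optimization. The agent maximizes reward minus lambda times cost, while the multiplier lambda rises whenever the average cost exceeds its limit and is projected back to zero otherwise. The sketch shows that generic mechanism on a toy scalar problem chosen for illustration: maximize theta subject to theta squared ≤ 0.25. It is not the RVPP model. In a CSAC setting, the cost estimate would presumably come from a learned cost critic rather than a closed-form expression. All function names are hypothetical.

```python
def lagrangian_reward(reward, cost, lam):
    """Reward seen by the unconstrained problem: r - lambda * c."""
    return reward - lam * cost


def update_multiplier(lam, avg_cost, cost_limit, lr):
    """Projected dual ascent: lambda grows while the constraint is violated
    (avg_cost > cost_limit) and shrinks toward 0 otherwise."""
    return max(0.0, lam + lr * (avg_cost - cost_limit))


def primal_dual_toy(steps=20000, lr_theta=0.01, lr_lam=0.01):
    """Toy problem (assumed for illustration):
    maximise J_R(theta) = theta  subject to  J_C(theta) = theta**2 <= 0.25.
    The saddle point is theta* = 0.5, lambda* = 1.0."""
    theta, lam = 0.0, 0.0
    for _ in range(steps):
        # Gradient of the Lagrangian J_R - lam * J_C with respect to theta.
        grad_theta = 1.0 - lam * 2.0 * theta
        theta += lr_theta * grad_theta
        lam = update_multiplier(lam, theta ** 2, 0.25, lr_lam)
    return theta, lam
```

The multiplier settles at the value where the penalty exactly balances the marginal reward. This is why no fixed penalty coefficient has to be chosen by hand.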
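Abstract: BACKGROUND: In recent years, deep learning has been increasingly applied in stomatology, improving the efficiency and accuracy of oral image analysis and driving the rapid development of intelligent oral medicine. OBJECTIVE: To describe, based on oral imaging, the current research status, advantages, and limitations of deep learning in oral disease diagnosis and treatment-planning decisions, and to explore new directions for the transformation of stomatology in the context of deep learning. METHODS: A computer search of the PubMed database was conducted for literature on the application of deep learning in oral medical imaging published from January 2017 to January 2024, using search terms such as "deep learning, artificial intelligence, stomatology, oral medical imaging." After screening according to the inclusion criteria, 80 articles were included in the review. RESULTS AND CONCLUSION: (1) Classic deep learning models include artificial neural networks, convolutional neural networks, recurrent neural networks, and generative adversarial networks. Researchers use these models either competitively or in combination to interpret oral medical images more efficiently. (2) In stomatology, disease diagnosis and treatment planning rely heavily on the interpretation of medical imaging, and deep learning has powerful image-processing capabilities. Deep learning can help clinicians make decisions more accurately and efficiently, both in assisting the diagnosis of dental caries, periapical periodontitis, vertical root fracture, periodontal disease, jaw cysts, and other conditions, and in the preoperative assessment of procedures such as third-molar extraction and cervical lymph node dissection. (3) Although deep learning is expected to become an important auxiliary tool for diagnosing and treating oral diseases, it still has limitations in model technology, safety and ethics, and legal regulation. Future research should focus on demonstrating the generalizability, robustness, and clinical utility of deep learning, and on finding the best way to integrate automated deep-learning decision-support systems into routine clinical workflows.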