Abstract: Smartphone-based indoor positioning has attracted considerable attention in both research and industry, yet positioning accuracy and robustness remain challenging in complex environments. Given that pedestrian dead reckoning (PDR) algorithms are widely available on recent smartphones, an indoor positioning fusion method based on the twin delayed deep deterministic policy gradient (TD3) is proposed. The method integrates Wi-Fi information with PDR data, models the PDR positioning process as a Markov process, and introduces a continuous action space for the agent. Experiments are conducted against three state-of-the-art deep Q-network (DQN) indoor positioning methods, and the results show that the proposed method significantly reduces positioning error and improves positioning accuracy.
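The core of the fusion approach described above is a TD3 agent acting in a continuous space of position corrections. The sketch below is a minimal, illustrative rendering of that idea rather than the authors' code: the state layout (Wi-Fi RSSI vector plus PDR step features), network sizes, and correction bound are assumptions, while the clipped double-Q targets, target policy smoothing, and delayed policy updates are the standard TD3 ingredients the abstract refers to.

```python
# Minimal TD3 sketch for PDR/Wi-Fi fusion (illustrative assumptions throughout).
import copy
import torch
import torch.nn as nn

STATE_DIM = 32 + 3      # assumed: 32 Wi-Fi RSSI values + PDR step length/heading/floor
ACTION_DIM = 2          # continuous (dx, dy) correction applied to the PDR position
MAX_CORRECTION = 1.0    # metres; assumed bound on one correction step


class Actor(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(STATE_DIM, 256), nn.ReLU(),
            nn.Linear(256, 256), nn.ReLU(),
            nn.Linear(256, ACTION_DIM), nn.Tanh())

    def forward(self, s):
        return MAX_CORRECTION * self.net(s)


class Critic(nn.Module):
    """One Q-network; TD3 keeps two ('twin') to curb value overestimation."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(STATE_DIM + ACTION_DIM, 256), nn.ReLU(),
            nn.Linear(256, 256), nn.ReLU(),
            nn.Linear(256, 1))

    def forward(self, s, a):
        return self.net(torch.cat([s, a], dim=-1))


actor, q1, q2 = Actor(), Critic(), Critic()
actor_t, q1_t, q2_t = copy.deepcopy(actor), copy.deepcopy(q1), copy.deepcopy(q2)
opt_actor = torch.optim.Adam(actor.parameters(), lr=3e-4)
opt_critic = torch.optim.Adam(list(q1.parameters()) + list(q2.parameters()), lr=3e-4)
GAMMA, TAU, POLICY_DELAY, NOISE, NOISE_CLIP = 0.99, 0.005, 2, 0.2, 0.5


def td3_update(batch, step):
    # batch: tensors sampled from a replay buffer; r and done shaped (B, 1).
    s, a, r, s2, done = batch
    with torch.no_grad():
        # Target policy smoothing: clipped noise on the target action.
        eps = (torch.randn_like(a) * NOISE).clamp(-NOISE_CLIP, NOISE_CLIP)
        a2 = (actor_t(s2) + eps).clamp(-MAX_CORRECTION, MAX_CORRECTION)
        # Clipped double-Q target: the smaller of the two target critics.
        y = r + GAMMA * (1.0 - done) * torch.min(q1_t(s2, a2), q2_t(s2, a2))
    critic_loss = ((q1(s, a) - y) ** 2).mean() + ((q2(s, a) - y) ** 2).mean()
    opt_critic.zero_grad(); critic_loss.backward(); opt_critic.step()

    if step % POLICY_DELAY == 0:       # delayed policy and target updates
        actor_loss = -q1(s, actor(s)).mean()
        opt_actor.zero_grad(); actor_loss.backward(); opt_actor.step()
        for net, tgt in ((actor, actor_t), (q1, q1_t), (q2, q2_t)):
            for p, pt in zip(net.parameters(), tgt.parameters()):
                pt.data.mul_(1 - TAU).add_(TAU * p.data)
```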
Funding: Supported by Princess Nourah Bint Abdulrahman University Researchers Supporting Project number (PNURSP2024R135), Princess Nourah Bint Abdulrahman University, Riyadh, Saudi Arabia.
Abstract: The popularity of quadrotor Unmanned Aerial Vehicles (UAVs) stems from their simple propulsion systems and structural design. However, their complex and nonlinear dynamic behavior presents a significant challenge for control, necessitating sophisticated algorithms to ensure stability and accuracy in flight. Various strategies have been explored by researchers and control engineers, with learning-based methods such as reinforcement learning, deep learning, and neural networks showing promise in enhancing the robustness and adaptability of quadrotor control systems. This paper investigates a Reinforcement Learning (RL) approach for both high- and low-level quadrotor control, focusing on attitude stabilization and position tracking tasks. A novel reward function and actor-critic network structures are designed to incorporate high-order observable states, improving the agent's understanding of the quadrotor's dynamics and environmental constraints. To address the challenge of RL hyper-parameter tuning, a new framework is introduced that combines Simulated Annealing (SA) with a reinforcement learning algorithm, specifically Simulated Annealing-Twin Delayed Deep Deterministic Policy Gradient (SA-TD3). This approach is evaluated on path-following and stabilization tasks through comparative assessments with two commonly used control methods: Backstepping and Sliding Mode Control (SMC). While the well-trained agents exhibited unexpected behavior during real-world testing, a reduced neural network used for altitude control was successfully implemented on a Parrot Mambo mini drone. The results showcase the potential of the proposed SA-TD3 framework for real-world applications, demonstrating improved stability and precision across various test scenarios and highlighting its feasibility for practical deployment.
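To make the hyper-parameter tuning step concrete, the following is a minimal sketch of how a simulated-annealing outer loop can wrap TD3 training, in the spirit of the SA-TD3 framework described above. The search space, the neighbour rule, and the evaluate_td3() scoring routine are placeholders (a real run would train and evaluate a TD3 agent there); only the temperature-controlled accept/reject rule is the essential SA mechanism.

```python
# Simulated-annealing outer loop over TD3 hyper-parameters (illustrative sketch).
import math
import random

SEARCH_SPACE = {                      # assumed tuning ranges, not the paper's
    "actor_lr":     (1e-5, 1e-3),
    "critic_lr":    (1e-5, 1e-3),
    "tau":          (1e-3, 2e-2),
    "policy_noise": (0.1, 0.5),
}


def neighbour(params, scale=0.2):
    """Perturb one hyper-parameter, clipped to its allowed range."""
    key = random.choice(list(SEARCH_SPACE))
    lo, hi = SEARCH_SPACE[key]
    new = dict(params)
    new[key] = min(hi, max(lo, params[key] + random.gauss(0, scale * (hi - lo))))
    return new


def evaluate_td3(params):
    """Placeholder score: in a real run, train a TD3 agent with `params` on the
    quadrotor task and return its mean tracking return."""
    return -sum((v - (lo + hi) / 2) ** 2
                for (lo, hi), v in zip(SEARCH_SPACE.values(), params.values()))


def simulated_annealing(n_iters=50, t0=1.0, cooling=0.9):
    current = {k: random.uniform(*rng) for k, rng in SEARCH_SPACE.items()}
    current_score = evaluate_td3(current)
    best, best_score, temp = dict(current), current_score, t0
    for _ in range(n_iters):
        cand = neighbour(current)
        score = evaluate_td3(cand)
        # Always accept improvements; accept worse candidates with probability
        # exp(delta / temp) so the search can escape local optima early on.
        if score > current_score or random.random() < math.exp((score - current_score) / temp):
            current, current_score = cand, score
            if score > best_score:
                best, best_score = dict(cand), score
        temp *= cooling
    return best, best_score


if __name__ == "__main__":
    print(simulated_annealing())
```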
Abstract: Against the backdrop of global energy shortages, combined cooling, heating and power (CCHP) systems have received growing attention for their cascaded energy utilization and high primary-energy efficiency. However, because the influencing factors are complex and variable, and in particular because of demand charges, it is difficult for a CCHP system to operate economically with existing control methods while meeting user-side energy demand in real time. To minimize operating cost while accounting for demand charges, an optimization method for CCHP control strategies based on the TD3 algorithm is proposed: each device in the system is modeled, the CCHP operation optimization problem is formulated as a Markov decision problem, and the TD3 algorithm is used to solve it, with verification on a case study. The results show that the TD3 agent that considers demand charges strikes a good balance between demand charges and real-time operating costs and generalizes well; compared with the historical operating strategy and a TD3 agent that ignores demand charges, total operating cost is reduced by 41.5% and 8.6%, respectively. The findings offer a new solution for reducing agricultural energy-supply costs and improving economic performance.
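As a rough illustration of how the dispatch problem can be cast as a Markov decision problem with the demand charge inside the reward, the sketch below defines a toy CCHP environment. The device models, efficiencies, prices, and the incremental demand-charge term are invented placeholders, not the paper's models; a continuous-action agent such as the TD3 sketch above could then be trained against it.

```python
# Toy CCHP dispatch environment with a demand-charge-aware reward (illustrative).
from dataclasses import dataclass, field
import numpy as np


@dataclass
class CCHPEnv:
    gas_price: float = 0.35        # fuel cost per kWh (assumed)
    grid_price: float = 0.12       # grid energy charge per kWh (assumed)
    demand_price: float = 12.0     # demand charge per kW of billing-period peak (assumed)
    gt_capacity: float = 500.0     # gas-turbine electrical capacity, kW
    gt_eff: float = 0.35           # gas-turbine electrical efficiency
    peak_so_far: float = 0.0       # highest grid draw seen this billing period
    t: int = 0
    loads: np.ndarray = field(default_factory=lambda: np.random.uniform(
        200, 800, size=(24, 2)))   # hourly (electric, cooling) demand, kW

    def reset(self):
        self.peak_so_far, self.t = 0.0, 0
        return self._obs()

    def _obs(self):
        elec, cool = self.loads[self.t]
        return np.array([elec, cool, self.peak_so_far, self.t / 24.0], dtype=np.float32)

    def step(self, action):
        # action[0] in [0, 1]: gas-turbine part-load setpoint for this hour.
        gt_frac = float(np.clip(action[0], 0.0, 1.0))
        elec_demand, cool_demand = self.loads[self.t]

        gt_power = gt_frac * self.gt_capacity
        fuel_cost = self.gas_price * gt_power / self.gt_eff
        waste_heat = gt_power * (1.0 / self.gt_eff - 1.0)      # recoverable heat, kW
        abs_cooling = min(cool_demand, 0.7 * waste_heat)       # absorption chiller, COP ~0.7
        elec_cooling = (cool_demand - abs_cooling) / 3.0       # electric chiller, COP ~3

        grid_power = max(elec_demand + elec_cooling - gt_power, 0.0)
        energy_cost = self.grid_price * grid_power

        # Incremental demand charge: pay only when a new billing-period peak is set,
        # so the agent feels the peak penalty at the step that causes it.
        new_peak = max(self.peak_so_far, grid_power)
        demand_cost = self.demand_price * (new_peak - self.peak_so_far)
        self.peak_so_far = new_peak

        reward = -(fuel_cost + energy_cost + demand_cost)
        self.t += 1
        done = self.t >= len(self.loads)
        obs = self._obs() if not done else np.zeros(4, dtype=np.float32)
        return obs, reward, done, {}
```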