Abstract
In harsh outdoor environments, unmanned aerial vehicles (UAVs), with their flexibility and convenience, can be used to carry user tasks to edge servers via wireless data transmission. However, UAV flight platforms struggle to provide long-duration task offloading services, which greatly limits their application prospects. This paper studies how to effectively integrate UAV task offloading and charging scheduling in a mobile edge computing environment. First, a new application model is constructed that jointly handles the UAV's task offloading scheduling and its own charging needs, introducing several wireless charging platforms into the UAV-assisted task offloading scenario. Second, the value of user tasks and the UAV's charging needs are taken into account to optimize the benefit of UAV-assisted task offloading for user devices under delay-sensitive and energy-constrained conditions. Finally, a deep reinforcement learning approach is adopted: the deep Q-network (DQN) is tuned to form the Fixed DQN algorithm, which effectively handles the large-scale state-action search space of the model. Under the premise that the UAV serves only as a task carrier and that its autonomous charging needs are considered, the feasibility of the Fixed DQN algorithm is verified in a region with a radius of 3000 m containing 11 nodes, and its performance is evaluated against the ant colony algorithm, the genetic algorithm, and the DQN algorithm under different numbers of user nodes, numbers of charging nodes, and service times. Experimental results show that the proposed Fixed DQN algorithm significantly outperforms the ant colony, genetic, and DQN algorithms under all test conditions, especially in scenarios with more nodes and longer service times; moreover, the performance gain of Fixed DQN over DQN highlights the effectiveness of deep reinforcement learning in parameter tuning. These findings confirm the efficiency of the Fixed DQN algorithm and the importance of its parameter tuning strategy in solving the UAV task offloading and charging scheduling problem.
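The benefit objective described above is not given in closed form in this record; as a rough notational sketch only (all symbols are assumptions, not taken from the paper: $v_i$ is the value of user task $i$, $x_i$ indicates whether that task is offloaded, $\tau_i$ is its deadline, and $E^{\mathrm{res}}(t)$ is the UAV's residual energy, replenished at charging nodes over the service time $T$), the problem can be read as a constrained value maximization:

```latex
% Hedged sketch only: symbols and constraints are assumed, not the paper's model.
\[
\begin{aligned}
\max \quad & \sum_{i \in \mathcal{U}} v_i\, x_i
  && \text{(total value of served user tasks)}\\
\text{s.t.} \quad & t_i^{\mathrm{finish}} \le \tau_i \quad \forall\, i:\ x_i = 1
  && \text{(delay sensitivity: task deadlines)}\\
& E^{\mathrm{res}}(t) \ge E_{\min} \quad \forall\, t \in [0, T]
  && \text{(energy never depleted; recharging at wireless charging nodes)}\\
& x_i \in \{0, 1\} \quad \forall\, i \in \mathcal{U}
  && \text{(each user task is either served or skipped)}
\end{aligned}
\]
```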
In applications in harsh outdoor environments, unmanned aerial vehicles (UAVs), known for their flexibility and convenience, were utilized to carry user tasks to edge servers through wireless data transmission. However, UAV flight platforms struggled to provide long-duration task offloading services, which significantly limited their application prospects. This study investigated how to effectively integrate UAV task offloading and charging scheduling in a mobile edge computing environment.

Firstly, a new application model was constructed that cohesively managed UAV task offloading scheduling together with the UAV's own charging needs, incorporating several wireless charging platforms into the UAV-assisted task offloading scenario. These platforms enabled UAVs to recharge autonomously during task execution, providing automated charging services without human intervention. A UAV independently decided whether to proceed to the nearest charging node for power replenishment based on its current power level and upcoming task offloading plan. However, opting to recharge not only incurred additional time and energy consumption for the descent from cruising altitude to the charging station, but also required accounting for the time spent charging and its impact on the overall task schedule.

Secondly, the value of user tasks and the UAV's charging needs were jointly considered to optimize the benefit of UAV-assisted task offloading for user devices under delay-sensitive and energy-constrained conditions. This involved optimizing not only the UAV's flight path and task allocation but also its charging schedule, ensuring sufficient charging and efficient operation while executing tasks. Such a cooperative scheduling strategy enabled the UAV to maximize the processing of user tasks while maintaining the necessary operational energy, thereby enhancing the performance of the entire mobile edge computing system.

Finally, a deep reinforcement learning algorithm was employed: the deep Q-network (DQN) was fine-tuned to form the Fixed DQN algorithm, which effectively addressed the large-scale state-action search space of the model. This approach handled the complex decision-making problem and enabled effective learning and optimization across a wide state space; with a deep learning framework, the algorithm processed high-dimensional input data and made accurate offloading and charging decisions in varying dynamic environments, which was essential for improving the efficiency and effectiveness of UAV task offloading and charging scheduling. The algorithm design comprehensively considered the following key aspects. Initially, the state space and action space were defined, ensuring that the agent could accurately perceive the environment and make effective decisions. Subsequently, the composition of the reward function was detailed to guide the agent toward the desired goal during training. It was found that guiding the agent solely by maximizing task offloading benefits prevented it from satisfying the condition of serving each user at least once; therefore, a method of minor learning goal constraints was proposed, in which the task offloading rewards accumulated before the minor learning goals were completed were not awarded directly, preventing the agent from deviating from the path to achieving these goals. Afterwards, an experience replay mechanism was introduced, which improved learning efficiency and reduced correlations between samples by storing and reusing past experiences, and two asynchronously updated neural networks were employed to stabilize the learning process. On this basis, the hyperparameters of the Fixed DQN algorithm were meticulously optimized to further enhance its performance.

Most current research has been based on the assumption that UAVs possess certain task processing capabilities. However, a different assumption was adopted in this paper: the UAV's primary role was only to carry tasks, not to participate directly in task processing, and its autonomous charging needs were also considered. This assumption is closer to actual application scenarios, where UAVs are primarily used for data collection and transmission rather than data processing, and it takes into account the limited endurance of UAVs and their need to recharge during task execution.

In the study, 11 nodes were set up within a circular area with a radius of 3000 m as a test environment to verify the feasibility of the Fixed DQN algorithm. To comprehensively evaluate its performance, extensive experiments were then conducted under various conditions, including different numbers of user nodes, different numbers of charging nodes, and varying service times, with comparisons against the ant colony algorithm, the genetic algorithm, and the DQN algorithm. In this way, the effectiveness of the Fixed DQN algorithm in different scenarios, especially in complex and dynamically changing environments, was thoroughly examined.

The experimental results showed that the Fixed DQN algorithm significantly outperformed the ant colony algorithm, the genetic algorithm, and the DQN algorithm under all test conditions, particularly in scenarios with more nodes and longer service times. Furthermore, the performance improvement of Fixed DQN over DQN highlighted the effectiveness of deep reinforcement learning with careful parameter tuning. These findings confirm the efficiency of the Fixed DQN algorithm and the importance of parameter tuning strategies in addressing UAV task offloading and charging scheduling problems.
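The design steps listed above (defining the state and action spaces, shaping the reward with minor learning goal constraints, experience replay, and two asynchronously updated networks) correspond to the standard fixed-target DQN recipe. A minimal sketch of how these pieces might fit together is given below; the network sizes, state/action encodings, hyperparameters, and the reward-gating rule are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch of the Fixed DQN ingredients described in the abstract:
# an online network and an asynchronously updated target network, uniform
# experience replay, and reward gating that withholds offloading rewards
# until the minor learning goal (serving every user at least once) is met.
# All sizes and hyperparameters below are illustrative assumptions.
import random
from collections import deque

import torch
import torch.nn as nn
import torch.optim as optim


class QNet(nn.Module):
    """Small MLP mapping a state vector to Q-values over the discrete actions
    (e.g., visit a user node, visit a charging node, or return)."""

    def __init__(self, state_dim: int, n_actions: int):
        super().__init__()
        self.layers = nn.Sequential(
            nn.Linear(state_dim, 128), nn.ReLU(),
            nn.Linear(128, 128), nn.ReLU(),
            nn.Linear(128, n_actions),
        )

    def forward(self, x):
        return self.layers(x)


class FixedDQNAgent:
    def __init__(self, state_dim, n_actions, gamma=0.95, lr=1e-3,
                 buffer_size=10_000, batch_size=64, target_sync=200):
        self.online = QNet(state_dim, n_actions)
        self.target = QNet(state_dim, n_actions)
        self.target.load_state_dict(self.online.state_dict())
        self.optimizer = optim.Adam(self.online.parameters(), lr=lr)
        self.replay = deque(maxlen=buffer_size)   # experience replay buffer
        self.gamma, self.batch_size = gamma, batch_size
        self.target_sync, self.step_count = target_sync, 0
        self.n_actions = n_actions

    def act(self, state, epsilon):
        """Epsilon-greedy action selection over the discrete node-visit actions."""
        if random.random() < epsilon:
            return random.randrange(self.n_actions)
        with torch.no_grad():
            q = self.online(torch.as_tensor(state, dtype=torch.float32))
        return int(q.argmax().item())

    def shape_reward(self, raw_offload_reward, all_users_served):
        """Minor-learning-goal gating: offloading rewards are withheld until
        every user has been served at least once."""
        return raw_offload_reward if all_users_served else 0.0

    def store(self, s, a, r, s_next, done):
        self.replay.append((s, a, r, s_next, done))

    def learn(self):
        if len(self.replay) < self.batch_size:
            return
        batch = random.sample(list(self.replay), self.batch_size)
        s, a, r, s2, d = map(list, zip(*batch))
        s = torch.as_tensor(s, dtype=torch.float32)
        a = torch.as_tensor(a, dtype=torch.int64).unsqueeze(1)
        r = torch.as_tensor(r, dtype=torch.float32)
        s2 = torch.as_tensor(s2, dtype=torch.float32)
        d = torch.as_tensor(d, dtype=torch.float32)

        q_sa = self.online(s).gather(1, a).squeeze(1)
        with torch.no_grad():
            # Bootstrapped target from the slowly (asynchronously) updated network.
            q_next = self.target(s2).max(dim=1).values
            target = r + self.gamma * (1.0 - d) * q_next

        loss = nn.functional.smooth_l1_loss(q_sa, target)
        self.optimizer.zero_grad()
        loss.backward()
        self.optimizer.step()

        self.step_count += 1
        if self.step_count % self.target_sync == 0:
            self.target.load_state_dict(self.online.state_dict())
```

In this sketch the target network is copied from the online network only every `target_sync` learning steps, which is the usual role of the "two asynchronously updated neural networks", and offloading rewards are zeroed until every user has been served at least once, mirroring the minor-learning-goal constraint described above.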
Authors
何涵
刘鹏
赵亮
王青山
HE Han; LIU Peng; ZHAO Liang; WANG Qingshan (School of Computer Sci. and Technol., Hangzhou Dianzi Univ., Hangzhou 310018, China; School of Computer Sci., Shenyang Aerospace Univ., Shenyang 110136, China; School of Mathematics, Hefei Univ. of Technol., Hefei 230009, China)
Source
《工程科学与技术》
EI
CAS
CSCD
北大核心
2024, No. 1, pp. 99-109 (11 pages)
Advanced Engineering Sciences
Funding
General Program of the National Natural Science Foundation of China (62172134).
Keywords
edge computing
UAV
task offloading
reinforcement learning
charging scheduling