Abstract
In this paper, we study a multi-unmanned aerial vehicle (multi-UAV), multi-user system in which the UAVs serve as aerial base stations (BSs) for ground users in the same frequency band, without knowledge of the users' locations or channel parameters. We aim to maximize the total throughput of all users while meeting a fairness requirement by optimizing the UAVs' trajectories and transmission power in a centralized way. This problem is non-convex and very difficult to solve, because the locations of the users are unknown to the UAVs. We propose a deep reinforcement learning (DRL)-based solution, namely soft actor-critic (SAC), and address the problem by modeling it as a Markov decision process (MDP). We carefully design a reward function that combines sparse and non-sparse rewards to balance exploitation and exploration. Simulation results show that the proposed SAC approach performs well in both training and testing.
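As a rough illustration of what a reward combining a non-sparse (dense) term with a sparse term might look like, the sketch below adds a per-step throughput term to a bonus that is granted only when a fairness threshold is met. This is not the paper's reward function: the use of Jain's fairness index, the function names, and the parameters `fairness_threshold`, `bonus`, and `scale` are all assumptions made for illustration, since the abstract does not specify them.

```python
def jain_fairness(rates):
    """Jain's fairness index of per-user rates, in (0, 1]; 0.0 if all rates are zero.

    Assumption: the abstract does not name its fairness measure; Jain's index
    is used here only as a common stand-in.
    """
    total = sum(rates)
    sum_sq = sum(r * r for r in rates)
    if sum_sq == 0:
        return 0.0
    return (total * total) / (len(rates) * sum_sq)


def combined_reward(rates, fairness_threshold=0.9, bonus=10.0, scale=1.0):
    """Hypothetical reward: dense throughput term plus a sparse fairness bonus.

    rates: per-user throughput values for the current step (any consistent unit).
    fairness_threshold, bonus, scale: illustrative parameters, not from the paper.
    """
    # Dense (non-sparse) part: given at every step, proportional to total throughput.
    dense = scale * sum(rates)
    # Sparse part: granted only when the fairness requirement is satisfied.
    sparse = bonus if jain_fairness(rates) >= fairness_threshold else 0.0
    return dense + sparse


if __name__ == "__main__":
    print(combined_reward([1.0, 1.0]))  # equal rates: fairness bonus applies
    print(combined_reward([2.0, 0.0]))  # same total, one user starved: no bonus
```

In this sketch the dense term gives the agent a learning signal at every step, while the sparse bonus rewards only those allocations that also satisfy the fairness condition.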
Funding
National Natural Science Foundation of China (62101161)
Shenzhen Basic Research Program (20200811192821001)
Shenzhen Basic Research Program (JCYJ20190808122409660)
Guangdong Basic Research Program (2019A1515110358)
Guangdong Basic Research Program (2021A1515012097)
Guangdong Basic Research Program (2020ZDZX1037)
Guangdong Basic Research Program (2020ZDZX1021)
Open Research Fund of National Mobile Communications Research Laboratory, Southeast University (2021D16)
Open Research Fund of National Mobile Communications Research Laboratory, Southeast University (2022D02)