Funding: Supported in part by the Pioneer and Leading Goose R&D Program of Zhejiang Province under Grant 2022C01083 (Dr. Yu Li, https://zjnsf.kjt.zj.gov.cn/) and the Pioneer and Leading Goose R&D Program of Zhejiang Province under Grant 2023C01217 (Dr. Yu Li, https://zjnsf.kjt.zj.gov.cn/).
Abstract: With the rapid development of the mobile Internet, spatial crowdsourcing has become increasingly popular. Spatial crowdsourcing covers many types of applications, such as spatial crowd-sensing services. Spatial crowd-sensing collects and analyzes traffic sensing data from clients such as vehicles and traffic lights to construct intelligent traffic prediction models. Besides collecting sensing data, spatial crowdsourcing also includes spatial delivery services like DiDi and Uber. Appropriate task assignment and worker selection dominate the service quality of spatial crowdsourcing applications. Previous research conducted task assignment via traditional matching approaches or simple network models. However, advanced mining methods are lacking to explore the relationship between workers, task publishers, and the spatio-temporal attributes of tasks. Therefore, in this paper, we propose a Deep Double Dueling Spatial-temporal Q Network (D3SQN) to adaptively learn the spatial-temporal relationship between tasks, task publishers, and workers in a dynamic environment to achieve optimal allocation. Specifically, D3SQN is revised through reinforcement learning by adding a spatial-temporal transformer that can estimate the expected state values and action advantages so as to improve the accuracy of task assignments. Extensive experiments are conducted over real data collected from DiDi and ELM, and the simulation results verify the effectiveness of our proposed models.
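The abstract above mentions estimating state values and action advantages, which is the defining idea of a dueling Q-network. As a minimal sketch only, the standard dueling aggregation Q(s, a) = V(s) + A(s, a) - mean(A(s, ·)) can be written as follows. The function names and the reading of an action as "assign the task to a worker" are illustrative assumptions; the abstract does not specify how D3SQN combines the two streams.

```python
from typing import List


def dueling_q_values(state_value: float, advantages: List[float]) -> List[float]:
    """Combine V(s) and A(s, a) into Q(s, a) using the mean-advantage baseline.

    This is the generic dueling-network aggregation, not necessarily the
    exact form used by D3SQN (the abstract does not give it).
    """
    if not advantages:
        raise ValueError("advantages must contain at least one action")
    mean_advantage = sum(advantages) / len(advantages)
    return [state_value + a - mean_advantage for a in advantages]


def greedy_action(q_values: List[float]) -> int:
    """Index of the highest-valued action (hypothetically, one candidate worker)."""
    return max(range(len(q_values)), key=q_values.__getitem__)


# Hypothetical numbers: one state value and three candidate assignments.
q = dueling_q_values(state_value=2.0, advantages=[1.0, 3.0, 2.0])
best = greedy_action(q)
```

Subtracting the mean advantage keeps the value and advantage streams identifiable: adding a constant to every advantage leaves the resulting Q-values unchanged.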
Abstract: With the advent of Reinforcement Learning (RL) and its continuous progress, state-of-the-art RL systems have emerged for many challenging real-world tasks. Given the scope of this area, various techniques are found in the literature. One such notable technique, Multiple Deep Q-Network (DQN) based RL systems, uses multiple DQN-based entities, which learn together and communicate with each other. The learning has to be distributed wisely among all entities in such a scheme, and the inter-entity communication protocol has to be carefully designed. As more complex DQNs come to the fore, the overall complexity of these multi-entity systems has increased manyfold, leading to difficulty in training, high resource requirements, longer training time, and difficulty in fine-tuning, all of which cause performance issues. Taking a cue from the parallel processing found in nature and its efficacy, we propose a lightweight ensemble-based approach for solving the core RL tasks. It uses multiple binary-action DQNs with a shared state and reward. The benefits of the proposed approach are overall simplicity, faster convergence, and better performance compared to conventional DQN-based approaches. The approach can potentially be extended to any type of DQN by forming its ensemble. In extensive experiments, the proposed ensemble approach obtains promising results on OpenAI Gym tasks and Atari 2600 games compared with recent techniques. The proposed approach gives a state-of-the-art score of 500 on the Cartpole-v1 task, 259.2 on the LunarLander-v2 task, and state-of-the-art results on four out of five Atari 2600 games.
Funding: Supported by the Universiti Tunku Abdul Rahman (UTAR), Malaysia, under UTARRF (IPSR/RMC/UTARRF/2021-C1/T05).
Abstract: The recent surge of mobile subscribers and user data traffic has accelerated the telecommunication sector towards the adoption of fifth-generation (5G) mobile networks. The cloud radio access network (CRAN) is a prominent framework in the 5G mobile network that meets these requirements by deploying multiple low-cost, intelligent distributed antennas known as remote radio heads (RRHs). However, achieving optimal resource allocation (RA) in CRAN with the traditional approach is still challenging due to the complex structure. In this paper, we introduce a convolutional neural network-based deep Q-network (CNN-DQN) to balance energy consumption and guarantee the user quality of service (QoS) demand in downlink CRAN. We first formulate the Markov decision process (MDP) for energy efficiency (EE) and build a 3-layer CNN to capture the environment features as the input state space. We then use DQN to turn the RRHs on and off dynamically based on the user QoS demand and energy consumption in the CRAN. Finally, we solve the RA problem based on the user constraint and transmit power to guarantee the user QoS demand and maximize the EE with a minimum number of active RRHs. We conduct simulations to compare our proposed scheme with nature DQN and the traditional approach.
Abstract: In a rechargeable wireless sensor network, utilizing an unmanned aerial vehicle (UAV) as a mobile base station (BS) to charge sensors and collect data effectively prolongs the network’s lifetime. In this paper, we jointly optimize the UAV’s flight trajectory and the sensor selection and operation modes to maximize the average data traffic of all sensors within a wireless sensor network (WSN) during the UAV’s finite flight time, while ensuring the energy required by each sensor through wireless power transfer (WPT). We consider a practical scenario in which the UAV has no prior knowledge of sensor locations. The UAV performs autonomous navigation based on the status information obtained within its coverage area, which is modeled as a Markov decision process (MDP). A deep Q-network (DQN) is employed to execute the navigation based on the UAV position, the battery level state, channel conditions, and current data traffic of sensors within the UAV’s coverage area. Our simulation results demonstrate that the DQN algorithm significantly improves the network performance in terms of average data traffic and trajectory design.
Funding: The Liaoning Province Applied Basic Research Program, 2023JH2/101600038.
Abstract: In the face of the increasingly severe Botnet problem on the Internet, how to effectively detect Botnet traffic in real time has become a critical problem. Although the existing deep Q-network (DQN) algorithm in deep reinforcement learning can solve the problem of real-time updating, its prediction results are always higher than the actual results. In Botnet traffic detection, it performs well on the training set, where the accuracy rate of predicting traffic is as high as%; on the test set, however, its accuracy declines, and it cannot adjust its prediction strategy in time based on new data samples. On a new dataset, its accuracy declines significantly. Therefore, this paper proposes a Botnet traffic detection system based on double-layer DQN (DDQN). Two Q-values are designed to adjust the model in policy and action, respectively, to achieve real-time model updates and improve the universality and robustness of the model across different data sets. Experiments show that, compared with the DQN model, the Q-value under DDQN is not overestimated, and the detection model achieves higher accuracy and precision on Botnet traffic. Moreover, on Botnet data sets other than the test set, the accuracy and precision of the DDQN model remain higher than those of DQN.
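The overestimation described above is commonly attributed to using the same Q-estimates both to pick and to score the next action. As a hedged sketch, the snippet below contrasts the standard DQN target with the standard Double DQN target, in which one set of estimates selects the action and a second set evaluates it. The function names and numbers are illustrative assumptions, and the abstract does not state that the paper's "double-layer DQN" uses exactly this form.

```python
from typing import Sequence


def dqn_target(reward: float, gamma: float,
               next_q_target: Sequence[float], done: bool) -> float:
    """Standard DQN target: the max over one set of estimates picks and scores."""
    if done:
        return reward
    return reward + gamma * max(next_q_target)


def double_dqn_target(reward: float, gamma: float,
                      next_q_online: Sequence[float],
                      next_q_target: Sequence[float], done: bool) -> float:
    """Double DQN target: the online estimates choose the action,
    the target estimates supply its value."""
    if done:
        return reward
    best_action = max(range(len(next_q_online)), key=next_q_online.__getitem__)
    return reward + gamma * next_q_target[best_action]


# Hypothetical next-state estimates for three actions.
online = [1.0, 2.0, 0.5]
target = [1.5, 1.0, 3.0]
y_dqn = dqn_target(1.0, 0.5, target, done=False)              # 1.0 + 0.5 * 3.0
y_ddqn = double_dqn_target(1.0, 0.5, online, target, False)   # 1.0 + 0.5 * 1.0
```

Because the Double DQN target evaluates one specific action under the target estimates, it can never exceed the plain DQN target computed from the same estimates, which is why it tends to curb inflated Q-values.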
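Abstract: When a traditional deep Q-learning network (DQN) is used for path planning with dynamic obstacles, the mobile robot collides frequently during exploration and struggles to reach the target point. To address this problem, an improved DQN algorithm is proposed that modifies both the exploration strategy and the experience replay mechanism. For exploration, the rapidly-exploring random tree (RRT) algorithm automatically generates static prior knowledge to guide action selection, replacing the random actions of the ε-greedy strategy and raising the agent's success rate in reaching the target. For experience utilization, the K-means algorithm is used to design a clustered experience replay mechanism: experiences are clustered according to the position information of the dynamic obstacles, and experiences similar to the agent's current state are preferentially sampled for replay, so that the agent avoids collisions with dynamic obstacles more effectively. Simulation experiments in a two-dimensional grid environment show that, in dynamic environments, the algorithm avoids both static and dynamic obstacles and reaches the target point successfully, verifying its feasibility for path planning with dynamic obstacle avoidance.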
Funding: Supported by the China National Petroleum Corporation Limited-China University of Petroleum (Beijing) Strategic Cooperation Science and Technology Project (ZLZX2020-03).
Abstract: Traditional well log depth matching requires manual adjustments, which is highly labor-intensive for multiple wells and leads to low work efficiency. This paper introduces a multi-agent deep reinforcement learning (MARL) method to automate the depth matching of multi-well logs. The method defines multiple top-down dual sliding windows based on a convolutional neural network (CNN) to extract and capture similar feature sequences on well logs, and it establishes an interaction mechanism between agents and the environment to control the depth matching process. Specifically, the agent selects an action to translate or scale the feature sequence based on the double deep Q-network (DDQN). Through the feedback of the reward signal, it evaluates the effectiveness of each action, aiming to obtain the optimal strategy and improve the accuracy of the matching task. Our experiments show that MARL can automatically perform depth matching for well logs in multiple wells and reduce manual intervention. In an oil field application, a comparative analysis of dynamic time warping (DTW), deep Q-learning network (DQN), and DDQN methods revealed that the DDQN algorithm, with its dual-network evaluation mechanism, significantly improves performance by identifying and aligning more details in the well log feature sequences, thus achieving higher depth matching accuracy.
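For context on the DTW baseline named in the comparison above, the following is a minimal sketch of the classic dynamic time warping distance between two one-dimensional sequences. It illustrates the baseline technique only, using an absolute-difference local cost as an assumption; it is not the paper's MARL method, and the sample values are made up.

```python
from typing import Sequence


def dtw_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Classic O(len(a) * len(b)) dynamic time warping with |x - y| local cost."""
    if not a or not b:
        raise ValueError("both sequences must be non-empty")
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] = minimal accumulated cost aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            cost[i][j] = local + min(
                cost[i - 1][j],      # a advances, b[j-1] is reused (stretch b)
                cost[i][j - 1],      # b advances, a[i-1] is reused (stretch a)
                cost[i - 1][j - 1],  # both advance together
            )
    return cost[n][m]


# Hypothetical log segments: the second repeats one sample (a local stretch).
stretched = dtw_distance([1.0, 2.0, 3.0], [1.0, 2.0, 2.0, 3.0])
```

A stretched copy of a sequence aligns at zero cost under DTW, which is what makes it a natural baseline for matching curves that are shifted or scaled in depth.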
Funding: Supported by the National Natural Science Foundation of China (Grant No. 62071377, 62101442, 62201456), the Natural Science Foundation of Shaanxi Province (Grant No. 2023-YBGY-036, 2022JQ-687), and the Graduate Student Innovation Foundation Project of Xi’an University of Posts and Telecommunications under Grant CXJJDL2022003.
Abstract: The Internet of Medical Things (IoMT) is regarded as a critical technology for intelligent healthcare in the foreseeable 6G era. Nevertheless, due to the limited computing power of edge devices and task-related coupling relationships, IoMT faces unprecedented challenges. Considering the associative connections among tasks, this paper proposes a computing offloading policy for multiple user devices (UDs) that considers device-to-device (D2D) communication and a multi-access edge computing (MEC) technique in the IoMT scenario. Specifically, to minimize the total delay and energy consumption under the requirements of IoMT, we first analyze and model local execution, MEC execution, D2D execution, and the associated-task offloading exchange in detail. Consequently, the associated-task offloading scheme of multiple UDs is formulated as a mixed-integer nonconvex optimization problem. Considering the advantages of deep reinforcement learning (DRL) in processing tasks with coupling relationships, a Double DQN based associative tasks computing offloading (DDATO) algorithm is then proposed to obtain the optimal solution, which can make the best offloading decision under the condition that the tasks of UDs are associative. Furthermore, to reduce the complexity of the DDATO algorithm, a cache-aided procedure is introduced before the data training process. This avoids redundant offloading and computing for tasks that have already been cached by other UDs. In addition, we use a dynamic ε-greedy strategy in the action selection part of the algorithm, thus preventing the algorithm from falling into a locally optimal solution. Simulation results demonstrate that, compared with other existing methods for associative task models with different structures in the IoMT network, the proposed algorithm lowers the total cost more effectively and efficiently while also providing a tradeoff between delay and energy consumption tolerance.
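The dynamic ε-greedy strategy mentioned above can be illustrated with a short sketch. The abstract does not specify how ε changes over time, so the exponential decay schedule, its parameter values, and the function names below are all assumptions chosen for illustration.

```python
import random
from typing import Sequence


def decayed_epsilon(step: int, eps_start: float = 1.0,
                    eps_end: float = 0.05, decay: float = 0.995) -> float:
    """Assumed schedule: exponential decay from eps_start, floored at eps_end."""
    return max(eps_end, eps_start * decay ** step)


def select_action(q_values: Sequence[float], epsilon: float,
                  rng: random.Random) -> int:
    """Explore uniformly with probability epsilon, otherwise act greedily."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=q_values.__getitem__)


# Hypothetical Q-values for three offloading choices (e.g. local, MEC, D2D).
rng = random.Random(0)
q_values = [0.1, 0.9, 0.3]
early_choice = select_action(q_values, decayed_epsilon(0), rng)     # mostly random
late_choice = select_action(q_values, 0.0, rng)                     # purely greedy
```

A large ε early in training spreads the agent's choices across all offloading options, and shrinking ε later lets it commit to the options with the highest estimated value, which is the usual rationale for avoiding premature convergence to a local optimum.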
Funding: This work was funded by BK21 FOUR (Fostering Outstanding Universities for Research) (No. 5199990914048). This research was supported by the Basic Science Research Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Education (NRF-2020R1I1A3066543). In addition, this work was supported by the Soonchunhyang University Research Fund.
Abstract: Federated learning (FL) activates distributed on-device computation techniques to achieve better algorithm performance through the interaction of local model updates and global model distributions in aggregation averaging processes. However, in large-scale heterogeneous Internet of Things (IoT) cellular networks, massive multi-dimensional model update iterations and resource-constrained computation are significant challenges. This paper introduces a system model that converges software-defined networking (SDN) and network functions virtualization (NFV) to enable device/resource abstractions and provide NFV-enabled edge FL (eFL) aggregation servers for advancing automation and controllability. Multi-agent deep Q-networks (MADQNs) aim to enforce self-learning softwarization, optimize resource allocation policies, and advocate computation offloading decisions. With gathered network conditions and resource states, the proposed agent explores various actions to estimate expected long-term rewards for a particular state observation. In the exploration phase, optimal actions for joint resource allocation and offloading decisions in different possible states are obtained by maximum Q-value selection. An action-based virtual network function (VNF) forwarding graph (VNFFG) is orchestrated to map VNFs to an eFL aggregation server with sufficient communication and computation resources in the NFV infrastructure (NFVI). The proposed scheme identifies deficient allocation actions, modifies the VNF backup instances, and reallocates the virtual resources for the exploitation phase. A deep neural network (DNN) is used as the value function approximator, and the epsilon-greedy algorithm balances exploration and exploitation. The scheme primarily considers the criticality of FL model services and congestion states to optimize the long-term policy. Simulation results show that the proposed scheme outperforms reference schemes in terms of Quality of Service (QoS) performance metrics, including packet drop ratio, packet drop counts, packet delivery ratio, delay, and throughput.
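Abstract: In recent years, deep reinforcement learning, as an efficient and reliable machine learning method, has been widely applied to traffic signal control. Existing traffic signal timing methods usually ignore priority passage for special vehicles (e.g., ambulances and fire engines); in addition, signal timing methods based on traditional deep reinforcement learning optimize a rather narrow objective, which leads to poor performance in complex traffic scenarios. To address these problems, a dual-mode multi-objective signal timing method based on Double DQN (DMDD) that incorporates priority passage for special vehicles is proposed to improve intersection throughput efficiency across different traffic scenarios. The method first selects the signal control mode based on the saturation state of the intersection; in emergency control mode, special vehicles are given a higher passage weight, which helps them pass through the intersection faster. Next, separate neural networks are designed to compute rewards for three indicators: waiting time, queue length, and CO2 emissions. Finally, Double DQN is used to select the optimal signal phase, improving throughput efficiency through flexible phase switching. Experimental results based on SUMO show that, compared with the baseline methods, DMDD effectively reduces the waiting time, queue length, and CO2 emissions of special vehicles at the intersection, allowing them to pass through faster and effectively improving traffic efficiency.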