Deep Q Network (DQN) is an efficient model-free optimization method with the potential to be used in building cooling water systems. However, because the action space is high-dimensional, the method requires a complex neural network. As a result, both the number of training samples required and the length of the convergence period are barriers to real-world application. Furthermore, exploration based on a penalty function may lead to unsafe actions, which makes the method even more difficult to apply. To solve these problems, this paper proposes an approach that limits the action space to a safe area. First, the action space is separated into two sub-regions, one for cooling towers and one for pumps. Second, for each type of equipment, the action space is further divided into safe and unsafe regions. As a result, the convergence speed is significantly improved. Compared with the traditional DQN method in a simulation environment validated by real data, the proposed method shortens the convergence time by 1 episode (one cooling season). The results suggest that the proposed DQN method learns much faster without any undesired consequences and is therefore more suitable for projects without a pre-learning stage.
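The two steps described in the abstract (separating the tower and pump actions, then restricting each to a safe region) can be illustrated with a minimal sketch. The abstract does not give the actual setpoints, safety limits, or network design, so everything named here is an assumption made for illustration: the frequency grids `TOWER_FAN_HZ` and `PUMP_HZ`, the safe bands, the placeholder Q-values, and the helpers `safe_indices` and `select_action`. The sketch reads "separated" as one Q-value vector per equipment type, which may differ from the paper's formulation.

```python
import random

# Hypothetical discrete setpoints; the abstract does not state the real ones.
TOWER_FAN_HZ = [25, 30, 35, 40, 45, 50]
PUMP_HZ = [25, 30, 35, 40, 45, 50]


def safe_indices(setpoints, lower, upper):
    """Indices of the setpoints that lie inside the assumed safe band [lower, upper]."""
    return [i for i, s in enumerate(setpoints) if lower <= s <= upper]


def select_action(q_values, safe_idx, epsilon, rng):
    """Epsilon-greedy selection restricted to the safe indices."""
    if not safe_idx:
        raise ValueError("no safe action available")
    if rng.random() < epsilon:
        return rng.choice(safe_idx)  # explore only inside the safe region
    # Exploit: best Q-value among safe actions; unsafe ones are never candidates.
    return max(safe_idx, key=lambda i: q_values[i])


rng = random.Random(0)

# Placeholder Q-values standing in for the outputs of the per-equipment networks.
q_tower = [0.9, 0.2, 0.5, 0.7, 0.1, 0.3]
q_pump = [0.1, 0.4, 0.8, 0.3, 0.95, 0.2]

tower_safe = safe_indices(TOWER_FAN_HZ, lower=30, upper=45)  # assumed limits
pump_safe = safe_indices(PUMP_HZ, lower=35, upper=50)        # assumed limits

tower_action = select_action(q_tower, tower_safe, epsilon=0.0, rng=rng)
pump_action = select_action(q_pump, pump_safe, epsilon=0.0, rng=rng)

# Output sizes: one joint head over all combinations vs. two separate heads.
joint_outputs = len(TOWER_FAN_HZ) * len(PUMP_HZ)
separate_outputs = len(TOWER_FAN_HZ) + len(PUMP_HZ)
```

With these assumed grids, a joint action space needs 36 Q-value outputs while the separated form needs 12, which is one way to see why separation could allow a smaller network. Because exploration samples only from the safe indices, no penalty term is needed to discourage unsafe setpoints in this sketch.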