Abstract
To address the high complexity of the multi-objective flexible job shop scheduling problem, along with existing algorithms' insufficient use of historical data and reliance on a single solution strategy, a value-based deep reinforcement learning algorithm is proposed, in which the maximum entropy method is introduced to obtain multiple near-optimal policies in the policy space. Firstly, the scheduling process is regarded as a multi-stage decision-making process; a single-operation time feature representation method is proposed, and on this basis 11 normalized state representation functions are designed as inputs. Secondly, an improved dueling network with noisy layers is used to fit the value function. Thirdly, 24 composite scheduling rules are constructed by combining basic dispatching rules. Finally, a hierarchical single-step reward is proposed to address the sparse reward problem. Benchmark test results show that the proposed algorithm outperforms other algorithms such as deep Q-network (DQN) and non-dominated sorting genetic algorithm II (NSGA-II).
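The abstract's two key action-space ideas, composing a discrete action set from basic dispatching rules and selecting among those actions with a max-entropy (softmax over Q-values) policy rather than a pure argmax, can be sketched as follows. All basis rules, state field names, and the 3×2 pairing below are illustrative assumptions for the sketch; the paper itself uses 24 composite rules, and its actual rule list is not reproduced here:

```python
import numpy as np

# Hypothetical job-selection basis rules (score a job; lower score = preferred).
def spt(job):   # shortest processing time of the next operation
    return job["next_op_time"]

def mwkr(job):  # most work remaining (negated so lower = preferred)
    return -job["work_remaining"]

def edd(job):   # earliest due date
    return job["due_date"]

JOB_RULES = [spt, mwkr, edd]

# Hypothetical machine-selection basis rules (score a machine; lower = preferred).
def earliest_free(machine):
    return machine["free_at"]

def least_loaded(machine):
    return machine["load"]

MACHINE_RULES = [earliest_free, least_loaded]

def make_composite(job_rule, machine_rule):
    """A composite rule: pick a job by one basis rule, then a machine by another."""
    def rule(jobs, machines):
        job = min(jobs, key=job_rule)
        machine = min(machines, key=machine_rule)
        return job["id"], machine["id"]
    return rule

# The Cartesian product of basis rules yields the discrete action set
# (3 x 2 = 6 here; the paper reports 24 such composite rules).
ACTIONS = [make_composite(j, m) for j in JOB_RULES for m in MACHINE_RULES]

def softmax_policy(q_values, alpha=1.0, rng=None):
    """Max-entropy-style selection: sample actions in proportion to exp(Q/alpha)
    instead of always taking argmax, so several good composite rules keep
    nonzero probability and multiple near-optimal policies can emerge."""
    if rng is None:
        rng = np.random.default_rng(0)
    z = np.asarray(q_values, dtype=float) / alpha
    p = np.exp(z - z.max())   # subtract max for numerical stability
    p /= p.sum()
    return rng.choice(len(p), p=p)

# Tiny usage example with made-up shop state (all values illustrative).
jobs = [
    {"id": 0, "next_op_time": 5, "work_remaining": 20, "due_date": 30},
    {"id": 1, "next_op_time": 3, "work_remaining": 40, "due_date": 25},
]
machines = [{"id": 0, "free_at": 2, "load": 7},
            {"id": 1, "free_at": 0, "load": 9}]

a = softmax_policy([0.2, 1.1, 0.5, 0.3, 0.9, 0.4])
job_id, machine_id = ACTIONS[a](jobs, machines)
```

In this sketch the Q-network (the noisy dueling network of the abstract) would supply `q_values` for the current state; the softmax temperature `alpha` controls how far the policy deviates from greedy selection.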
Authors
DING Yunming; CHEN Li; ZHANG Xinrui (Business School, University of Shanghai for Science and Technology, Shanghai 200093, China)
Source
Control Engineering of China
CSCD
Peking University Core Journals
2024, Issue 7, pp. 1185-1194 (10 pages)
Funding
National Natural Science Foundation of China (71301104)
Humanities and Social Sciences Planning Fund of the Ministry of Education of China (19YJA630021)
Keywords
Reinforcement learning
deep learning
flexible job shop
multi-objective scheduling