Abstract
For the job shop scheduling problem with fuzzy processing times and fuzzy due dates, this paper takes the proximal policy optimization (PPO) algorithm as the basic optimization framework, with the objective of minimizing the maximum completion time, and proposes an LSTM-PPO (proximal policy optimization with long short-term memory) algorithm to solve it. First, a new set of state features is designed to model the scheduling problem, and job operations are selected directly from these state features, which is closer to the scheduling decision process in real production environments. Second, a long short-term memory (LSTM) network is incorporated into the actor-critic framework of the PPO algorithm to address the difficulty that conventional models cannot scale when the problem size changes, so that the agent can still obtain a final scheduling solution when the numbers of jobs, operations, and machines vary. Experiments on the selected fuzzy job shop scheduling problem set verify that the proposed algorithm achieves better performance.
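The abstract's central architectural idea is to embed an LSTM in the actor-critic networks of PPO so that the same policy remains usable when the numbers of jobs, operations, and machines change. The sketch below illustrates one way such a size-independent actor-critic could be structured in PyTorch; the feature dimension (8), hidden size (64), and the per-operation encoding are illustrative assumptions, not the authors' exact design.

```python
# Minimal sketch (assumed details, not the paper's exact network): an LSTM
# encodes a variable-length sequence of per-operation state features, an actor
# head scores each candidate operation, and a critic head estimates the value.
import torch
import torch.nn as nn

class LSTMActorCritic(nn.Module):
    def __init__(self, feat_dim: int = 8, hidden_dim: int = 64):
        super().__init__()
        # The LSTM consumes one feature vector per candidate operation, so the
        # network does not depend on a fixed number of jobs/operations/machines.
        self.encoder = nn.LSTM(feat_dim, hidden_dim, batch_first=True)
        self.actor_head = nn.Linear(hidden_dim, 1)   # one logit per candidate operation
        self.critic_head = nn.Linear(hidden_dim, 1)  # scalar state-value estimate

    def forward(self, op_feats: torch.Tensor):
        # op_feats: (batch, num_candidate_ops, feat_dim); num_candidate_ops may vary.
        h, _ = self.encoder(op_feats)                       # (batch, n_ops, hidden_dim)
        logits = self.actor_head(h).squeeze(-1)             # (batch, n_ops)
        value = self.critic_head(h[:, -1, :]).squeeze(-1)   # (batch,)
        return torch.distributions.Categorical(logits=logits), value

# Usage: sample a dispatching action for 5 candidate operations, then for 7,
# with the same network and no change to its output layer.
net = LSTMActorCritic()
for n_ops in (5, 7):
    dist, value = net(torch.randn(1, n_ops, 8))
    action = dist.sample()            # index of the operation to schedule next
    log_prob = dist.log_prob(action)  # later used in the PPO clipped objective
```

Because the actor scores every candidate operation with a shared head over the LSTM outputs, the action space grows or shrinks with the instance size without retraining a fixed-size output layer, which is the scalability property the abstract claims for LSTM-PPO.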
Authors
ZHU Jia-zheng; ZHANG Hong-li; WANG Cong; LI Xin-kai; DONG Ying-chao (College of Electrical Engineering, Xinjiang University, Urumqi 830047, China)
Source
Control and Decision (《控制与决策》)
Indexed in: EI, CSCD, Peking University Core Journals (北大核心)
2024, No. 2, pp. 595-603 (9 pages)
Funding
National Natural Science Foundation of China (51967019, 52065064).
Keywords
deep learning
reinforcement learning
proximal policy optimization
fuzzy job shop scheduling