Abstract
To address dynamic wafer arrivals, the incompatibility of wafers with different process types, and predictive equipment maintenance in the furnace tube area of wafer fabrication, a mathematical model for the joint optimization of equipment maintenance and scheduling is constructed. It minimizes the makespan and the total tardiness and covers three decisions: batching in the furnace tube area, equipment maintenance selection, and batch sequencing. A real-time scheduling optimization method based on multi-objective proximal policy optimization (MPPO) reinforcement learning is proposed. A batching agent groups dynamically arriving wafers of the same process type into batches. An equipment agent applies a joint maintenance-scheduling optimization strategy within each machine's pre-maintenance interval, using it to decide when to maintain the equipment and when maintenance starts. A sequencing agent makes batch sequencing decisions based on batch urgency and on the constraint that the different layers of a wafer should, as far as possible, be processed on the same equipment. A long short-term memory (LSTM) network is introduced to memorize and predict scheduling information in the furnace tube area. The agents interact as follows: when sequencing, the sequencing agent reads the decisions of the batching and equipment agents and feeds the wafer processing completion times back to them. Case studies designed from the enterprise's actual production conditions, together with comparisons against other algorithms, verify the effectiveness of the proposed MPPO algorithm, which shows good overall scheduling performance and can optimize all objectives.
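The batching and sequencing steps, and the two objectives they are evaluated against, can be made concrete with a small non-learning sketch. It assumes a hypothetical fixed furnace capacity, one processing time per process type, and identical parallel furnaces. It uses earliest-due-date ordering with earliest-free-machine assignment as a simple stand-in for the sequencing agent's urgency rule. It is not the MPPO policy, and it omits maintenance, the LSTM, and the same-equipment-for-all-layers constraint. All names (`Wafer`, `form_batches`, `schedule`, `proc_time`) are illustrative assumptions.

```python
import heapq
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Wafer:
    wid: str
    process_type: str  # wafers of different process types cannot share a batch
    arrival: float
    due: float


def form_batches(wafers, capacity):
    """Group wafers by process type (incompatible job families), then fill
    batches in arrival order up to a hypothetical furnace capacity."""
    by_type = defaultdict(list)
    for w in sorted(wafers, key=lambda w: w.arrival):
        by_type[w.process_type].append(w)
    batches = []
    for group in by_type.values():
        for i in range(0, len(group), capacity):
            batches.append(group[i:i + capacity])
    return batches


def schedule(batches, proc_time, n_machines):
    """Sequence batches by earliest due date (stand-in urgency rule), load each
    onto the earliest-free furnace, and return (makespan, total_tardiness)."""
    free_times = [0.0] * n_machines  # time at which each furnace becomes free
    heapq.heapify(free_times)
    makespan, total_tardiness = 0.0, 0.0
    for batch in sorted(batches, key=lambda b: min(w.due for w in b)):
        free_at = heapq.heappop(free_times)
        ready = max(w.arrival for w in batch)  # every wafer must have arrived
        finish = max(free_at, ready) + proc_time[batch[0].process_type]
        heapq.heappush(free_times, finish)
        makespan = max(makespan, finish)
        # all wafers in a batch complete together, so they share one finish time
        total_tardiness += sum(max(0.0, finish - w.due) for w in batch)
    return makespan, total_tardiness


# Usage: two process types, batch capacity 2, two furnaces (made-up data).
wafers = [
    Wafer("w1", "oxide", 0, 10),
    Wafer("w2", "oxide", 1, 12),
    Wafer("w3", "nitride", 0, 5),
    Wafer("w4", "oxide", 2, 7),
]
batches = form_batches(wafers, capacity=2)
makespan, total_tardiness = schedule(batches, {"oxide": 6, "nitride": 4}, n_machines=2)
print(makespan, total_tardiness)  # makespan 10, total tardiness 1
```

In the paper's setting, a learned policy would replace the fixed due-date rule and the greedy machine choice, and both objectives would feed its reward. The sketch only shows what those two objectives measure for a given batching and sequence.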
Authors
ZHOU Yaqin; LIU Yifeng; ZHANG Peng; ZHANG Jie (College of Mechanical Engineering, Donghua University, Shanghai, China; Artificial Intelligence Research Institute, Donghua University, Shanghai, China)
Source
Journal of Donghua University (Natural Science)
CAS
Peking University Core Journal
2024, No. 6, pp. 65-74 (10 pages)
Funding
National Key Research and Development Program of China (2022YFB3305003).
Keywords
reinforcement learning
furnace tube
equipment pre-maintenance
batch processing equipment
multi-objective optimization