Abstract
To address process route planning when parts undergo feature reconstruction in flexible machining systems, a process route planning method based on the Asynchronous Advantage Actor-Critic (A3C) algorithm is proposed. The method exploits A3C's parallel, asynchronous operation and fast response, together with the reusability and scalability of its decision experience. For the feature reconstruction setting, the state space, action space, and reward function are defined on the basis of a Markov Decision Process (MDP). Because the A3C agent may become trapped in a local optimum when selecting machines, tools, and tool feed directions, a random greedy strategy is proposed to enlarge the solution space and improve solution quality. To keep the agent from spending large amounts of trial and error when a part's features are reconstructed, a fast-fail strategy is proposed that speeds up the agent's learning to avoid violating feature constraints and improves response speed. Simulation experiments show that the proposed method effectively solves the process route planning problem under feature reconstruction and that, compared with process route planning methods based on genetic, ant colony, and simulated annealing algorithms, it responds faster and yields higher-quality solutions when feature reconstruction occurs.
Authors
TAO Xinyu (陶鑫钰)
WANG Yan (王艳)
JI Zhicheng (纪志成)
Engineering Research Center of Internet of Things Technology Applications, Ministry of Education, Jiangnan University, Wuxi 214122, China
Source
Modern Manufacturing Engineering (《现代制造工程》)
CSCD
Peking University Core Journal (北大核心)
2023, No. 10, pp. 15-26 (12 pages)
Funding
National Natural Science Foundation of China (61973138)
Keywords
Asynchronous Advantage Actor-Critic (A3C)
feature reconstruction
process route
deep reinforcement learning
Markov Decision Process (MDP)