Funding: Supported by the National Natural Science Foundation of China under Grant No. 62001199 and the Fujian Province Natural Science Foundation under Grant No. 2023J01925.
Abstract: In the domain of autonomous industrial manipulators, precise positioning and appropriate posture selection in path planning are pivotal for tasks involving obstacle avoidance, such as handling, heat sealing, and stacking. While Multi-Degree-of-Freedom (MDOF) manipulators offer kinematic redundancy, aiding in the derivation of optimal inverse kinematic solutions to meet position and posture requisites, their path planning entails intricate multiobjective optimization, encompassing path, posture, and joint motion optimization. Achieving satisfactory results in practical scenarios remains challenging. In response, this study introduces a novel Reverse Path Planning (RPP) methodology tailored for industrial manipulators. The approach commences by conceptualizing the manipulator's end-effector as an agent within a reinforcement learning (RL) framework, wherein the state space, action set, and reward function are precisely defined to expedite the search for an initial collision-free path. To enhance convergence speed, the Q-learning algorithm in RL is augmented with Dyna-Q. Additionally, we formulate the cylindrical bounding box of the manipulator based on its Denavit-Hartenberg (DH) parameters and propose a swift collision detection technique. Furthermore, the motion performance of the end-effector is refined through a bidirectional search, and joint weighting coefficients are introduced to mitigate motion in high-power joints. The efficacy of the proposed RPP methodology is rigorously examined through extensive simulations conducted on a six-degree-of-freedom (6-DOF) manipulator encountering two distinct obstacle configurations and target positions. Experimental results substantiate that the RPP method adeptly computes the shortest collision-free path while adhering to specific posture constraints at the target point. Moreover, it minimizes both posture angle deviations and joint motion, showcasing its prowess in enhancing the operational performance of MDOF industrial manipulators.
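To illustrate the kind of update that Dyna-Q adds on top of tabular Q-learning, the sketch below shows one combined learning-and-planning step. It is a generic, textbook-style Dyna-Q fragment under simplifying assumptions (deterministic model, dictionary-backed Q-table), not the paper's implementation; the state encoding, reward shaping, and collision checks described in the abstract are omitted, and all names (dyna_q_step, planning_steps) are illustrative.

```python
import random

def dyna_q_step(Q, model, state, action, reward, next_state, actions,
                alpha=0.1, gamma=0.95, planning_steps=10):
    """One Dyna-Q iteration: a direct Q-learning update from the real
    transition, followed by several simulated planning updates replayed
    from the learned (assumed deterministic) model."""
    # Direct RL update on the observed transition.
    best_next = max(Q.get((next_state, a), 0.0) for a in actions)
    Q[(state, action)] = Q.get((state, action), 0.0) + alpha * (
        reward + gamma * best_next - Q.get((state, action), 0.0))
    # Record the transition in the model.
    model[(state, action)] = (reward, next_state)
    # Planning: replay randomly chosen remembered transitions.
    for _ in range(planning_steps):
        (s, a), (r, s2) = random.choice(list(model.items()))
        best = max(Q.get((s2, b), 0.0) for b in actions)
        Q[(s, a)] = Q.get((s, a), 0.0) + alpha * (
            r + gamma * best - Q.get((s, a), 0.0))
    return Q
```

The extra planning loop, driven by remembered transitions rather than new interactions, is the mechanism that generally accelerates convergence relative to plain Q-learning, which is the motivation the abstract gives for adopting Dyna-Q.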
Funding: This research work is supported by the Fujian Province Natural Science Foundation under Grant No. 2018J01553.
Abstract: In any classical value-based reinforcement learning method, an agent, despite its continuous interactions with the environment, is still unable to quickly generate a complete and independent description of the entire environment, leaving the learning method to struggle with the difficult dilemma of choosing between two tasks, namely exploration and exploitation. This problem becomes more pronounced when the agent has to deal with a dynamic environment, of which the configuration and/or parameters are constantly changing. In this paper, this problem is approached by first mapping a reinforcement learning scheme to a directed graph, and the set that contains all the states already explored continues to be exploited in the context of such a graph. We have proved that the two tasks of exploration and exploitation eventually converge in the decision-making process, and thus there is no need to face the exploration vs. exploitation tradeoff as all existing reinforcement learning methods do. Rather, this observation indicates that a reinforcement learning scheme is essentially the same as searching for the shortest path in a dynamic environment, which is readily tackled by a modified Floyd-Warshall algorithm as proposed in the paper. The experimental results have confirmed that the proposed graph-based reinforcement learning algorithm achieves significantly higher performance than both the standard Q-learning algorithm and an improved Q-learning algorithm in solving mazes, rendering it an algorithm of choice in applications involving dynamic environments.
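For reference, the standard Floyd-Warshall recurrence that the abstract's modified variant builds on is sketched below: all-pairs shortest paths over a weighted directed graph, with next-hop bookkeeping so that a concrete path can be read off. This is the textbook algorithm only; the paper's modification for dynamic environments and its mapping from RL states to graph vertices are not reproduced here, and the function and variable names are illustrative.

```python
import math

def floyd_warshall(n, edges):
    """All-pairs shortest paths on a directed graph with n vertices,
    given as {(u, v): weight}; returns distance and next-hop matrices."""
    dist = [[math.inf] * n for _ in range(n)]
    nxt = [[None] * n for _ in range(n)]
    for i in range(n):
        dist[i][i] = 0.0
        nxt[i][i] = i
    for (u, v), w in edges.items():
        dist[u][v] = w
        nxt[u][v] = v
    # Relax every pair (i, j) through each intermediate vertex k.
    for k in range(n):
        for i in range(n):
            for j in range(n):
                if dist[i][k] + dist[k][j] < dist[i][j]:
                    dist[i][j] = dist[i][k] + dist[k][j]
                    nxt[i][j] = nxt[i][k]
    return dist, nxt

def reconstruct_path(nxt, src, dst):
    """Follow next-hop pointers to recover one shortest path."""
    if nxt[src][dst] is None:
        return []
    path = [src]
    while src != dst:
        src = nxt[src][dst]
        path.append(src)
    return path
```

When the environment changes, the edge weights change with it; efficiently updating these distances instead of recomputing them from scratch is the part a dynamic-environment variant has to address.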
Funding: Supported by the Fujian Province Natural Science Foundation under Grant No. 2018J01553.
Abstract: Directly applying the B-spline interpolation function to process plate cams in a computer numerical control (CNC) system may produce verbose tool-path codes and unsmooth trajectories. This paper is devoted to the problem of B-spline fitting for cam pitch curves. Considering that the B-spline curve needs to meet the motion law of the follower to approximate the pitch curve, we use the radial error to quantify the deviation between the fitting B-spline curve and the pitch curve. The problem thus boils down to solving a difficult global optimization problem: finding the numbers and positions of the control points or data points of the B-spline curve such that the cumulative radial error between the fitting curve and the original curve is minimized. This problem is attempted in this paper with a double deep Q-network (DDQN) reinforcement learning (RL) algorithm with data-point traceability. Specifically, the RL environment, action set, and current state set are designed to facilitate the search for the data points, along with the design of the reward function and the initialization of the neural network. The experimental results show that when the angle division value of the action set is fixed, the proposed algorithm can maximize the number of data points of the B-spline curve and accurately place these data points at the right positions, with the minimum average radial error. Our work establishes the theoretical foundation for studying spline fitting using the RL method.
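As a rough illustration of the radial-error criterion used to score a candidate placement of data points, the snippet below fits an interpolating B-spline through a chosen subset of points on a polar pitch curve and measures the mean absolute radial deviation from the original curve. It relies on SciPy's generic spline routines, a toy sinusoidal pitch curve, and hypothetical names (radial_error, data_idx); the paper's DDQN agent, action discretization, and exact error accumulation are not reproduced.

```python
import numpy as np
from scipy import interpolate

def radial_error(theta, r_pitch, data_idx, k=3, samples=720):
    """Fit a degree-k interpolating B-spline through the selected data
    points (theta[data_idx], r_pitch[data_idx]) of a polar pitch curve
    and return the mean absolute radial deviation from the full curve."""
    tck = interpolate.splrep(theta[data_idx], r_pitch[data_idx], k=k, s=0)
    grid = np.linspace(theta[0], theta[-1], samples)  # dense cam angles
    r_fit = interpolate.splev(grid, tck)              # fitted radius
    r_ref = np.interp(grid, theta, r_pitch)           # reference radius
    return float(np.mean(np.abs(r_fit - r_ref)))

# A smaller radial error means the chosen data points reproduce the pitch
# curve more faithfully; an RL agent would search over data_idx.
theta = np.linspace(0.0, 2 * np.pi, 361)
r_pitch = 50.0 + 10.0 * np.sin(theta)                 # toy pitch curve
print(radial_error(theta, r_pitch, data_idx=np.arange(0, 361, 30)))
```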