Abstract: This paper proposes a novel approach to physical human-robot interaction (pHRI) in which a robot provides guidance forces to a user based on the user's performance. The framework tunes the forces according to each user's behavior in coping with different tasks, so that lower performance results in greater intervention from the robot. This personalized physical human-robot interaction (p2HRI) method incorporates adaptive modeling of the interaction between the human and the robot, as well as learning from demonstration (LfD) techniques, to adapt to the user's performance. The approach is based on model predictive control: the system optimizes the rendered forces by predicting the user's performance. Moreover, continuous learning of user behavior is added so that the models and personalized considerations are updated as the user's performance changes over time. Applying this framework to a field such as haptic guidance for skill improvement allows a more personalized learning experience, in which the interaction between the robot as the intelligent tutor and the student as the user is better adjusted to the individual's skill level and gradual improvement. The results suggest that the proposed method improves the precision of the interaction model, and that adding the considered personalized factors leads to a more adaptive strategy for rendering guidance forces.
Funding: Supported by the National Natural Science Foundation of China (Grant No. 52175029) and the Key Industrial Chain Projects of Shaanxi Province (Grant No. 2018ZDCXL-GY-06-05).
Abstract: Dynamic movement primitives (DMPs), a robust and efficient framework, have been studied widely for robot learning from demonstration. The classical DMPs framework mainly focuses on movement learning in Cartesian or joint space and cannot properly represent end-effector orientation. In this paper, we present an extended DMPs framework (EDMPs) in both Cartesian space and the 2-dimensional (2D) sphere manifold for quaternion-based orientation learning and generalization. A Gaussian mixture model and Gaussian mixture regression (GMM-GMR) are adopted as the initialization phase of EDMPs to handle multiple demonstrations and obtain their mean and covariance. Additionally, evaluation indicators including reachability and similarity are defined to characterize the learning and generalization abilities of EDMPs. Finally, a real-world experiment was conducted with human demonstrations: the endpoint poses of the human arm were recorded and successfully transferred from the human to the robot. The experimental results show that the absolute errors of the Cartesian and Riemannian space skills are less than 3.5 mm and 1.0°, respectively. The Pearson correlation coefficients of the Cartesian and Riemannian space skills are mostly greater than 0.9. The developed EDMPs exhibit superior reachability and similarity for learning and generalizing multi-space skills. This research proposes a fused framework combining EDMPs and GMM-GMR that has sufficient capability to handle multi-space skills from multiple demonstrations.
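The abstract reports orientation errors in degrees and similarity as a Pearson correlation coefficient but does not spell out how either is computed. The sketch below shows one common way to evaluate both quantities. The function names `quat_angle_deg` and `pearson` and the `(w, x, y, z)` unit-quaternion convention are assumptions made for illustration, not the authors' definitions of reachability and similarity.

```python
import math

def quat_angle_deg(q1, q2):
    """Geodesic angle in degrees between two unit quaternions given as (w, x, y, z)."""
    # abs() because q and -q represent the same rotation.
    dot = abs(sum(a * b for a, b in zip(q1, q2)))
    dot = min(1.0, dot)  # guard against rounding slightly above 1
    return math.degrees(2.0 * math.acos(dot))

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)
```

Under these assumptions, a reachability check would compare the final reproduced quaternion with the target using `quat_angle_deg`, and a similarity check would apply `pearson` to each coordinate of the demonstrated and reproduced trajectories.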
Funding: Supported by the National Natural Science Foundation of China (Grant No. 91848202) and the Foundation for Innovative Research Groups of the National Natural Science Foundation of China (Grant No. 51521003).
Abstract: Learning from demonstration (LfD) is an appealing method of helping robots learn new skills. Numerous papers have presented LfD methods with good performance in robotics. However, complicated robot tasks that require carefully regulated path planning strategies remain unsolved. Contact or non-contact constraints in specific robot tasks make the path planning problem more difficult, as the interaction between the robot and the environment is time-varying. In this paper, we focus on the path planning of complex robot tasks in the domain of LfD and give a novel perspective for classifying imitation learning and inverse reinforcement learning. This classification is based on constraints and obstacle avoidance. Finally, we summarize these methods and present promising directions for robot applications and LfD theory.
Funding: Supported by the Foundation for Innovative Research Groups of the National Natural Science Foundation of China (Grant No. 51521003), the National Natural Science Foundation of China (Grant No. 61803124), and the Post-doctor Research Startup Foundation of Heilongjiang Province.
Abstract: Autonomous planning is a significant development direction for space manipulators, and learning from demonstrations (LfD) is a potential strategy for complex tasks in the field. However, separating control from planning may cause large torque fluctuations and energy consumption, and even instability or danger in the control of space manipulators, especially when planning is based on human demonstrations. Therefore, we present an autonomous planning and control strategy for space manipulators based on LfD and focus on dynamics uncertainty, a common problem of actual manipulators. The process can be divided into three stages. First, we reproduced the stochastic directed trajectory using Gaussian process-based LfD. Second, we built a model of the stochastic dynamics of the actual manipulator with a Gaussian process. Third, we designed an optimal controller based on the dynamics model to obtain improved commanded torques and trajectory, and used the separation theorem to deal with stochastic characteristics during control. We evaluated the strategy in a ground experiment on locating pre-screwed bolts with the Tiangong-2 manipulator system. The results showed that, compared with other strategies, the proposed strategy significantly reduced torque fluctuations and energy consumption, and its precision met the task requirements.
Funding: National Natural Science Foundation of China (Nos. 62225304, 92148204 and 62061160371), National Key Research and Development Program of China (Nos. 2021ZD0114503 and 2019YFB1703600), Beijing Top Discipline for Artificial Intelligence Science and Engineering, University of Science and Technology Beijing, and the Beijing Natural Science Foundation (No. JQ20026).
Abstract: In this article, a robot skills learning framework is developed that considers both motion modeling and execution. To enable the robot to learn skills from demonstrations, a learning method called dynamic movement primitives (DMPs) is introduced to model motion. A staged teaching strategy is integrated into the DMPs framework to enhance its generality, so that complicated tasks can also be performed by multi-joint manipulators. The DMP connection method is used to make accurate and smooth transitions in position and velocity space when connecting complex motion sequences. In addition, motions are categorized by goal and duration. An adaptive neural network (NN) control method is also proposed to achieve highly accurate trajectory tracking and to ensure the performance of action execution, which improves the reliability of the skills learning system. Experiments on the Baxter robot verify the effectiveness of the proposed method.
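As background for the DMP-based modeling described in this abstract, the following is a minimal sketch of a textbook one-dimensional discrete DMP: a critically damped spring-damper pulled toward a goal, plus a learned forcing term that fades with a decaying phase variable. It is not the staged or connected DMP of the paper. The class name `DMP1D`, the gain values, the basis-function layout, and the use of explicit Euler integration are all assumptions chosen for illustration.

```python
import math

class DMP1D:
    """Textbook one-dimensional discrete DMP (illustrative names and gains)."""

    def __init__(self, n_basis=20, alpha_z=25.0, alpha_x=4.0):
        self.alpha_z = alpha_z
        self.beta_z = alpha_z / 4.0  # critical damping of the spring-damper
        self.alpha_x = alpha_x
        # Basis centres: evenly spaced in time, mapped to phase x = exp(-alpha_x * t / tau).
        self.c = [math.exp(-alpha_x * i / (n_basis - 1)) for i in range(n_basis)]
        # Basis widths from the gap between neighbouring centres.
        self.h = [1.0 / (self.c[i + 1] - self.c[i]) ** 2 for i in range(n_basis - 1)]
        self.h.append(self.h[-1])
        self.w = [0.0] * n_basis
        self.y0, self.g, self.tau = 0.0, 1.0, 1.0

    def _psi(self, x):
        return [math.exp(-h * (x - c) ** 2) for h, c in zip(self.h, self.c)]

    def fit(self, y, dt):
        """Learn forcing-term weights from one demonstration y sampled every dt seconds."""
        n = len(y)
        self.y0, self.g, self.tau = y[0], y[-1], (n - 1) * dt

        def diff(v):  # central differences, one-sided at both ends
            return [(v[min(i + 1, n - 1)] - v[max(i - 1, 0)])
                    / ((min(i + 1, n - 1) - max(i - 1, 0)) * dt) for i in range(n)]

        yd = diff(y)
        ydd = diff(yd)
        num = [0.0] * len(self.w)
        den = [1e-10] * len(self.w)
        for i in range(n):
            x = math.exp(-self.alpha_x * i * dt / self.tau)
            # Forcing value that would make the DMP follow the demonstration.
            f = (self.tau ** 2 * ydd[i]
                 - self.alpha_z * (self.beta_z * (self.g - y[i]) - self.tau * yd[i]))
            xi = x * (self.g - self.y0)
            for j, p in enumerate(self._psi(x)):  # locally weighted regression
                num[j] += p * xi * f
                den[j] += p * xi * xi
        self.w = [a / b for a, b in zip(num, den)]

    def rollout(self, dt, g=None, tau=None):
        """Integrate with explicit Euler; g and tau default to the demonstrated values."""
        g = self.g if g is None else g
        tau = self.tau if tau is None else tau
        y, z, traj = self.y0, 0.0, [self.y0]
        for k in range(int(round(tau / dt))):
            x = math.exp(-self.alpha_x * k * dt / tau)
            psi = self._psi(x)
            f = (sum(p * w for p, w in zip(psi, self.w)) / (sum(psi) + 1e-10)
                 * x * (g - self.y0))
            zd = (self.alpha_z * (self.beta_z * (g - y) - z) + f) / tau
            yd = z / tau
            z += dt * zd
            y += dt * yd
            traj.append(y)
        return traj
```

Calling `rollout` with a different `g` or `tau` reuses the learned weights, which is the property that lets a single demonstration be replayed toward a new goal or over a new duration.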
Funding: This work was supported by the National Key R&D Plan (2016YFB0100901), the National Natural Science Foundation of China (Grant Nos. U20B2062 & 61673237), and the Beijing Municipal Science & Technology Project (Z191100007419001).
Abstract: In actor-critic reinforcement learning (RL) algorithms, function estimation errors are known to cause ineffective random exploration at the beginning of training and to lead to overestimated value estimates and suboptimal policies. In this paper, we address the problem by performing advantage rectification with imperfect demonstrations, thus reducing the function estimation errors. Pretraining with expert demonstrations has been widely adopted to accelerate deep reinforcement learning when simulations are expensive to obtain. However, existing methods, such as behavior cloning, often assume that the demonstrations carry additional information or labels about performance, such as an assumption of optimality, which is usually incorrect and of little use in the real world. We explicitly handle imperfect demonstrations within the actor-critic RL framework and propose a new method called learning from imperfect demonstrations with advantage rectification (LIDAR). LIDAR uses a rectified loss function to learn only from selected demonstrations; the loss is derived from the minimal assumption that the demonstrating policies perform better than the current policy. LIDAR learns from contradictions caused by estimation errors and, in turn, reduces those errors. We apply LIDAR to three popular actor-critic algorithms, DDPG, TD3 and SAC. Experiments show that our method can noticeably reduce function estimation errors, effectively leverage demonstrations that are far from optimal, and consistently outperform state-of-the-art baselines in all scenarios.
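The abstract does not give the form of the rectified loss. The sketch below is one hypothetical reading of the stated assumption only: if demonstrations are assumed to outperform the current policy, then a critic that scores the policy's action above the demonstrated action is contradicting that assumption, and only those cases are penalized. The function name `rectified_demo_loss`, the hinge form, and the `margin` parameter are assumptions for illustration and should not be taken as the LIDAR loss itself.

```python
def rectified_demo_loss(q_demo, q_policy, margin=0.0):
    """Mean hinge penalty over a batch of states (hypothetical illustration).

    q_demo[i]   : critic estimate Q(s_i, a_demo_i) for the demonstrated action
    q_policy[i] : critic estimate Q(s_i, pi(s_i)) for the current policy's action
    """
    # A sample contributes only when the critic contradicts the assumption
    # that the demonstrated action is at least as good as the policy's action.
    penalties = [max(0.0, qp - qd + margin) for qd, qp in zip(q_demo, q_policy)]
    return sum(penalties) / len(penalties)
```

With this form, samples on which the critic already ranks the demonstration higher contribute nothing, so the loss selects demonstrations rather than imitating all of them.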