Abstract: To create autonomous robots, both hardware and software are needed. While enormous progress has already been made in hardware, robot software still depends on advances in artificial intelligence. This article proposes a solution for creating "logical" brains for autonomous robots, namely an approach to building an intelligent robot action planner based on Mivar expert systems. This approach makes it possible to reduce both the computational complexity of solving planning problems and the computational requirements of the hardware platforms on which intelligent planning systems are deployed. To justify, theoretically and practically, the expediency of using logical inference systems, in particular Mivar expert systems, for building intelligent planners, the MIPRA (Mivar-based Intelligent Planning of Robot Actions) planner was created to solve STRIPS-type problems of rearranging cubes in the Blocks World domain. The planner is built on the Razumator platform for creating expert systems. As a result, the Mivar planner can process information about the state of the subject area based on the analysis of cause-effect relationships and an algorithm that automatically constructs a logical inference (finding a solution from "Given" to "Find"). Moreover, an important feature of MIPRA is that the system is built on "white box" principles, so it can explain any of its decisions and justify the actions performed by presenting a retrospective of the stages of the decision-making process. When preparing the set of robot actions that change the controlled objects, expert knowledge is used, and this knowledge underlies the planner's algorithms. This approach makes it possible to involve an expert in organizing the work of the intelligent planner and to reuse existing knowledge about the subject area. Practical experiments in this study have shown that, instead of requiring many hours on powerful multiprocessor servers, MIPRA solves the planning problems on a personal computer: 10 cubes are rearranged in 0.028 seconds, 100 cubes in 0.938 seconds, and 1000 cubes in 84.188 seconds. The results of this study can be used to reduce the computational complexity of planning the actions of robots, as well as of robot groups, multilevel heterogeneous robotic systems, and cyber-physical systems of various types and purposes.
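To make the class of problem concrete, the following minimal Python sketch (not the MIPRA/Razumator implementation; the state encoding, the breadth-first search, and all names are illustrative assumptions) represents a Blocks World instance and searches from the "Given" configuration to the "Find" configuration while recording a justification for every move, in the spirit of the white-box action trace described above.

```python
from collections import deque

def canon(stacks):
    """Canonical form: a sorted tuple of non-empty stacks (bottom block first)."""
    return tuple(sorted(tuple(s) for s in stacks if s))

def solve_blocks(initial, goal):
    """Breadth-first search from the 'Given' configuration to the 'Find' one.
    Returns a list of (move, justification) pairs, or None if unreachable."""
    start, target = canon(initial), canon(goal)
    frontier, seen = deque([(start, [])]), {start}
    while frontier:
        state, trace = frontier.popleft()
        if state == target:
            return trace
        for i, src in enumerate(state):
            block = src[-1]                     # only a clear (top) block may move
            for j in range(len(state) + 1):     # j == len(state): move to the table
                if j == i:
                    continue
                nxt = [list(s) for s in state] + [[]]
                nxt[i].pop()
                nxt[j].append(block)
                succ = canon(nxt)
                if succ in seen:
                    continue
                seen.add(succ)
                dest = "the table" if j == len(state) else f"block {state[j][-1]}"
                step = (f"move {block} onto {dest}",
                        f"precondition: {block} is clear; effect: {block} is on {dest}")
                frontier.append((succ, trace + [step]))
    return None

# Invert the 3-block tower A-B-C (A at the bottom) into C-B-A and print the trace.
for move, why in solve_blocks([["A", "B", "C"]], [["C", "B", "A"]]):
    print(move, "|", why)
```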
Funding: National Natural Science Foundation of China, Grant/Award Number: U19A2083.
Abstract: Offline reinforcement learning (RL) aims to learn policies entirely from passively collected datasets, making it a data-driven decision method. One of the main challenges in offline RL is the distribution shift problem, which causes the algorithm to visit out-of-distribution (OOD) samples. The distribution shift can be mitigated by constraining the divergence between the target policy and the behaviour policy. However, this method can overly constrain the target policy and impair the algorithm's performance, as it does not directly distinguish between in-distribution and OOD samples. In addition, it is difficult to learn and represent a multi-modal behaviour policy when the datasets are collected by several different behaviour policies. To overcome these drawbacks, the authors address the distribution shift problem with implicit policy constraints based on energy-based models (EBMs) rather than explicitly modelling the behaviour policy. EBMs are powerful at representing complex multi-modal distributions and can distinguish in-distribution samples from OOD ones. Experimental results show that their method significantly outperforms the explicit policy constraint method and other baselines. In addition, the learnt energy model can be used to indicate OOD visits and warn of possible failures.
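As a rough illustration of the idea (not the authors' implementation; the network sizes, the contrastive training objective, and the thresholding rule are assumptions), the sketch below trains an energy model over state-action pairs so that dataset actions receive low energy and random actions receive higher energy, after which the energy value can be thresholded to flag OOD actions.

```python
import torch
import torch.nn as nn

class EnergyModel(nn.Module):
    """Scalar energy E(s, a): low on in-distribution pairs, high elsewhere."""
    def __init__(self, state_dim, action_dim, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + action_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, state, action):
        return self.net(torch.cat([state, action], dim=-1)).squeeze(-1)

def contrastive_step(ebm, opt, states, actions, n_neg=10):
    """One training step: dataset actions are positives, uniform random actions
    in [-1, 1] are negatives (an InfoNCE-style surrogate, assumed here)."""
    neg = torch.rand(states.size(0), n_neg, actions.size(-1)) * 2 - 1
    e_pos = ebm(states, actions)                                     # (B,)
    e_neg = ebm(states.unsqueeze(1).expand(-1, n_neg, -1), neg)     # (B, n_neg)
    logits = torch.cat([-e_pos.unsqueeze(1), -e_neg], dim=1)         # lower energy = higher score
    loss = nn.functional.cross_entropy(
        logits, torch.zeros(states.size(0), dtype=torch.long))
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()

def is_ood(ebm, state, action, threshold):
    """Flag a candidate action as OOD when its energy exceeds a calibrated threshold."""
    with torch.no_grad():
        return ebm(state, action) > threshold

# Example wiring (dimensions and optimizer settings are placeholders):
# ebm = EnergyModel(state_dim=17, action_dim=6)
# opt = torch.optim.Adam(ebm.parameters(), lr=3e-4)
```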
Funding: Supported by the National Natural Science Foundation of China (No. 61273039).
Abstract: In this paper, on-road trajectory planning is solved by introducing intelligent computing budget allocation (ICBA) into a candidate-curve-based planning algorithm, namely, ordinal-optimization-based differential evolution (OODE). The proposed algorithm is named IOODE, with 'I' representing ICBA. OODE plans the trajectory in two parts: the trajectory curve and the acceleration profile. The best trajectory curve is picked from a set of candidate curves, where each curve is evaluated by solving a subproblem with the differential evolution (DE) algorithm. The more iterations DE performs, the more accurate the evaluation becomes. Thus, we intelligently allocate the iterations to individual curves so as to reduce the total number of iterations performed, while ensuring that the selected best curve is one of the truly top curves with a sufficiently high probability. Simulation results show that IOODE is 20% faster than OODE while maintaining the same solution quality. The computing budget allocation framework presented in this paper can also be used to enhance the efficiency of other candidate-curve-based planning methods.
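The allocation idea can be sketched as follows (a simplified heuristic, not the paper's ICBA rule; the priority score, the batch size, and the toy stand-in for DE are assumptions): every candidate curve first receives a few DE iterations, and the remaining budget is then spent round by round on the candidate that currently looks most promising or least explored.

```python
import math
import random

def allocate_budget(candidates, evaluate, total_budget, batch=5, init_iters=10):
    """`candidates` is a list of curve descriptions; `evaluate(curve, n_iters)` runs
    n_iters DE iterations on the curve's subproblem and returns the best cost found.
    Returns the index of the curve believed to be best. Parameters are assumptions."""
    stats = []  # per candidate: [iterations spent, best cost seen so far]
    for c in candidates:
        stats.append([init_iters, evaluate(c, init_iters)])
    spent = init_iters * len(candidates)
    while spent < total_budget:
        # Priority: low observed cost or few iterations spent => more budget
        # (an optimistic, UCB-like stand-in for the allocation rule).
        scores = [cost - 1.0 / math.sqrt(n) for n, cost in stats]
        k = min(range(len(candidates)), key=lambda i: scores[i])
        extra = min(batch, total_budget - spent)
        stats[k][1] = min(stats[k][1], evaluate(candidates[k], extra))
        stats[k][0] += extra
        spent += extra
    return min(range(len(candidates)), key=lambda i: stats[i][1])

# Toy usage: each "curve" is a scalar target and DE is replaced by random search.
def fake_de(curve, n_iters):
    return min(abs(curve - random.gauss(0, 1)) for _ in range(n_iters))

best = allocate_budget(candidates=[0.2, 1.5, 3.0], evaluate=fake_de, total_budget=200)
print("selected curve index:", best)
```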