Funding: Supported by the Guangdong Basic and Applied Basic Research Foundation (2024A1515011936) and the National Natural Science Foundation of China (62320106008).
Abstract: The concept of reward is fundamental in reinforcement learning, with a wide range of applications in the natural and social sciences. Finding an interpretable reward for decision-making, one that largely shapes the system's behavior, has always been a challenge in reinforcement learning. In this work, we explore a discrete-time reward for reinforcement learning in continuous time and action spaces, which represent many phenomena captured by applying physical laws. We find that the discrete-time reward leads to the extraction of the unique continuous-time decision law and improves computational efficiency by dropping the integrator operator that appears in classical results with integral rewards. We apply this finding to solve output-feedback design problems in power systems. The results reveal that our approach removes the intermediate stage of identifying dynamical models. Our work suggests that the discrete-time reward is efficient in the search for the desired decision law, which provides a computational tool to understand and modify the behavior of large-scale engineering systems using the optimal learned decision.
Funding: Rajamangala University of Technology Suvarnabhumi.
Abstract: In this paper, sine trigonometry operational laws (ST-OLs) are extended to neutrosophic sets (NSs), and the operations and functionality of these laws are studied. Extending these ST-OLs to complex neutrosophic sets (CNSs) forms the core of this work. Some mathematical properties are proved based on the ST-OLs. Fundamental operations and distance measures between complex neutrosophic numbers (CNNs) based on the ST-OLs are discussed with numerical illustrations. Further, arithmetic and geometric aggregation operators are established, and their properties are verified with numerical data. The general properties of the developed sine trigonometry weighted averaging and weighted geometric aggregation operators for CNNs (ST-WAAO-CNN and ST-WGAO-CNN) are proved. A decision-making technique based on these operators is developed with the help of an unsupervised criteria-weighting approach, called the Entropy-ST-OLs-CNDM (complex neutrosophic decision making) method. A case study on material selection is used to demonstrate the ST-OLs-based CNDM method. To check the validity of the proposed method, an entropy-based complex neutrosophic CODAS approach with ST-OLs is executed numerically, and a comparative analysis discussing the outcomes is conducted. The proposed approach proves to be salient and effective for decision making with complex information.