2 articles found
Discovering Latent Variables for the Tasks With Confounders in Multi-Agent Reinforcement Learning
1
Authors: Kun Jiang, Wenzhang Liu, Yuanda Wang, Lu Dong, Changyin Sun. IEEE/CAA Journal of Automatica Sinica (SCIE, EI, CSCD), 2024, No. 7, pp. 1591-1604 (14 pages)
Efficient exploration in complex coordination tasks is a challenging problem in multi-agent reinforcement learning (MARL). It is significantly more difficult in tasks with latent variables that agents cannot directly observe. However, most existing latent variable discovery methods lack a clear representation of the latent variables and an effective evaluation of their influence on the agent. In this paper, we propose a new MARL algorithm based on the soft actor-critic method for complex continuous control tasks with confounders. The algorithm, called multi-agent soft actor-critic with latent variable (MASAC-LV), uses variational inference theory to infer a compact latent variable representation space from a large amount of offline experience. In addition, we derive the counterfactual policy, whose input has no latent variables, and quantify the difference between the actual policy and the counterfactual policy via a distance function. This quantified difference is treated as an intrinsic motivation that gives additional rewards based on how much the latent variable affects each agent. The proposed algorithm is evaluated on two collaboration tasks with confounders, and the experimental results demonstrate the effectiveness of MASAC-LV compared to other baseline algorithms.
Keywords: latent variable model; maximum entropy; multi-agent reinforcement learning (MARL); multi-agent system
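The intrinsic reward described in this abstract (a distance between the actual policy and the counterfactual policy, added to the task reward) can be illustrated with a minimal sketch. The abstract does not say which distance function MASAC-LV uses, so the KL divergence between diagonal Gaussian policies below is an assumption, and the names `gaussian_kl`, `shaped_reward`, and the weight `beta` are hypothetical, not taken from the paper.

```python
import math


def gaussian_kl(mu_p, std_p, mu_q, std_q):
    """KL(p || q) between two diagonal Gaussians, summed over action dimensions.

    Assumption: both policies output a mean and a standard deviation per
    action dimension, as is common for soft actor-critic policies.
    """
    kl = 0.0
    for mp, sp, mq, sq in zip(mu_p, std_p, mu_q, std_q):
        kl += math.log(sq / sp) + (sp ** 2 + (mp - mq) ** 2) / (2.0 * sq ** 2) - 0.5
    return kl


def shaped_reward(env_reward, actual, counterfactual, beta=0.1):
    """Task reward plus an intrinsic bonus for one agent.

    `actual` is the (mean, std) of the policy that sees the latent variable;
    `counterfactual` is the (mean, std) of the policy whose input omits it.
    `beta` is a hypothetical weighting coefficient.
    """
    mu_a, std_a = actual
    mu_c, std_c = counterfactual
    return env_reward + beta * gaussian_kl(mu_a, std_a, mu_c, std_c)
```

When the two policies coincide, the bonus is zero, so an agent is rewarded only to the extent that the latent variable changes its behavior.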
Dirichlet process and its developments: a survey
2
Authors: Yemao XIA, Yingan LIU, Jianwei GOU. Frontiers of Mathematics in China (SCIE, CSCD), 2022, No. 1, pp. 79-115 (37 pages)
The core of nonparametric/semiparametric Bayesian analysis is to relax particular parametric assumptions on the distributions of interest, treating them as unknown and random, and to assign them a prior. Selecting a suitable prior is therefore especially critical in nonparametric Bayesian fitting. As a distribution over distributions, the Dirichlet process (DP) is the most widely appreciated nonparametric prior because of its nice theoretical properties, modeling flexibility, and computational feasibility. In this paper, we review and summarize some developments of the DP during the past decades. We focus mainly on its theoretical properties, various extensions, statistical modeling, and applications to latent variable models.
Keywords: Nonparametric Bayes; Dirichlet process; Polya urn prediction; Sethuraman representation; stick-breaking procedure; Chinese restaurant rule; mixture of Dirichlet process; dependence Dirichlet process; Markov Chains Monte Carlo; blocked Gibbs sampler; latent variable models
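The stick-breaking procedure listed among the keywords can be illustrated with a minimal sketch of a truncated draw from a Dirichlet process. It follows the standard Sethuraman construction (stick fractions drawn from Beta(1, alpha), atoms drawn from the base measure) and is not taken from the survey itself; the standard normal base measure, the truncation level, and the function name `stick_breaking_draw` are assumptions made for illustration.

```python
import random


def stick_breaking_draw(alpha, num_sticks, rng):
    """Truncated stick-breaking draw from DP(alpha, G0).

    Assumption: the base measure G0 is a standard normal.
    Returns the first `num_sticks` weights, their atoms, and the
    leftover stick mass that the truncation discards.
    """
    weights, atoms = [], []
    remaining = 1.0  # length of the stick not yet broken off
    for _ in range(num_sticks):
        fraction = rng.betavariate(1.0, alpha)  # v_k ~ Beta(1, alpha)
        weights.append(remaining * fraction)    # w_k = v_k * prod_{j<k} (1 - v_j)
        atoms.append(rng.gauss(0.0, 1.0))       # theta_k ~ G0
        remaining *= 1.0 - fraction
    return weights, atoms, remaining


rng = random.Random(0)
weights, atoms, leftover = stick_breaking_draw(alpha=2.0, num_sticks=50, rng=rng)
```

The weights and the leftover mass always sum to one. A smaller `alpha` concentrates mass on the first few atoms, while a larger `alpha` spreads it across many.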