This paper investigated how to learn optimal action policies in cooperative multi-agent systems when the agents' rewards are random variables, and proposed a general two-stage learning algorithm for cooperative multi-agent decision processes. The algorithm first calculates the averaged immediate rewards, then treats these learned rewards as the agents' immediate action rewards when learning the optimal action policies. The paper proves that the algorithm finds the optimal policies in a stochastic environment, and also discusses extending the algorithm to stochastic Markov decision processes.
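The two stages can be illustrated with a minimal sketch. It assumes a single-state cooperative game in which all agents share one random team reward per joint action, and it assumes Gaussian reward noise; the reward table, the noise model, and the names `estimate_mean_rewards` and `best_joint_action` are hypothetical and not taken from the paper. Stage 1 replaces each random reward by its sample average, which by the law of large numbers approaches the expected reward; stage 2 then treats those averages as deterministic rewards. For simplicity, stage 2 here is a centralized argmax over joint actions rather than the paper's policy-learning procedure.

```python
import random
from itertools import product
from statistics import mean

# Hypothetical expected team rewards for a 2-agent, 2-action cooperative game.
EXPECTED_REWARD = {(0, 0): 10.0, (1, 1): 8.0, (0, 1): -5.0, (1, 0): -5.0}


def make_noisy_reward(rng, noise_sd=2.0):
    """Return a sampler: expected reward plus Gaussian noise (an assumed noise model)."""
    def sample(joint_action):
        return EXPECTED_REWARD[joint_action] + rng.gauss(0.0, noise_sd)
    return sample


def estimate_mean_rewards(sample_reward, action_sets, samples_per_action):
    """Stage 1: estimate each joint action's immediate reward by its sample average."""
    estimates = {}
    for joint in product(*action_sets):
        draws = [sample_reward(joint) for _ in range(samples_per_action)]
        estimates[joint] = mean(draws)
    return estimates


def best_joint_action(estimates):
    """Stage 2 (simplified): treat averaged rewards as deterministic and pick the best joint action."""
    return max(estimates, key=estimates.get)


if __name__ == "__main__":
    rng = random.Random(0)
    estimates = estimate_mean_rewards(make_noisy_reward(rng), [(0, 1), (0, 1)], 2000)
    best = best_joint_action(estimates)
    print(best, round(estimates[best], 2))
```

With 2000 samples per joint action the averaging error is small compared with the gap between joint actions (0, 0) and (1, 1), so the greedy stage reliably selects (0, 0). In a multi-state setting, stage 2 would instead run a planning or learning method (for example, value iteration or Q-learning) on the averaged rewards.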
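This paper proposes a new customer-order-oriented manufacturing system model, the generalized random manufacturing system model GRMS (Generated Random Manufacturing System). The model improves system agility through a reward-and-penalty mechanism, a competition mechanism, and coordinated distributed control. The paper sets out the concept of the generalized random manufacturing system, analyzes the system's composition, structure, and basic principles, and discusses its operating mechanism and significance.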