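To address the low exploration-exploitation efficiency and poor policy quality that the classic proximal policy optimization (PPO) algorithm exhibits when generating reinforcement learning decision models in complex environments with incompletely observed state information, this paper proposes a curiosity-driven improvement of PPO: proximal policy optimization based on maximum number of arrival & expert knowledge (MNAEK-PPO). To tackle the difficulty of exploring the policy space, an exploration-frequency matrix is built for the agent during training; the exploration frequencies are processed and then used as an intrinsic reward in the agent's reinforcement learning training, and expert knowledge is additionally incorporated to assist the agent's decision-making. Experiments in an intelligent battlefield simulation environment determined the best way to construct the intrinsic reward in MNAEK-PPO, and a series of comparative experiments was conducted. The results show that MNAEK-PPO substantially improves exploration efficiency in the decision space, with clear gains in both convergence speed and game score, offering a new approach for advancing the application of deep reinforcement learning to intelligent tactical strategy generation.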
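As an illustration only, the idea of turning exploration counts into an intrinsic reward can be sketched as follows. The abstract does not state how MNAEK-PPO processes the exploration-frequency matrix, so this sketch substitutes a common count-based bonus, scale / sqrt(N(s)), as an assumption. The names `VisitCountBonus`, `intrinsic_reward`, `shaped_reward`, and the `scale` parameter are hypothetical and do not come from the paper.

```python
import math
from collections import defaultdict


class VisitCountBonus:
    """Hypothetical helper: counts how often each discretized state is reached."""

    def __init__(self, scale=1.0):
        self.scale = scale              # assumed weight of the intrinsic term
        self.counts = defaultdict(int)  # sparse stand-in for a frequency matrix

    def intrinsic_reward(self, state_key):
        # Record the visit, then return a bonus that shrinks as the count grows.
        # The 1/sqrt(N) form is an assumption, not the paper's construction.
        self.counts[state_key] += 1
        return self.scale / math.sqrt(self.counts[state_key])


def shaped_reward(extrinsic, bonus, state_key):
    # Total reward passed to the policy update: environment reward plus bonus.
    return extrinsic + bonus.intrinsic_reward(state_key)
```

Under these assumptions, a rarely reached state yields a larger bonus than a frequently reached one, which nudges the agent toward unexplored regions; the expert-knowledge component mentioned in the abstract is not modeled here.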
The purpose of this paper is to pose a new question concerning how to speed up mutual understanding among team members and/or groups of experts who communicate over the Internet through virtual collaboration, electronic brainstorming, networked strategic conversation, etc. We have previously proposed incorporating into the communicative process a convergent control mechanism based on fundamental principles of thermodynamics and an inverse problem solution method, as well as various artificial intelligence techniques. This paper develops the approach further by applying the Fuzzy Tychonoff Theorem together with quantum techniques to reach a high level of holistic discourse. This is achieved not only by applying fundamental principles of compactness of a topological space, but also by using the principles of quantum entanglement and complementarity to structure the discourse in a special way. The approach is implemented as the Responsibility Thinking System (RTS), which was tested in the course of finding solutions to real-life issues.