Funding: Supported by the National Natural Science Foundation of China (71973001).
Abstract: To explore the green development of automobile enterprises and promote the achievement of the “dual carbon” target, this study, based on bounded rationality assumptions, constructed a tripartite evolutionary game model of government, commercial banks, and automobile enterprises; introduced a dynamic reward and punishment mechanism; and analyzed the evolution of the three parties’ strategic behavior under static and dynamic reward and punishment mechanisms. Vensim PLE was used for numerical simulation. The results indicate that the system cannot reach a stable state under the static reward and punishment mechanism, whereas a dynamic reward and punishment mechanism effectively improves system stability and better fits real situations. Under the dynamic mechanism, increasing the initial probabilities of the three parties promotes system stability, and the government can implement effective supervision by adjusting the upper limit of the reward and punishment intensity. Finally, the implementation of green credit by commercial banks plays a significant role in promoting the green development of automobile enterprises.
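The replicator dynamics behind a tripartite model of this kind can be sketched as follows. Every payoff term, every parameter value, and the linear rule that lowers reward and penalty intensity as the share of green enterprises rises are illustrative assumptions; the abstract does not give the authors' payoff matrix, and this sketch integrates with explicit Euler steps in Python rather than in Vensim PLE.

```python
"""Sketch: replicator dynamics for a hypothetical government/bank/enterprise game.

All payoff parameters are illustrative assumptions, not values from the study.
x = P(government supervises), y = P(bank offers green credit),
z = P(enterprise produces green).
"""
from dataclasses import dataclass


@dataclass
class Params:
    reward_max: float = 6.0        # upper limit of government reward (assumed)
    penalty_max: float = 8.0       # upper limit of government penalty (assumed)
    green_cost: float = 4.0        # enterprise's extra cost of green production
    credit_discount: float = 2.0   # enterprise's interest saving from green credit
    subsidy: float = 2.0           # government subsidy to a green-credit bank
    green_profit: float = 3.0      # bank's gain from lending to green firms
    credit_cost: float = 2.0       # bank's cost of running green credit
    supervision_cost: float = 3.0  # government's cost of supervising
    credibility_gain: float = 1.0  # government's benefit from supervising


def dynamic_intensity(z: float, upper_limit: float) -> float:
    """Assumed dynamic rule: intensity shrinks linearly as the green share z grows."""
    return (1.0 - z) * upper_limit


def step(x, y, z, p: Params, dt: float, dynamic: bool = True):
    if dynamic:
        reward = dynamic_intensity(z, p.reward_max)
        penalty = dynamic_intensity(z, p.penalty_max)
    else:  # static mechanism: constant intensities
        reward, penalty = p.reward_max, p.penalty_max
    # Payoff advantage of each party's "active" strategy over its alternative.
    du_x = ((1 - z) * penalty - z * reward - y * p.subsidy
            - p.supervision_cost + p.credibility_gain)
    du_y = x * p.subsidy + z * p.green_profit - p.credit_cost
    du_z = x * (reward + penalty) + y * p.credit_discount - p.green_cost
    # Replicator equations dX/dt = X(1 - X) dU_X, integrated with explicit Euler.
    return (x + dt * x * (1 - x) * du_x,
            y + dt * y * (1 - y) * du_y,
            z + dt * z * (1 - z) * du_z)


def simulate(x0=0.5, y0=0.5, z0=0.5, steps=5000, dt=0.01, dynamic=True):
    p = Params()
    traj = [(x0, y0, z0)]
    for _ in range(steps):
        traj.append(step(*traj[-1], p, dt, dynamic))
    return traj
```

Comparing the trajectories from `simulate(dynamic=False)` and `simulate(dynamic=True)` shows how the two mechanisms differ under these assumed parameters only; changing `reward_max` or `penalty_max` plays the role of adjusting the upper limit of the reward and punishment intensity.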
Funding: The National Natural Science Foundation of China (Grant No. 71961003).
Abstract: In public goods games, punishments and rewards have been shown to be effective mechanisms for maintaining individual cooperation. However, punishing and rewarding are costly ways to incentivize cooperation, so how such costly penalties and rewards arise has been a complex problem in promoting the development of cooperation. In real society, specialized institutions exist that punish wrongdoers or reward good people by collecting taxes. Inspired by this phenomenon, we propose a tax-based strong altruistic punishment or reward strategy in the public goods game. Through theoretical analysis and numerical calculation, we find that tax-based strong altruistic punishment (reward) has a greater evolutionary advantage than traditional strong altruistic punishment (reward) in maintaining cooperation, and that tax-based strong altruistic reward leads to a higher level of cooperation than tax-based strong altruistic punishment.
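The cost difference between traditional and tax-based punishers can be illustrated with one group's payoffs. This is a minimal sketch under assumed rules, not the authors' model: punishers also contribute; a traditional punisher pays `gamma` per defector and fines each defector `beta`; a tax-based punisher instead pays a flat `tax` to an institution that fines each defector `beta` once, provided at least one punisher funds it. All parameter names and values are hypothetical.

```python
"""Sketch: one-shot group payoffs in a public goods game with punishers.

Assumed rules (hypothetical): punishers contribute like cooperators.
Traditional: each punisher pays `gamma` per defector and fines each defector
`beta`. Tax-based: each punisher pays a flat `tax` to an institution that fines
each defector `beta` once, as long as at least one punisher funds it.
"""


def group_payoffs(n_c, n_d, n_p, r=3.0, c=1.0, beta=1.0, gamma=0.5,
                  tax=0.3, tax_based=True):
    n = n_c + n_d + n_p
    share = r * c * (n_c + n_p) / n    # equal share of the multiplied pool
    if tax_based:
        fine = beta if n_p > 0 else 0.0
        punisher_cost = tax            # flat, independent of the defector count
    else:
        fine = beta * n_p              # fined by every punisher in the group
        punisher_cost = gamma * n_d    # cost grows with the number of defectors
    return {
        "cooperator": share - c,
        "defector": share - fine,
        "punisher": share - c - punisher_cost,
    }
```

With many defectors in a group, the flat tax keeps the punisher's cost bounded, which is one intuition for why a tax-based punisher could fare better than a traditional one under these assumptions.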
Funding: Funded by the National Natural Science Foundation of China (No. 62063006); the Guangxi Science and Technology Major Program (No. 2022AA05002); the Key Laboratory of AI and Information Processing (Hechi University), Education Department of Guangxi Zhuang Autonomous Region (No. 2022GXZDSY003); and the Central Leading Local Science and Technology Development Fund Project of Wuzhou (No. 202201001).
Abstract: By integrating deep neural networks with reinforcement learning, the Double Deep Q Network (DDQN) algorithm overcomes the limitations of Q-learning in handling continuous state spaces and is widely applied to path planning for mobile robots. However, the traditional DDQN algorithm suffers from sparse rewards and inefficient utilization of high-quality data. To address these problems, an improved DDQN algorithm based on average Q-value estimation and reward redistribution is proposed. First, to improve the precision of the target Q-value, the average of multiple previously learned Q-values from the target Q network replaces the single Q-value from the current target Q network. Next, a reward redistribution mechanism is designed to overcome the sparse reward problem by adjusting the final reward of each action using the round reward derived from trajectory information. Additionally, a reward-prioritized experience selection method is introduced, which ranks experience samples by reward value so that high-quality data are used frequently. Finally, simulation experiments in a fixed-position scenario and in random environments verify the effectiveness of the proposed algorithm. Compared with the traditional DDQN algorithm, the proposed algorithm achieves a shorter average running time, a higher average return, and fewer average steps, improving performance by 11.43% in the fixed scenario and 8.33% in random environments. It not only plans economical and safe paths but also significantly improves the efficiency and generalization of path planning, making it suitable for wide application in autonomous navigation and industrial automation.
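The three modifications can be sketched in plain Python. The exact formulas are not given in the abstract, so these are assumptions: the averaged target evaluates Double DQN's selected action with the mean of the last K target-network snapshots (in the spirit of Averaged-DQN); redistribution adds a `weight`-scaled equal share of the episode return to each step reward; and reward-prioritized selection samples with probability proportional to 1/rank. The helper names are hypothetical.

```python
"""Sketch of the three ideas (hypothetical formulas, not the authors' exact ones)."""
import random


def averaged_ddqn_target(reward, done, q_online_next, q_target_snapshots, gamma=0.99):
    """q_online_next: online-net Q-values of s' (one per action).
    q_target_snapshots: K lists of Q-values of s' from previous target nets."""
    # Double DQN: the online network selects the next action ...
    best = max(range(len(q_online_next)), key=lambda a: q_online_next[a])
    # ... and the average of the K target snapshots evaluates it.
    q_avg = sum(snap[best] for snap in q_target_snapshots) / len(q_target_snapshots)
    return reward + gamma * (0.0 if done else q_avg)


def redistribute_rewards(step_rewards, weight=0.5):
    """Shift every step reward by an equal share of the whole-episode return."""
    bonus = weight * sum(step_rewards) / len(step_rewards)
    return [r + bonus for r in step_rewards]


def reward_rank_probabilities(rewards):
    """Rank experiences by reward (rank 1 = highest); p_i is proportional to 1/rank_i."""
    order = sorted(range(len(rewards)), key=lambda i: rewards[i], reverse=True)
    priority = [0.0] * len(rewards)
    for rank, idx in enumerate(order, start=1):
        priority[idx] = 1.0 / rank
    total = sum(priority)
    return [p / total for p in priority]


def sample_batch(buffer, batch_size, rng=random):
    """buffer: list of (state, action, reward, next_state, done) tuples."""
    probs = reward_rank_probabilities([t[2] for t in buffer])
    return rng.choices(buffer, weights=probs, k=batch_size)
```

Rank-based probabilities keep low-reward samples reachable while favouring high-reward ones; a pure top-k selection would be a simpler but more aggressive alternative.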
Abstract: This essay aims to illustrate the important role of reward and punishment in education from a psychological viewpoint. According to Stimulus-Response theory, reward and punishment are now commonly used by teachers to encourage both cognitive activities and appropriate behaviour in the classroom. Either can be used to encourage or supervise students' learning, and reward is favoured. However, reward mechanisms must be used properly and kept under control; they should not be overused. There is also a place for punishment in education, because errors need to be pointed out and antisocial behaviour should be corrected, but it should be applied only when its intensity, duration and timing have been carefully considered. In short, a reward system undoubtedly has a positive effect, while punishment has been shown to produce unpredictable results. These specifics are discussed in the essay that follows.
Funding: The National Natural Science Foundation of China (Grant No. 62062049); the Social Science Project of the Ministry of Education of China (Grant No. 20YJCZH212); and the Natural Science Foundation of Gansu Province, China (Grant No. 20JR5RA390).
Abstract: To study incentive mechanisms for cooperation, we propose a preference rewarding mechanism in the spatial prisoner’s dilemma game that simultaneously considers reputational preference, other-regarding preference, and the dynamic adjustment of vertex weight. A player's vertex weight is adaptively adjusted by comparing his own reputation with the average reputation of his immediate neighbors. Players are inclined to pay a personal cost to reward the cooperative neighbor with the greatest vertex weight, and a player's vertex weight is proportional to the preference rewards he can obtain from direct neighbors. We find that the preference rewarding mechanism significantly facilitates the evolution of cooperation and that the dynamic adjustment of vertex weight has a powerful effect on the emergence of cooperative behavior. To validate these effects, the strategy distribution and the players' average payoff and fitness are examined from a microscopic perspective.
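The vertex-weight update and the preference reward can be sketched on a small lattice. The concrete rules here are assumptions for illustration only: a von Neumann neighbourhood with periodic boundaries; weights move by `delta` up or down when a player's reputation is above or below the neighbourhood mean, clipped to `[w_min, w_max]`; and every player (an assumption, since the abstract does not say who rewards) pays `cost` to give `bonus` to the cooperative neighbour with the largest weight. All names and values are hypothetical.

```python
"""Sketch: vertex-weight update and preference rewarding on an L x L lattice.

Hypothetical rules: von Neumann neighbourhood, periodic boundaries; weights move
by +/-delta relative to the neighbourhood mean reputation and are clipped to
[w_min, w_max]; every player pays `cost` to give `bonus` to the cooperative
neighbour with the largest vertex weight (ties go to the first neighbour listed).
"""


def neighbors(i, j, L):
    return [((i - 1) % L, j), ((i + 1) % L, j), (i, (j - 1) % L), (i, (j + 1) % L)]


def update_weights(weights, reputation, delta=0.1, w_min=0.1, w_max=2.0):
    L = len(weights)
    new_w = [row[:] for row in weights]  # synchronous update on a copy
    for i in range(L):
        for j in range(L):
            nbr_mean = sum(reputation[a][b] for a, b in neighbors(i, j, L)) / 4
            if reputation[i][j] > nbr_mean:
                new_w[i][j] += delta
            elif reputation[i][j] < nbr_mean:
                new_w[i][j] -= delta
            new_w[i][j] = min(w_max, max(w_min, new_w[i][j]))
    return new_w


def preference_rewards(strategy, weights, cost=0.1, bonus=0.3):
    """strategy[i][j] is 1 for cooperate, 0 for defect; returns payoff changes."""
    L = len(strategy)
    change = [[0.0] * L for _ in range(L)]
    for i in range(L):
        for j in range(L):
            coop = [(a, b) for a, b in neighbors(i, j, L) if strategy[a][b] == 1]
            if not coop:
                continue
            a, b = max(coop, key=lambda n: weights[n[0]][n[1]])
            change[i][j] -= cost   # personal cost of rewarding
            change[a][b] += bonus  # preference reward to the favoured neighbour
    return change
```

A high-reputation player gains weight and therefore attracts more of its neighbours' rewards, which is the feedback loop the abstract describes.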
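Abstract: To make carbon emission allowance allocation more reasonable and to avoid the worsening of environmental pollution caused by carbon emissions exceeding the limit at annual settlement, a seasonal carbon trading mechanism based on reward and penalty factors is proposed, and low-carbon economic dispatch is carried out for a park integrated energy system (PIES). First, an integrated energy system (IES) operating framework comprising an energy layer, a carbon-flow layer, and a management layer is constructed, and a dynamic supply-demand consistency model for electricity, gas, and heat multi-energy flows is established. Second, the daily, seasonal, and annual carbon emission characteristics of the system are analyzed; moving beyond the traditional indicator-based allowance allocation method, a carbon emission allowance allocation model is built using grey relational analysis, and a seasonal carbon trading mechanism is formulated based on a reward-and-penalty stepped carbon price. Finally, with the objective of minimizing the system's life-cycle operating cost plus carbon trading cost, low-carbon economic dispatch is performed for the PIES under the seasonal carbon trading mechanism, and the carbon reduction achieved by seasonal energy storage participating in dispatch over long time scales is analyzed. A PIES composed of an IEEE 33-bus power grid, a 5-node gas network, and a 7-node heat network is built, and case studies under multiple scenarios verify that the dispatch method can achieve zero-carbon economic operation while ensuring supply reliability, laying a theoretical foundation for building zero-carbon parks.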
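A reward-and-penalty stepped carbon price settled season by season can be sketched as below. The tier structure is an assumption modelled on the usual stepped carbon price: excess emissions are bought in tiers of width `step` whose price rises by `penalty_growth` per tier, surplus allowances are sold in tiers whose price rises by `reward_growth`, and the last tier is open-ended. Function and parameter names are hypothetical, and the grey-relational allowance allocation is not modelled.

```python
"""Sketch: reward-penalty stepped carbon price applied season by season.

Hypothetical tiers: excess emissions are bought in tiers of width `step`, tier k
priced base_price * (1 + k * penalty_growth); surplus allowances are sold in
tiers priced base_price * (1 + k * reward_growth); the last tier is open-ended.
A negative cost means trading revenue.
"""


def tiered_cost(amount, base_price, step, growth, n_tiers=5):
    cost, remaining = 0.0, amount
    for k in range(n_tiers):
        width = remaining if k == n_tiers - 1 else min(remaining, step)
        cost += width * base_price * (1 + k * growth)
        remaining -= width
        if remaining <= 0:
            break
    return cost


def stepped_carbon_cost(emission, allowance, base_price, step,
                        penalty_growth=0.25, reward_growth=0.2):
    if emission >= allowance:  # penalty side: buy the excess
        return tiered_cost(emission - allowance, base_price, step, penalty_growth)
    # reward side: sell the surplus allowance
    return -tiered_cost(allowance - emission, base_price, step, reward_growth)


def seasonal_trading_cost(emissions, allowances, base_price, step):
    """emissions/allowances: dicts keyed by season; each season settles separately."""
    return sum(stepped_carbon_cost(emissions[s], allowances[s], base_price, step)
               for s in emissions)
```

For example, with a base price of 50 and a tier width of 100, a season that emits 250 against an allowance of 100 pays 100 × 50 + 50 × 62.5 = 8125, while a season emitting 50 against 100 earns 2500. Settling per season means a surplus in one season does not simply cancel a steep excess in another.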
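Abstract: Technical faults and human factors commonly cause anomalies in industrial data. Existing constraint-based methods produce repair errors when constraint thresholds are set too loosely or too strictly, and statistics-based methods, because of their smoothing repair mechanisms, repair anomalies that lie many time steps away with low accuracy. To address these problems, a time-series data repair method is proposed that combines reward-mechanism-based minimum iterative repair with an improved WGAN hybrid model. First, in the preprocessing stage, anomalous data are retained and annotated so as to fully exploit the feature constraints between anomalous values and true values. Second, a nearest-neighbor parameter clipping rule is proposed in the noise module to correct the noise vectors generated by the minimum iterative repair formula, and the corrected vectors are passed to the generator in the simulated-distribution module. There, a dynamic temporal attention network layer is designed to extract time-series feature weights; it is connected in series with a gated recurrent unit to capture feature dependencies across different step lengths, and the recursive multi-step prediction principle is introduced to jointly improve the model's expressive power. In the discriminator, an Abnormal and Truth reward mechanism and a Weighted Mean Square Error loss function are designed to jointly back-propagate and optimize the detail and quality of the data repaired by the generator. Finally, experiments on public and real-world datasets show that the method significantly outperforms existing methods in repair accuracy and model stability.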
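Two of the simpler components can be sketched in plain Python, under assumptions not stated in the abstract: the weighted MSE gives positions flagged as anomalous during preprocessing a higher weight (`anomaly_weight`) and normalises by the total weight, and the hypothetical `clip_to_neighbors` helper illustrates one possible nearest-neighbour clipping rule by clamping a candidate repaired value to the range of up to `k` observed values on each side. Neither is the authors' exact formula.

```python
"""Sketch: weighted MSE over labelled anomalies and a neighbour clipping rule.

Both formulas are hypothetical illustrations, not the authors' exact definitions.
"""


def weighted_mse(repaired, truth, anomaly_mask, anomaly_weight=3.0):
    """Anomalous positions get `anomaly_weight`, others 1; normalised by total weight."""
    if not (len(repaired) == len(truth) == len(anomaly_mask)):
        raise ValueError("inputs must have equal length")
    weights = [anomaly_weight if flag else 1.0 for flag in anomaly_mask]
    total = sum(w * (r - t) ** 2 for w, r, t in zip(weights, repaired, truth))
    return total / sum(weights)


def clip_to_neighbors(candidate, series, idx, k=2):
    """Clamp a candidate repair at position idx to the min/max of up to k
    observed values on each side of it (the value at idx itself is excluded)."""
    window = series[max(0, idx - k):idx] + series[idx + 1:idx + 1 + k]
    return min(max(window), max(min(window), candidate))
```

Weighting anomalous positions more heavily pushes a generator to get the repaired points right rather than only the easy, already-correct ones.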