Journal Literature
3,457 articles found
1. Team-based fixed-time containment control for multi-agent systems with disturbances
Authors: 赵小文, 王进月, 赖强, 刘源. 《Chinese Physics B》 SCIE EI CAS CSCD, 2023, No. 12, pp. 281-292 (12 pages)
We investigate the fixed-time containment control (FCC) problem of multi-agent systems (MASs) under discontinuous communication. A saturation function is used in the controller to achieve containment control in MASs. One difference from using a symbolic function is that it avoids the differential calculation process for discontinuous functions, which further ensures the continuity of the control input. Considering the discontinuous communication, a dynamic variable is constructed, which is always non-negative between any two communications of an agent. Based on the designed variable, a dynamic event-triggered algorithm is proposed to achieve FCC, which can effectively reduce controller updating. In addition, we further design a new event-triggered algorithm to achieve FCC, called the team-trigger mechanism, which combines the self-triggering technique with the proposed dynamic event-trigger mechanism. It converges faster than the proposed dynamic event-triggering technique and achieves a tradeoff among communication cost, convergence time and number of triggers in MASs. Finally, Zeno behavior is excluded and the validity of the proposed theory is confirmed by simulation.
Keywords: fixed-time containment control; dynamic event-triggered strategy; team-based triggered strategy; multi-agent systems
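The dynamic event-trigger described here can be illustrated with a toy sketch (hypothetical single-integrator follower, static leaders and hand-picked gains, not the paper's controller): the agent keeps an internal non-negative variable that evolves between triggers, and it only rebroadcasts its state when the measurement error outgrows the trigger threshold.

```python
import numpy as np

# Toy dynamic event-triggered update for one follower tracking the midpoint
# of two static leaders. All gains and dynamics are hypothetical, chosen only
# to illustrate the trigger logic described in the abstract.
dt, T = 0.01, 10.0
leaders = np.array([0.0, 4.0])           # static leader positions
x = -3.0                                  # follower state
x_hat = x                                 # last broadcast state (held between triggers)
eta = 1.0                                 # internal dynamic variable, kept non-negative
k, lam, sigma = 1.0, 0.5, 0.1             # control gain, eta decay, trigger margin
triggers = 0

for step in range(int(T / dt)):
    target = leaders.mean()               # a point in the leaders' convex hull
    u = -k * (x_hat - target)             # control uses the held (broadcast) state
    e = abs(x_hat - x)                    # measurement error since last trigger
    # dynamic trigger: fire when error energy exceeds margin plus internal variable
    if e**2 >= sigma * (x - target)**2 + eta:
        x_hat, e = x, 0.0                 # broadcast fresh state, error resets
        triggers += 1
    eta = max(eta + dt * (-lam * eta + sigma * (x - target)**2 - e**2), 0.0)
    x += dt * u                           # single-integrator dynamics

print(f"final error {abs(x - leaders.mean()):.4f} with {triggers} triggers "
      f"instead of {int(T / dt)} periodic updates")
```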
2. On Principle of Rewards in English Learning
Authors: 熊莉芸. 《广西中医学院学报》, 2004, No. 2, pp. 110-114 (5 pages)
There is no question that learning a foreign language like English is different from learning other subjects, mainly because it is new to us Chinese and there is not an adequate language environment. But that does not mean we have no way to learn it and do it well. If asked to identify the most powerful influences on learning, motivation would probably rank high on most teachers' and learners' lists. It seems only sensible to assume that English learning is most likely to occur when the learners want to learn; that is, when motives such as interest, curiosity, or a desire to achieve are present, learners will engage in learning. However, how do we teachers motivate our students to enjoy learning and learn well? Here, rewards, both extrinsic and intrinsic, are of great value and play a vital role in English learning.
Keywords: extrinsic and intrinsic rewards; motivation; activate; stimulate
3. Choice of discount rate in reinforcement learning with long-delay rewards (Cited by 1)
Authors: LIN Xiangyang, XING Qinghua, LIU Fuxian. 《Journal of Systems Engineering and Electronics》 SCIE EI CSCD, 2022, No. 2, pp. 381-392 (12 pages)
In the world, most successes are the result of long-term effort. The reward of success is extremely high, but before that, a long-term investment process is required. People who are "myopic" only value short-term rewards and are unwilling to make early-stage investments, so they hardly ever attain ultimate success and its corresponding high rewards. Similarly, for a reinforcement learning (RL) model with long-delay rewards, the discount rate determines the strength of the agent's "farsightedness". In order to enable the trained agent to make a chain of correct choices and finally succeed, the feasible region of the discount rate is first obtained through mathematical derivation in this paper; it satisfies the "farsightedness" requirement of the agent. Afterwards, to avoid the complicated problem of solving implicit equations when choosing feasible solutions, a simple method is explored and verified by theoretical demonstration and mathematical experiments. Then, a series of RL experiments are designed and implemented to verify the validity of the theory. Finally, the model is extended from the finite process to the infinite process, and the validity of the extended model is verified by theories and experiments. The whole research not only reveals the significance of the discount rate, but also provides a theoretical basis as well as a practical method for the choice of discount rate in future research.
Keywords: reinforcement learning (RL); discount rate; long-delay reward; Q-learning; treasure-detecting model; feasible solution
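The role of the discount rate can be made concrete with a small computation (a toy two-option setting, not the paper's treasure-detecting model): a "myopic" discount rate prefers a quick small reward, while a sufficiently farsighted one prefers the long-delayed large reward.

```python
# Discounted return of a reward r received after n steps: G = gamma**n * r.
# Toy numbers (hypothetical): a quick reward of 10 after 1 step versus a
# delayed reward of 100 after 20 steps.
quick = lambda g: g**1 * 10
delayed = lambda g: g**20 * 100

for gamma in (0.80, 0.90, 0.95, 0.99):
    choice = "delayed" if delayed(gamma) > quick(gamma) else "quick"
    print(f"gamma={gamma:.2f}: quick={quick(gamma):6.2f} "
          f"delayed={delayed(gamma):6.2f} -> prefers {choice}")

# The crossover gamma solves gamma**20 * 100 = gamma * 10, i.e.
# gamma = (10/100)**(1/19) ~= 0.886; below this the agent is too
# "myopic" to pursue the long-delay reward.
print("crossover gamma ~=", (10 / 100) ** (1 / 19))
```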
4. Policy of giving rewards and subsidies for grassland ecological conservation in Tibetan Plateau
Authors: YANG Ming-hong. 《Ecological Economy》, 2014, No. 1, pp. 2-9 (8 pages)
This paper explores the impact of the policy of giving rewards and subsidies (GRS) for grassland ecological conservation in the Tibetan Plateau, implemented by the Chinese government since 2009. Taking Gerze County in Ngari Prefecture in the Tibetan Autonomous Region (TAR) as an example, it discusses the objective, implementation and outcome of that policy with regard to ecological reconstruction and the problems that have ensued. Located in the northern part of the Qiangtang Plateau, Gerze is the largest county in Ngari Prefecture. It covers more than 7.8 million acres of pastureland, of which 6.2 million acres are usable for pastoralism; 3.4 million acres, however, lack a water source. In recent decades, due to the increased population and other reasons, pastures of the area have shown signs of overgrazing, leading to serious degradation, desertification and salinization of the grassland. Since 2009, when neighboring Coqin County was chosen as a pilot site for the national ecological incentive and subsidy policy (or ecological compensation policy), Gerze has also adopted this policy, bringing it into full implementation in 2010. Its purpose is to solve the problem of overgrazing. But like other policies carried out in Gerze, its implementation faces many challenges. First, it is difficult to define the types and scopes of the incentives and subsidies, which has become a major source of complaints from the local herdsmen. Second, the local herdsmen are also concerned with the fairness of assigning rewards and subsidies. Third, the high cost of the policy's implementation and supervision reduces its effects. Fourth, the herdsmen's unwillingness to reduce livestock numbers makes it difficult for the policy to achieve actual results. The author argues that it is necessary to revise and improve the current ecological incentive and subsidy policy.
Keywords: policy of giving rewards and subsidies for grassland ecological conservation
5. Generational Gap: Intrinsic (Non-monetary) Versus Extrinsic (Monetary) Rewards in the Workforce
Authors: Charles Chekwa, Mmutakaego Chukwuanu, Daisey Richardson. 《Chinese Business Review》, 2013, No. 6, pp. 414-424 (11 pages)
Keywords: workforce; money; rewards; high productivity; physiological needs; hierarchy of needs; safety needs; self-actualization
6. Nature Rewards Industry: Interviewing Nan Cunhui, Chairman of the CHINT Group
《China's Foreign Trade》, 2001, No. 7, pp. 14-15 (2 pages)
Keywords: Interviewing Nan Cunhui; Chairman of the CHINT Group; Nature rewards Industry
7. Efficient Optimal Routing Algorithm Based on Reward and Penalty for Mobile Adhoc Networks
Authors: Anubha, Ravneet Preet Singh Bedi, Arfat Ahmad Khan, Mohd Anul Haq, Ahmad Alhussen, Zamil S. Alzamil. 《Computers, Materials & Continua》 SCIE EI, 2023, No. 4, pp. 1331-1351 (21 pages)
Mobile adhoc networks have grown in prominence in recent years, and they are now utilized in a broader range of applications. The main challenges are related to the routing techniques generally employed in them. Mobile adhoc network management, on the other hand, requires further testing and improvement in terms of security. Traditional routing protocols, such as Adhoc On-Demand Distance Vector (AODV) and Dynamic Source Routing (DSR), employ the hop count to calculate the distance between two nodes. The main aim of this research work is to determine the optimal method for sending packets while also extending the lifetime of the network. This is achieved by considering the residual energy of each network node. Also, in this paper, various algorithms for optimal routing based on parameters like energy, distance, mobility, and the pheromone value are proposed. Moreover, an approach based on a reward and penalty system is given to evaluate the efficiency of the proposed algorithms under the impact of these parameters. The simulation results unveil that the reward-penalty-based approach is quite effective for the selection of an optimal path for routing when the algorithms are implemented under the parameters of interest, which helps in achieving less packet drop and lower energy consumption of the nodes along with enhancing the network efficiency.
Keywords: routing; optimization; reward; penalty; mobility; energy; throughput; pheromone
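A hedged sketch of the reward-and-penalty idea follows; all weights, field names and update magnitudes are hypothetical rather than the paper's exact scheme. Each candidate route is scored from residual energy, distance, mobility and pheromone, and the pheromone value is reinforced on successful delivery and decayed on a drop.

```python
import random

# Hypothetical route scoring for a MANET: higher residual energy and pheromone
# are good, longer distance and higher mobility are bad. Weights are made up.
def route_score(route):
    w_e, w_d, w_m, w_p = 0.4, 0.3, 0.2, 0.1
    return (w_e * route["energy"] - w_d * route["distance"]
            - w_m * route["mobility"] + w_p * route["pheromone"])

routes = [
    {"name": "A", "energy": 0.9, "distance": 0.7, "mobility": 0.2, "pheromone": 1.0},
    {"name": "B", "energy": 0.6, "distance": 0.3, "mobility": 0.5, "pheromone": 1.0},
]

REWARD, PENALTY = 0.2, 0.3   # hypothetical reinforcement magnitudes
for episode in range(5):
    best = max(routes, key=route_score)
    delivered = random.random() < best["energy"]   # toy delivery model
    if delivered:
        best["pheromone"] += REWARD                # reward the successful path
    else:
        best["pheromone"] = max(best["pheromone"] - PENALTY, 0.0)  # penalize drops
    print(episode, best["name"], "delivered" if delivered else "dropped",
          round(best["pheromone"], 2))
```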
8. Magnetic Field-Based Reward Shaping for Goal-Conditioned Reinforcement Learning
Authors: Hongyu Ding, Yuanze Tang, Qing Wu, Bo Wang, Chunlin Chen, Zhi Wang. 《IEEE/CAA Journal of Automatica Sinica》 SCIE EI CSCD, 2023, No. 12, pp. 2233-2247 (15 pages)
Goal-conditioned reinforcement learning (RL) is an interesting extension of the traditional RL framework, where the dynamic environment and reward sparsity can cause conventional learning algorithms to fail. Reward shaping is a practical approach to improving sample efficiency by embedding human domain knowledge into the learning process. Existing reward shaping methods for goal-conditioned RL are typically built on distance metrics with a linear and isotropic distribution, which may fail to provide sufficient information about the ever-changing environment with high complexity. This paper proposes a novel magnetic field-based reward shaping (MFRS) method for goal-conditioned RL tasks with dynamic targets and obstacles. Inspired by the physical properties of magnets, we consider the target and obstacles as permanent magnets and establish the reward function according to the intensity values of the magnetic field generated by these magnets. The nonlinear and anisotropic distribution of the magnetic field intensity can provide more accessible and conducive information about the optimization landscape, thus introducing a more sophisticated magnetic reward compared to the distance-based setting. Further, we transform our magnetic reward into the form of potential-based reward shaping by learning a secondary potential function concurrently, to ensure the optimal policy invariance of our method. Experimental results in both simulated and real-world robotic manipulation tasks demonstrate that MFRS outperforms relevant existing methods and effectively improves the sample efficiency of RL algorithms in goal-conditioned tasks with various dynamics of the target and obstacles.
Keywords: dynamic environments; goal-conditioned reinforcement learning; magnetic field; reward shaping
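The magnetic intuition can be sketched as a toy reward surface built from the standard point-dipole field magnitude; the paper's actual magnet model and the concurrently learned potential function are not reproduced here. The target's field contributes positively and each obstacle's negatively, giving a nonlinear, anisotropic shaping signal.

```python
import numpy as np

def dipole_intensity(pos, magnet_pos, moment_dir, strength=1.0, eps=1e-6):
    """Magnitude of a point magnetic dipole field at `pos`:
    B ~ (strength / r**3) * sqrt(1 + 3*cos(theta)**2), where theta is the
    angle between the dipole moment and the line to the query point."""
    d = np.asarray(pos, float) - np.asarray(magnet_pos, float)
    r = np.linalg.norm(d) + eps
    cos_t = np.dot(d / r, moment_dir)
    return strength / r**3 * np.sqrt(1.0 + 3.0 * cos_t**2)

def magnetic_reward(agent_pos, target, obstacles, moment=np.array([0.0, 1.0])):
    """Toy shaping reward: attracted to the target's field, repelled by the
    obstacles' fields. Weights and moment direction are hypothetical."""
    r = dipole_intensity(agent_pos, target, moment)
    for obs in obstacles:
        r -= dipole_intensity(agent_pos, obs, moment)
    return r

target, obstacles = np.array([5.0, 5.0]), [np.array([3.0, 3.0])]
for p in ([0.0, 0.0], [2.0, 2.0], [4.5, 4.8]):   # near-obstacle point scores low
    print(p, round(magnetic_reward(p, target, obstacles), 4))
```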
9. Prisoners awarded: Henan Yudong Prison rewards accomplished inmates
Authors: CHEN CHUN'AN. 《The Journal of Human Rights》, 2010, No. 4, pp. 34-35 (2 pages)
Gao Pingyuan has seen new hope for a new life after serving 12 years of his term at the Yudong Prison in central China's Henan Province. He received a special-class award for his accomplished teaching in prison.
Keywords: Henan Yudong Prison; rewards accomplished inmates; prisoners awarded
10. Valiant Gaokao Rewards
《ChinAfrica》, 2014, No. 7, pp. 12-13 (2 pages)
New rules for this year's national college entrance examination, or gaokao in Mandarin, which takes place from June 7 to 9 every year, sparked heated debate among the public in China. Before gaokao in 2014, some provincial education authorities released a new policy stipulating that gaokao applicants may receive 10 to 20 extra points if they have "excellent morality" or have records of helping others for a just cause.
Keywords: Valiant Gaokao rewards
11. Long-term clinical outcomes and thrombosis rates of sirolimus-eluting versus paclitaxel-eluting stents in an unselected population of patients with coronary artery disease: the REWARDS registry
Authors: Waksman R., Buch A.N., Torguson R., 奚群英. 《世界核心医学期刊文摘(心脏病学分册)》, 2007, No. 11, pp. 28-29 (2 pages)
Compared with bare-metal stents, sirolimus-eluting stents (SES) and paclitaxel-eluting stents (PES) significantly reduce the need for repeat intervention. In selected patient and lesion groups, comparative outcome data from randomized, controlled, head-to-head trials of these stent systems have been inconsistent; therefore…
Keywords: paclitaxel-eluting stent; sirolimus-eluting stent; REWARDS; clinical outcomes; unselected; patients
12. An optimized guidance strategy for electric vehicle charging considering a reward mechanism (Cited by 1)
Authors: 张建宏, 赵兴勇, 王秀丽. 《电网与清洁能源》 CSCD 北大核心, 2024, No. 1, pp. 102-108, 118 (8 pages)
With the large-scale adoption of electric vehicles (EVs), uncoordinated charging seriously threatens the safe and stable operation of the power grid, so actively guiding EV users to participate in charging optimization is of great significance for improving grid security and stability. To this end, based on the idea of optimized charging management and dispatch, an EV charging guidance strategy that considers a reward mechanism is proposed. On top of time-of-use electricity pricing, the strategy incorporates a reward for users who help reduce grid load fluctuation, and it accounts for the travel demands of users with fixed and uncertain charging locations to determine each EV's charging time and location, so as to maximize user satisfaction. A real-time optimization algorithm based on the dynamic response of EVs is used to solve the proposed dispatch model. Simulation results verify the effectiveness and feasibility of the proposed strategy: it not only alleviates the new load peaks created by concentrated charging during off-peak periods, but also markedly reduces users' charging costs and grid load fluctuation.
Keywords: electric vehicle; charging control; load fluctuation; reward mechanism; optimized guidance strategy
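A minimal sketch of the pricing-plus-reward idea, with a hypothetical tariff, reward rate and load profile (the paper's real-time optimization over travel demand is not reproduced): a charging hour is chosen to minimize the time-of-use cost minus a reward proportional to how much that hour flattens the load curve.

```python
import numpy as np

# Hypothetical 24-hour base load (MW) and a time-of-use tariff (yuan/kWh)
# derived from it; all numbers are made up for illustration.
base_load = np.array([30, 28, 26, 25, 25, 27, 35, 45, 55, 60, 62, 63,
                      61, 60, 58, 57, 58, 62, 68, 70, 65, 55, 45, 35], float)
tou_price = np.where(base_load > 55, 1.2, np.where(base_load > 40, 0.8, 0.4))
ev_power_mw, session_kwh, reward_rate = 5.0, 40.0, 2.0

def net_cost(hour):
    """Time-of-use charging cost minus a reward for flattening the load curve."""
    energy_cost = tou_price[hour] * session_kwh
    new_load = base_load.copy()
    new_load[hour] += ev_power_mw
    # reward is proportional to the reduction in load variance (can be negative)
    reward = reward_rate * (base_load.var() - new_load.var())
    return energy_cost - reward

best = min(range(24), key=net_cost)   # valley hours win: cheap and flattening
print("best charging hour:", best, "net cost:", round(net_cost(best), 2))
```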
13. Effectiveness of Reward System on Assessment Outcomes in Mathematics
Authors: May Semira Inandan. 《Journal of Contemporary Educational Research》, 2023, No. 9, pp. 52-58 (7 pages)
As assessment outcomes provide students with a sense of accomplishment that is boosted by a reward system, learning becomes more effective. This research aims to determine the effects of a reward system applied prior to assessment in Mathematics. A quasi-experimental research design was used to examine whether there was a significant difference between the use of a reward system and students' level of performance in Mathematics. Through purposive sampling, the respondents of the study were 80 Grade 9 students belonging to two sections of Gaudencio B. Lontok Memorial Integrated School. Based on similar demographics and pre-test results, a control group and a study group participated in the study. Data were treated and analyzed using statistical tools such as the mean and the t-test for independent samples. A significant finding revealed the advantage of the reward system over the non-reward system in increasing students' level of performance in Mathematics. It is concluded that the use of a reward system is effective in improving assessment outcomes in Mathematics, and its use prior to assessment is recommended for persistent assessment outcomes that reflect the intended outcomes in Mathematics.
Keywords: mathematics; reward system; assessment outcomes
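The statistical treatment named in the abstract, comparing group means with a t-test for independent samples, can be sketched as below; the scores are synthetic placeholders, not the study's data.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
# Placeholder post-test scores (NOT the study's data): a rewarded section and
# a non-rewarded control section of 40 students each.
reward_group = rng.normal(loc=78, scale=8, size=40)
control_group = rng.normal(loc=72, scale=8, size=40)

# Independent-samples t-test on the two group means.
t_stat, p_value = stats.ttest_ind(reward_group, control_group, equal_var=True)
print(f"means: {reward_group.mean():.1f} vs {control_group.mean():.1f}")
print(f"t = {t_stat:.2f}, p = {p_value:.4f}")
if p_value < 0.05:
    print("significant difference at the 0.05 level")
```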
14. Low-carbon economic dispatch of a park integrated energy system based on a seasonal carbon trading mechanism (Cited by 4)
Authors: 颜宁, 马广超, 李相俊, 李洋, 马少华. 《中国电机工程学报》 EI CSCD 北大核心, 2024, No. 3, pp. 918-931, I0006 (15 pages)
To allocate carbon emission quotas more reasonably and to avoid the aggravated environmental pollution caused by emissions exceeding the quota at annual settlement, a seasonal carbon trading mechanism based on reward and penalty factors is proposed, and low-carbon economic dispatch is performed for a park integrated energy system (PIES). First, an integrated energy system (IES) operating framework comprising an energy layer, a carbon-flow layer and a management layer is constructed, and a dynamic supply-demand consistency model of electric, gas and heat multi-energy flows is established. Second, the "daily-seasonal-annual" carbon emission characteristics of the system are analyzed; departing from the traditional index-based allocation method, a carbon emission quota allocation model is built using grey relational analysis, and the seasonal carbon trading mechanism is formulated on a reward-penalty tiered carbon price. Finally, with the objective of minimizing whole-life-cycle operating cost and carbon trading cost, low-carbon economic dispatch is carried out for a PIES implementing the seasonal carbon trading mechanism, and the carbon reduction contributed by seasonal energy storage over long time scales is analyzed. A PIES consisting of an IEEE 33-bus power network, a 5-node gas network and a 7-node heat network is built, and multi-scenario case studies verify that the proposed dispatch method achieves zero-carbon economic operation while ensuring supply reliability, laying a theoretical foundation for building zero-carbon parks.
Keywords: park integrated energy system; seasonal carbon trading mechanism; reward-penalty tiered carbon price; grey relational analysis
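The reward-penalty tiered ("ladder") carbon price can be sketched as a piecewise settlement function; tier width, base price and growth rate below are hypothetical. Emissions above the quota are bought at increasingly punitive prices, while emissions below it earn income at the same tiered schedule.

```python
def carbon_trading_cost(emissions, quota, base_price=100.0,
                        tier=50.0, growth=0.25):
    """Reward-penalty tiered carbon settlement (hypothetical parameters).
    Positive result = cost (emissions over quota), negative = income."""
    gap = emissions - quota
    sign = 1.0 if gap > 0 else -1.0
    gap = abs(gap)
    cost, level = 0.0, 0
    while gap > 0:
        chunk = min(gap, tier)
        cost += chunk * base_price * (1 + growth * level)  # price rises per tier
        gap -= chunk
        level += 1
    return sign * cost

for e in (180.0, 200.0, 320.0):   # tCO2, with a quota of 200
    print(f"emissions {e}: settlement {carbon_trading_cost(e, 200.0):+.0f} yuan")
```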
15. Mechanism design and evolutionary game analysis for collaborative governance of trans-boundary water pollution in river basins from the perspective of ecological compensation (Cited by 1)
Authors: 杨霞, 何刚, 吴传良, 张世玉. 《安全与环境学报》 CAS CSCD 北大核心, 2024, No. 5, pp. 2033-2042 (10 pages)
For three game players (two neighboring regions in a river basin and the basin management authority), a bidirectional ecological compensation and reward-punishment mechanism is introduced and a tripartite evolutionary game model of trans-boundary watershed water pollution is constructed. Stability analysis yields the conditions under which the ideal state of collaborative governance is stable, and simulations are carried out on the Xin'an River basin ecological compensation pilot. The results show that: (1) introducing the bidirectional ecological compensation and reward-punishment mechanism effectively pushes the two neighboring regions of the Xin'an River basin toward compliant discharge, driving the system to the stable state (1, 1, 0); (2) combinations of dynamic reward-punishment mechanisms aid system evolution; considering the players' initial willingness, implementation efficiency and support tendency, the dynamic-reward/static-punishment strategy gives the best regulatory effect, followed by the dynamic-reward/dynamic-punishment strategy; (3) the strategy for achieving collaborative governance is closely related to the two regions' costs and benefits of compliant discharge, the amount of bidirectional ecological compensation, the ecological compensation rewards issued by the basin management authority, and the costs and benefits of active regulation.
Keywords: environmental science; water pollution; evolutionary game; ecological compensation; dynamic reward and punishment; collaborative governance
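Tripartite evolutionary games of this kind are typically analyzed through replicator dynamics. The sketch below integrates the generic equations x' = x(1 - x)(payoff advantage) for the three strategy shares, with entirely hypothetical linear payoff advantages standing in for the paper's payoff matrix; with these toy numbers the system happens to settle near the (1, 1, 0) state the abstract identifies as ideal.

```python
def payoff_gaps(x, y, z):
    """Hypothetical payoff advantages of the 'good' strategy for each player:
    compliant discharge for regions 1 and 2 (shares x, y) and active
    regulation for the basin authority (share z)."""
    gx = 2.0 * z + 1.5 * y - 1.2          # region 1: compliant discharge
    gy = 2.0 * z + 1.5 * x - 1.2          # region 2: compliant discharge
    gz = 1.2 * (2.0 - x - y) - 0.8        # authority: active regulation
    return gx, gy, gz

x, y, z, dt = 0.3, 0.4, 0.6, 0.01         # initial strategy shares, step size
for _ in range(20000):
    gx, gy, gz = payoff_gaps(x, y, z)
    x += dt * x * (1 - x) * gx            # replicator equation per player
    y += dt * y * (1 - y) * gy
    z += dt * z * (1 - z) * gz

# With these toy payoffs the shares approach (1, 1, 0): both regions comply,
# so active regulation is no longer worth its cost.
print(f"long-run shares: x={x:.2f}, y={y:.2f}, z={z:.2f}")
```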
16. A multi-agent reinforcement learning algorithm based on state-space exploration for sparse-reward scenarios
Authors: 方宝富, 余婷婷, 王浩, 王在俊. 《模式识别与人工智能》 EI CSCD 北大核心, 2024, No. 5, pp. 435-446 (12 pages)
Multi-agent task scenarios usually involve huge, diverse state spaces, and in some cases the reward information provided by the environment is very limited, exhibiting the characteristics of sparse rewards. Most existing multi-agent reinforcement learning algorithms perform poorly in such sparse-reward scenarios because they rely only on reward sequences discovered by chance, which makes learning slow and inefficient. To address this, a multi-agent reinforcement learning algorithm based on state-space exploration is proposed: a state subset space is constructed, and a state mapped from it is taken as an intrinsic goal, allowing the agents to exploit the state space more fully and to reduce unnecessary exploration. Each agent's state is decomposed into its own state and the environment state, and these two kinds of states are combined with the intrinsic goal to generate a mutual-information-based intrinsic reward. The state subset space and the mutual-information-based intrinsic reward grant appropriate rewards both to states close to the goal state and to states that improve understanding of the environment, motivating the agents to advance toward the goal while deepening their understanding of the environment, so that they adapt flexibly to sparse-reward scenarios. Experiments in multi-agent cooperation scenarios of varying sparsity verify the superior performance of the proposed algorithm.
Keywords: reinforcement learning; sparse reward; mutual information; intrinsic reward
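One simple way to realize a mutual-information-style intrinsic reward is a coarse histogram estimator over the two state streams plus a goal-approach bonus; this is an illustrative stand-in, not the paper's construction of the state subset space.

```python
import numpy as np

def mutual_information(a, b, bins=8):
    """Histogram estimate of MI (in nats) between two 1-D state streams."""
    joint, _, _ = np.histogram2d(a, b, bins=bins)
    p = joint / joint.sum()
    px, py = p.sum(axis=1, keepdims=True), p.sum(axis=0, keepdims=True)
    mask = p > 0
    return float((p[mask] * np.log(p[mask] / (px @ py)[mask])).sum())

def intrinsic_reward(own_traj, env_traj, state, goal, alpha=0.5, beta=1.0):
    """Toy intrinsic reward: an MI term (understanding the environment) plus
    a goal-approach term (progress toward the sampled intrinsic goal).
    The weights alpha and beta are hypothetical."""
    mi = mutual_information(np.asarray(own_traj), np.asarray(env_traj))
    approach = -np.linalg.norm(np.asarray(state) - np.asarray(goal))
    return alpha * mi + beta * approach

rng = np.random.default_rng(1)
own = rng.normal(size=500)
env = 0.7 * own + 0.3 * rng.normal(size=500)   # correlated -> positive MI term
print("r_int =", round(intrinsic_reward(own, env, [0.2, 0.1], [1.0, 1.0]), 3))
```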
17. Rewards and Recognition Spark Revision Writing
Authors: Robin Craft Jones. 《Journalism and Mass Communication》, 2017, No. 2, pp. 97-101 (5 pages)
18. Evolutionary game analysis of the regulation of end-of-life power battery recycling from the perspective of prospect theory (Cited by 1)
Authors: 许礼刚, 刘荣福, 陈磊, 倪俊. 《重庆理工大学学报(自然科学)》 CAS 北大核心, 2024, No. 1, pp. 290-297 (8 pages)
End-of-life power batteries have strong negative externalities, contrary to the original intent of new energy vehicle design. To promote their effective recycling, prospect theory is coupled with evolutionary game theory; the interests of the government, enterprises (vehicle manufacturers) and the public are considered together, and a tripartite game model is constructed in which the government and the public jointly supervise enterprises. Numerical simulations are performed for different initial willingness, fine compositions, risk attitude coefficients and loss aversion coefficients, combined with analysis of real-world awareness of end-of-life power batteries, reward-punishment mechanisms and profit confidence. The study shows that raising the initial supervision willingness of the public or the government promotes enterprise recycling of end-of-life power batteries; when recycling is loss-making for an enterprise, raising the compensation it pays the public and lowering its risk attitude and loss aversion coefficients encourage active recycling; and in the recycling process, joint supervision outperforms supervision by either party alone.
Keywords: power battery; evolutionary game; prospect theory; joint supervision; reward and punishment mechanism
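The prospect-theoretic ingredient, replacing objective payoffs with a perceived value that is concave for gains and convex but steeper for losses, can be sketched directly with the Tversky-Kahneman value function; the risk attitude and loss aversion coefficients below are the classic 1992 estimates, used here only for illustration.

```python
def prospect_value(outcome, alpha=0.88, beta=0.88, lam=2.25):
    """Tversky-Kahneman value function: alpha/beta are risk attitude
    coefficients, lam is the loss aversion coefficient. Defaults are the
    classic 1992 estimates, not the paper's fitted values."""
    if outcome >= 0:
        return outcome ** alpha
    return -lam * (-outcome) ** beta

# A recycling payoff of -100 (a loss) is perceived far worse than +100 is
# good, which is why lowering loss aversion encourages enterprises to keep
# recycling even when it is temporarily loss-making.
for x in (100, -100):
    for lam in (2.25, 1.2):
        print(f"outcome {x:+d}, lam={lam}: "
              f"perceived {prospect_value(x, lam=lam):+.1f}")
```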
19. Reward and punishment mechanisms for collusion by livestream e-commerce platforms under government regulation (Cited by 1)
Authors: 李国昊, 梅婷, 梁永滔. 《江苏大学学报(社会科学版)》, 2024, No. 2, pp. 100-112 (13 pages)
The new "livestream + e-commerce" sales model is developing rapidly, but many problems exist in the process. Considering the phenomenon of livestream e-commerce platforms colluding with platform merchants to obtain excess profits, this paper builds and analyzes evolutionary game models between livestream platforms and government regulators under different reward-punishment mechanisms, and draws the following conclusions. Under a static reward-punishment mechanism or a dynamic-reward/static-punishment mechanism, the system has no stable equilibrium; under static-reward/dynamic-punishment and dynamic-reward/dynamic-punishment mechanisms, a stable equilibrium exists, and the probability of collusion is lower under the fully dynamic mechanism. Under the dynamic mechanism, collusion depends on the strength of rewards and punishments: as punishment increases, the probability of collusion falls and the government's regulatory cost declines; as reward intensity increases, the probability of strict government regulation falls, while the probability of collusion also falls but changes little. Therefore, a scientific and reasonable dynamic reward-punishment mechanism adopted by regulators supports the healthy development of the livestream e-commerce industry.
Keywords: livestream e-commerce platform; reward and punishment mechanism; evolutionary game; collusion
20. Multi-agent self-organized cooperative pursuit in non-convex environments with an improved MADDPG algorithm
Authors: 张红强, 石佳航, 吴亮红, 王汐, 左词立, 陈祖国, 刘朝华, 陈磊. 《计算机科学与探索》 CSCD 北大核心, 2024, No. 8, pp. 2080-2090 (11 pages)
To improve the pursuit efficiency of multiple agents in non-convex environments, a multi-agent reinforcement learning algorithm based on improved experience replay is proposed. A residual network (ResNet) is used to alleviate network degradation and is combined with the multi-agent deep deterministic policy gradient algorithm (MADDPG), yielding the RW-MADDPG algorithm. To address the low utilization of replay buffer data during training, two methods for improving buffer utilization are proposed; to keep agents from getting trapped inside obstacles in non-convex environments (e.g., when the target becomes unreachable), a well-designed pursuit reward function enables the agents to complete the pursuit task among non-convex obstacles. Simulation experiments based on this algorithm show that its reward grows faster during training and the pursuit task is completed sooner: compared with MADDPG, training time is reduced by 18.5% in static pursuit environments and by 49.5% in dynamic environments, and the pursuing agents trained by the algorithm obtain a higher global average reward in non-convex obstacle environments.
Keywords: deep reinforcement learning; RW-MADDPG; residual network; experience replay buffer; pursuit reward function
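The ResNet ingredient, a skip connection that lets each layer learn a residual correction instead of a full mapping, can be sketched as a plain numpy forward pass; the layer sizes and the placement inside the MADDPG actor or critic are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def residual_block(x, W1, b1, W2, b2):
    """y = x + F(x): the skip connection preserves the identity mapping, so
    stacking blocks cannot degrade the representation the way plain deep
    MLP stacks can."""
    h = np.maximum(W1 @ x + b1, 0.0)      # ReLU hidden layer
    return x + (W2 @ h + b2)              # add the residual correction

dim, hidden = 64, 128                     # hypothetical actor layer sizes
x = rng.normal(size=dim)                  # e.g., an agent's observation embedding
for _ in range(4):                        # four stacked residual blocks
    W1 = rng.normal(scale=0.1, size=(hidden, dim))
    W2 = rng.normal(scale=0.1, size=(dim, hidden))
    x = residual_block(x, W1, np.zeros(hidden), W2, np.zeros(dim))

print("output shape:", x.shape)           # unchanged: (64,)
```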