Abstract: There is no question that learning a foreign language such as English differs from learning other subjects, mainly because it is new to us Chinese learners and the language environment is insufficient. But that does not mean we have no way to learn it and do it well. If asked to identify the most powerful influences on learning, most teachers and learners would probably place motivation high on their lists. It seems only sensible to assume that English learning is most likely to occur when learners want to learn. That is, when motivation such as interest, curiosity, or a desire to achieve is present, learners will be engaged in learning. However, how do we teachers motivate our students to like learning and to learn well? Here, rewards, both extrinsic and intrinsic, are of great value and play a vital role in English learning.
Funding: supported by the National Natural Science Foundation of China (71771216, 71701209, 72001214).
Abstract: Most successes in the world are the result of long-term effort. The reward of success is extremely high, but a long-term investment process is required before it arrives. People who are "myopic" value only short-term rewards and are unwilling to make early-stage investments, so they rarely achieve ultimate success and the correspondingly high rewards. Similarly, for a reinforcement learning (RL) model with long-delay rewards, the discount rate determines the strength of the agent's "farsightedness". To enable the trained agent to make a chain of correct choices and finally succeed, this paper first obtains the feasible region of the discount rate through mathematical derivation; this region satisfies the agent's "farsightedness" requirement. Then, to avoid the complicated problem of solving implicit equations when choosing feasible solutions, a simple method is explored and verified by theoretical demonstration and mathematical experiments. Next, a series of RL experiments are designed and implemented to verify the validity of the theory. Finally, the model is extended from the finite process to the infinite process, and the validity of the extended model is verified by theory and experiments. The research not only reveals the significance of the discount rate, but also provides a theoretical basis and a practical method for choosing the discount rate in future research.
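As a rough illustration of how the discount rate governs "farsightedness" (a minimal sketch, not the paper's derivation), consider an agent choosing between a small immediate reward and a large reward that arrives only after a delay. Under the standard discounted return, the delayed option is preferred only when the discount rate exceeds a threshold. The function names and reward values below are hypothetical and chosen purely for illustration.

```python
def min_discount_rate(small_reward: float, delayed_reward: float, delay_steps: int) -> float:
    """Discount rate at which both options have equal discounted value.

    Solves gamma ** delay_steps * delayed_reward == small_reward for gamma.
    Assumes 0 < small_reward < delayed_reward and delay_steps >= 1.
    """
    return (small_reward / delayed_reward) ** (1.0 / delay_steps)


def prefers_delayed(gamma: float, small_reward: float, delayed_reward: float, delay_steps: int) -> bool:
    """True if the discounted delayed reward beats the immediate small reward."""
    return gamma ** delay_steps * delayed_reward > small_reward


if __name__ == "__main__":
    # Hypothetical numbers: reward 1 now versus reward 100 after 20 steps.
    threshold = min_discount_rate(1.0, 100.0, 20)
    print(f"break-even discount rate: {threshold:.4f}")
    for gamma in (0.7, 0.9):
        print(gamma, prefers_delayed(gamma, 1.0, 100.0, 20))
```

In this toy setting, any discount rate above the break-even value makes the agent willing to wait; a feasible region for a real task would depend on the full reward structure, which the abstract does not specify.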
Funding: sponsored by the National Natural Science Fund of China (Grant No. 71273183) and the National Project 985 of Sichuan University.
Abstract: This paper explores the impact of the policy of giving rewards and subsidies (GRS) for grassland ecological conservation on the Tibetan Plateau, implemented by the Chinese government since 2009. Taking Gerze County in Ngari Prefecture in the Tibetan Autonomous Region (TAR) as an example, it discusses the objective, implementation, and outcome of that policy with regard to ecological reconstruction and the problems that have ensued. Located in the northern part of the Qiangtang Plateau, Gerze is the largest county in Ngari Prefecture. It covers more than 7.8 million acres of pastureland, of which 6.2 million acres are usable for pastoralism; 3.4 million acres, however, lack a water source. In recent decades, due to population growth and other reasons, pastures in the area have shown signs of overgrazing, leading to serious degradation, desertification, and salinization of the grassland. Since 2009, when neighboring Coqin County was chosen as a pilot site for the national ecological incentive and subsidy policy (or: ecological compensation policy), Gerze has also started to adopt this policy, bringing it into full implementation in 2010. Its purpose is to solve the problem of overgrazing. But like other policies carried out in Gerze, its implementation faces many challenges. First, it is difficult to define the types and scopes of the incentives and subsidies, which have become a major source of complaints among local herdsmen. Second, local herdsmen are also concerned with the fairness of how rewards and subsidies are assigned. Third, the high cost of implementing and supervising the policy reduces its effects. Fourth, the herdsmen's unwillingness to reduce their livestock population makes it difficult for the policy to achieve actual results. The author thinks it is necessary to revise and improve the current ecological incentive and subsidy policy.
Abstract: Mobile ad hoc networks have grown in prominence in recent years, and they are now used in a broader range of applications. The main challenges relate to the routing techniques generally employed in them. Mobile ad hoc system management, on the other hand, requires further testing and improvement in terms of security. Traditional routing protocols, such as Ad hoc On-Demand Distance Vector (AODV) and Dynamic Source Routing (DSR), employ the hop count to calculate the distance between two nodes. The main aim of this research is to determine the optimum method for sending packets while also extending the lifetime of the network. This is achieved by changing the residual energy of each network node. In addition, this paper proposes various algorithms for optimal routing based on parameters such as energy, distance, mobility, and the pheromone value. Moreover, an approach based on a reward and penalty system is presented to evaluate the efficiency of the proposed algorithms under the impact of these parameters. The simulation results show that the reward-penalty-based approach is quite effective for selecting an optimal routing path when the algorithms are implemented under the parameters of interest, which helps reduce packet drop and node energy consumption while enhancing network efficiency.
Funding: supported in part by the National Natural Science Foundation of China (62006111, 62073160) and the Natural Science Foundation of Jiangsu Province of China (BK20200330).
Abstract: Goal-conditioned reinforcement learning (RL) is an interesting extension of the traditional RL framework, in which the dynamic environment and reward sparsity can cause conventional learning algorithms to fail. Reward shaping is a practical approach to improving sample efficiency by embedding human domain knowledge into the learning process. Existing reward shaping methods for goal-conditioned RL are typically built on distance metrics with a linear and isotropic distribution, which may fail to provide sufficient information about an ever-changing, highly complex environment. This paper proposes a novel magnetic field-based reward shaping (MFRS) method for goal-conditioned RL tasks with a dynamic target and obstacles. Inspired by the physical properties of magnets, we treat the target and obstacles as permanent magnets and establish the reward function according to the intensity values of the magnetic field generated by these magnets. The nonlinear and anisotropic distribution of the magnetic field intensity can provide more accessible and conducive information about the optimization landscape, thus introducing a more sophisticated magnetic reward than the distance-based setting. Further, we transform our magnetic reward into the form of potential-based reward shaping by concurrently learning a secondary potential function, which ensures the optimal policy invariance of our method. Experimental results in both simulated and real-world robotic manipulation tasks demonstrate that MFRS outperforms relevant existing methods and effectively improves the sample efficiency of RL algorithms in goal-conditioned tasks with various dynamics of the target and obstacles.
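The potential-based form mentioned above can be sketched as follows. This is a minimal illustration of the general technique, where the shaping term is F(s, s') = gamma * Phi(s') - Phi(s); it is not the MFRS method itself. The potential here is a fixed negative Euclidean distance to the goal (the simple distance-based baseline the abstract contrasts with), whereas the paper learns a secondary potential from the magnetic reward. All names are hypothetical.

```python
import math


def potential(state, goal):
    """Hand-written potential: negative Euclidean distance to the goal (an assumption)."""
    return -math.dist(state, goal)


def shaping_term(state, next_state, goal, gamma):
    """Potential-based shaping term F(s, s') = gamma * Phi(s') - Phi(s)."""
    return gamma * potential(next_state, goal) - potential(state, goal)


def discounted_shaping_sum(trajectory, goal, gamma):
    """Discounted sum of shaping terms along a trajectory of states."""
    steps = zip(trajectory, trajectory[1:])
    return sum(gamma ** t * shaping_term(s, s_next, goal, gamma)
               for t, (s, s_next) in enumerate(steps))


if __name__ == "__main__":
    goal = (2.0, 2.0)
    path = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (2.0, 2.0)]
    gamma = 0.9
    total = discounted_shaping_sum(path, goal, gamma)
    # The sum telescopes to gamma**T * Phi(s_T) - Phi(s_0), so it depends only
    # on the endpoints of the trajectory, not on the route taken between them.
    endpoints = gamma ** (len(path) - 1) * potential(path[-1], goal) - potential(path[0], goal)
    print(total, endpoints)
```

Because the shaping contribution depends only on the start and end states, it shifts the return of every policy from a given start state in the same way, which is the intuition behind the policy invariance property the abstract refers to.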
Abstract: Gao Pingyuan has seen new hope for a new life after serving 12 years of his sentence at the Yudong Prison in central China's Henan Province. He received the special-class award for his accomplished teaching in prison.
Abstract: New rules for this year's national college entrance examination, or gaokao in Mandarin, which takes place from June 7 to 9 every year, sparked heated debate among the public in China. Before the 2014 gaokao, some provincial education authorities released a new policy stipulating that gaokao applicants may receive 10 to 20 extra points if they have "excellent morality" or have records of helping others for a just cause.
Abstract: As assessment outcomes provide students with a sense of accomplishment that is boosted by the reward system, learning becomes more effective. This research aims to determine the effects of a reward system prior to assessment in Mathematics. A quasi-experimental research design was used to examine whether there was a significant difference between the use of a reward system and students' level of performance in Mathematics. Through purposive sampling, the respondents of the study were 80 Grade 9 students belonging to two sections from Gaudencio B. Lontok Memorial Integrated School. Based on similar demographics and pre-test results, a control group and a study group were involved as participants. Data were treated and analyzed using statistical treatments such as the mean and the t-test for independent variables. There was a significant finding revealing the advantage of using the reward system compared with the non-reward system in increasing students' level of performance in Mathematics. It is concluded that the use of a reward system is effective in improving assessment outcomes in Mathematics. It is recommended to use a reward system for persistent assessment outcomes prior to assessment, as a reflection of the intended outcomes in Mathematics.
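For readers unfamiliar with the comparison described above, the following is a minimal sketch of a pooled-variance t statistic for two independent groups, assuming that is what the abstract's "t-test for independent variables" refers to. The scores are made-up placeholder values, not data from the study, and the function name is hypothetical.

```python
import math
import statistics


def pooled_t_statistic(group_a, group_b):
    """Student's t statistic for two independent samples with pooled variance."""
    n_a, n_b = len(group_a), len(group_b)
    mean_a, mean_b = statistics.mean(group_a), statistics.mean(group_b)
    var_a, var_b = statistics.variance(group_a), statistics.variance(group_b)
    # Pooled sample variance, weighting each group by its degrees of freedom.
    pooled_var = ((n_a - 1) * var_a + (n_b - 1) * var_b) / (n_a + n_b - 2)
    standard_error = math.sqrt(pooled_var * (1.0 / n_a + 1.0 / n_b))
    return (mean_a - mean_b) / standard_error


if __name__ == "__main__":
    # Placeholder scores for illustration only (not taken from the study).
    reward_group = [78, 85, 90, 72, 88]
    control_group = [70, 75, 80, 68, 77]
    print(pooled_t_statistic(reward_group, control_group))
```

The statistic would then be compared against a t distribution with n_a + n_b - 2 degrees of freedom to judge significance.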
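Abstract: To improve the rationality of carbon emission allowance allocation and to avoid the aggravated environmental pollution caused by excess carbon emissions at annual settlement, a seasonal carbon trading mechanism based on reward and penalty factors is proposed, and low-carbon economic dispatch is carried out for a park integrated energy system (PIES). First, an integrated energy system (IES) operation framework comprising an energy layer, a carbon flow layer, and a management layer is constructed, and a dynamic supply-demand consistency model for electricity, gas, and heat multi-energy flows is established. Second, the "day-season-year" carbon emission characteristics of the system are analyzed; departing from the traditional indicator-based allowance allocation method, a carbon emission allowance allocation model is built using grey relational analysis, and a seasonal carbon trading mechanism is formulated based on a reward-penalty stepped carbon price. Finally, with the objective of minimizing the system's life-cycle operating cost and carbon trading cost, low-carbon economic dispatch is performed for the PIES under the seasonal carbon trading mechanism, and the carbon reduction achieved by seasonal energy storage participating in dispatch over long time scales is analyzed. A PIES consisting of an IEEE 33-node power grid, a 5-node gas network, and a 7-node heat network is built, and case studies based on multiple scenarios verify that the dispatch method can achieve zero-carbon economic operation, ensure the reliability of the system's energy supply, and lay a theoretical foundation for building zero-carbon parks.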