To study the incentive mechanisms of cooperation, we propose a preference rewarding mechanism in the spatial prisoner's dilemma game, which simultaneously considers reputational preference, other-regarding preference and the dynamic adjustment of vertex weight. The vertex weight of a player is adaptively adjusted according to a comparison of his own reputation with the average reputation of his immediate neighbors. Players are inclined to pay a personal cost to reward the cooperative neighbor with the greatest vertex weight. The vertex weight of a player is proportional to the preference rewards he can obtain from direct neighbors. We find that the preference rewarding mechanism significantly facilitates the evolution of cooperation, and that the dynamic adjustment of vertex weight has a powerful effect on the emergence of cooperative behavior. To validate these effects, the strategy distribution and the average payoff and fitness of players are discussed from a microscopic view.
This work aims to identify a method by which the coordinator of the operational unit (OU) can train gratified personnel through the use of a reward system. The continuous transformations of the Italian healthcare scene lead operators to face ever new needs and problems. Professionals cannot be considered only as workers; they are bearers of qualified intellectual, professional and cultural skills. Individual coordinators are required to be real leaders within their operational units and to use their managerial skills in achieving company objectives and in evaluating the personnel they manage. The main factor behind difficulties in staff management is motivation, defined as a state of mind that, together with aspirations, needs and orientations, pushes people to act and to behave with commitment, perseverance and determination. The need to better rationalize the available resources and to promote high-quality health care while improving safety, efficiency and appropriateness has led the general management and the coordinator of the OU to use reward systems. With the introduction of this procedure, aimed at enhancing merit and encouraging virtuous behavior during the provision of health services, the public employment reform participates in the evolution of the regulatory framework and centers on the change that is taking place in the world of work.
In public goods games, punishments and rewards have been shown to be effective mechanisms for maintaining individual cooperation. However, punishments and rewards are costly ways to incentivize cooperation. Therefore, how costly penalties and rewards are generated has been a complex problem in promoting the development of cooperation. In real society, specialized institutions exist to punish evil people or reward good people by collecting taxes. Motivated by this phenomenon, we propose a tax-based strong altruistic punishment or reward strategy in the public goods game. Through theoretical analysis and numerical calculation, we find that tax-based strong altruistic punishment (reward) has more evolutionary advantages than traditional strong altruistic punishment (reward) in maintaining cooperation, and that tax-based strong altruistic reward leads to a higher level of cooperation than tax-based strong altruistic punishment.
Network marketing is a trading technique that provides companies with the opportunity to increase sales. With the increasing number of Internet-based purchases, several threats are increasingly observed in this field, such as user privacy violations, company owner (CO) fraud, the changing of sold products' information, and the scalability of selling networks. This study presents the concept of a blockchain-based market called ACR-MLM that functions based on the multi-level marketing (MLM) model, through which registered users receive anonymous and confidential rewards for their own and their subgroups' sales. Applying a public blockchain as the ACR-MLM framework's infrastructure solves existing problems in MLM-based markets, such as CO fraud (against the government or its users), user privacy violations (obtaining their real names or subgroup users), and scalability (when vast numbers of users have been registered). To provide confidentiality and scalability to the ACR-MLM framework, hierarchical identity-based encryption (HIBE) was applied with a functional encryption (FE) scheme. Finally, the security of ACR-MLM is analyzed using the random oracle (RO) model and then evaluated.
The CAS Institute of Modern Physics is a center of pure basic research concerning nuclear physics, accelerator physics and related technology. In recent years, with the aid of international cooperation, it succeeded in constructing China's first production line for manufacturing radiation-crosslinked (RC) wire and cable, and it has obtained rewarding benefits from it.
By integrating deep neural networks with reinforcement learning, the Double Deep Q Network (DDQN) algorithm overcomes the limitations of Q-learning in handling continuous spaces and is widely applied in the path planning of mobile robots. However, the traditional DDQN algorithm suffers from sparse rewards and inefficient utilization of high-quality data. To address these problems, an improved DDQN algorithm based on average Q-value estimation and reward redistribution is proposed. First, to enhance the precision of the target Q-value, the average of multiple previously learned Q-values from the target Q network is used to replace the single Q-value from the current target Q network. Next, a reward redistribution mechanism is designed to overcome the sparse reward problem by adjusting the final reward of each action using the round reward from trajectory information. Additionally, a reward-prioritized experience selection method is introduced, which ranks experience samples according to reward values to ensure frequent utilization of high-quality data. Finally, simulation experiments are conducted to verify the effectiveness of the proposed algorithm in a fixed-position scenario and in random environments. The experimental results show that, compared to the traditional DDQN algorithm, the proposed algorithm achieves shorter average running time, higher average return and fewer average steps. The performance of the proposed algorithm is improved by 11.43% in the fixed scenario and 8.33% in random environments. It not only plans economic and safe paths but also significantly improves efficiency and generalization in path planning, making it suitable for widespread application in autonomous navigation and industrial automation.
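The averaged target described in the abstract above can be illustrated with a minimal sketch. It assumes a standard Double DQN target (action chosen by the online network, value read from the target network) and replaces the single target value with the mean over several stored target-network snapshots. The function name `averaged_ddqn_target`, its parameters, and the use of plain lists instead of neural networks are hypothetical choices for illustration, not the paper's implementation.

```python
def averaged_ddqn_target(reward, next_q_online, past_target_qs, gamma=0.99, done=False):
    """Double DQN target with the target value averaged over past snapshots.

    reward          -- immediate reward for the transition
    next_q_online   -- Q-values of the next state from the online network (list)
    past_target_qs  -- next-state Q-values from K stored target-network
                       snapshots (list of lists); K is an assumed hyperparameter
    gamma           -- discount rate
    done            -- True if the next state is terminal
    """
    if done:
        # No bootstrapping from a terminal state.
        return reward
    # Double DQN: the online network selects the action ...
    best_action = max(range(len(next_q_online)), key=lambda a: next_q_online[a])
    # ... and the (averaged) target network evaluates it.
    snapshots = list(past_target_qs)
    avg_q = sum(q[best_action] for q in snapshots) / len(snapshots)
    return reward + gamma * avg_q


if __name__ == "__main__":
    online = [0.2, 0.9, 0.1]                      # online net prefers action 1
    snapshots = [[0.0, 1.0, 0.0], [0.0, 2.0, 0.0], [0.0, 3.0, 0.0]]
    print(averaged_ddqn_target(1.0, online, snapshots, gamma=0.5))
```

Averaging over snapshots trades a little staleness for lower variance in the bootstrap value; how many snapshots to keep is a tuning choice that the abstract does not specify.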
Due to the long-horizon issue, a substantial number of visits to the state space is required during the exploration phase of reinforcement learning (RL) to gather valuable information. Additionally, due to the challenge posed by sparse rewards, the planning phase of reinforcement learning spends a considerable amount of time on repetitive and unproductive tasks before adequately accessing sparse reward signals. To address these challenges, this work proposes a space partitioning and reverse merging (SPaRM) framework based on reward-free exploration (RFE). The framework consists of two parts: the space partitioning module and the reverse merging module. The former module partitions the entire state space into a specific number of subspaces to expedite the exploration phase. This work establishes its theoretical sample complexity lower bound. The latter module starts planning in reverse from near the target and gradually extends to the starting state, as opposed to the conventional practice of starting at the beginning. This facilitates the early involvement of sparse rewards at the target in the policy update process. This work designs two experimental environments: a complex maze and a set of randomly generated maps. Compared with two state-of-the-art (SOTA) algorithms, experimental results validate the effectiveness and superior performance of the proposed algorithm.
The grassland ecological protection compensation and reward policy is the largest-scale investment, covering the most extensive area, since the founding of the PRC. It will be the long-term policy for grassland ecological protection. In this study, the policy's effects on grassland productivity, ecological protection, animal husbandry output and pastoralists' income were analyzed from a macro perspective. The results show that, after implementation of the policy, natural grass production and the theoretical grassland stocking rate increased. The average livestock overloading rate of natural grassland decreased significantly, and comprehensive national grassland vegetation coverage is increasing. In addition, adult cattle numbers and beef yield fluctuated. Sheep head count, adult sheep, sheep production and milk production increased to varying degrees. The per capita net income of farmers and pastoralists, livestock income and the proportion of livestock income were all higher than before implementation of the policy.
The orbitofrontal cortex (OFC) is particularly important for the neural representation of reward value. Previous studies indicated that electroencephalogram (EEG) activity in the OFC was involved in drug administration and withdrawal. The present study investigated EEG activity in the OFC in rats during the development of food reward and craving. Two environments were used separately for control and food-related EEG recordings. In the food-related environment, rats were first trained to eat chocolate peanuts; then they either had no access to this food but could see and smell it (craving trials), or had free access to this food (reward trials). The EEG in the left OFC was recorded during these trials. We showed that, in the food-related environment, the EEG activity peaking in the delta band (2-4 Hz) was significantly correlated with the stimulus, increasing during food reward and decreasing during food craving when compared with that in the control environment. Our data suggest that EEG activity in the OFC can be altered by food reward; moreover, delta rhythm in this region could be used as an index for monitoring the changed signal underlying this reward.
There is no question that learning a foreign language like English is different from learning other subjects, mainly because it is new to us Chinese and there is not a sufficient language environment. But that does not mean we have no way to learn it and do it well. If asked to identify the most powerful influences on learning, motivation would probably be high on most teachers' and learners' lists. It seems only sensible to assume that English learning is most likely to occur when the learners want to learn. That is, when motivation such as interest, curiosity, or a desire to achieve is present, the learners become engaged in learning. However, how do we teachers motivate our students to like learning and learn well? Here, rewards, both extrinsic and intrinsic, are of great value and play a vital role in English learning.
Objective To investigate the combined effect of the Demand-Control-Support (DCS) model and the Effort-Reward Imbalance (ERI) model on the risk estimation of depression in humans, in comparison with the effects when each is used separately. Methods A total of 3,632 males and 1,706 females from 13 factories and companies in Henan province were recruited in this cross-sectional study. Perceived job stress was evaluated with the Job Content Questionnaire and the Effort-Reward Imbalance Questionnaire (Chinese version). Depressive symptoms were assessed using the Center for Epidemiological Studies Depression Scale (CES-D). Results DC (demands/job control ratio) and ERI were shown to be independently associated with depressive symptoms. The outcomes for low social support and overcommitment were similar. High DC combined with low social support (SS), high ERI combined with high overcommitment, and high DC combined with high ERI posed greater risks of depressive symptoms than each did alone. The ERI model and the SS model seem to be effective in estimating the risk of depressive symptoms when used separately. Conclusion DC performed better when used in combination with low SS. The effect on physical demands was better than on psychological demands. The combination of the DCS and ERI models could improve the risk estimate of depressive symptoms in humans.
Reward-based decision-making has been found to activate several brain areas, including the ventrolateral prefrontal lobe, orbitofrontal cortex, anterior cingulate cortex, ventral striatum, and mesolimbic dopaminergic system. In this study, we observed brain areas activated under three degrees of uncertainty in a reward-based decision-making task (certain, risky, and ambiguous). The tasks were presented using a brain function audiovisual stimulation system. We conducted brain scans of 15 healthy volunteers using a 3.0T magnetic resonance scanner. We used SPM8 to analyze the location and intensity of activation during the reward-based decision-making task with respect to the three conditions. We found that the orbitofrontal cortex was activated in the certain reward condition, while the prefrontal cortex, precentral gyrus, occipital visual cortex, inferior parietal lobe, cerebellar posterior lobe, middle temporal gyrus, inferior temporal gyrus, limbic lobe, and midbrain were activated during the 'risk' condition. The prefrontal cortex, temporal pole, inferior temporal gyrus, occipital visual cortex, and cerebellar posterior lobe were activated during ambiguous decision-making. The ventrolateral prefrontal lobe, frontal pole of the prefrontal lobe, orbitofrontal cortex, precentral gyrus, inferior temporal gyrus, fusiform gyrus, supramarginal gyrus, inferior parietal lobule, and cerebellar posterior lobe exhibited greater activation in the 'risk' than in the 'certain' condition (P < 0.05). The frontal pole and dorsolateral region of the prefrontal lobe, as well as the cerebellar posterior lobe, showed significantly greater activation in the 'ambiguous' condition compared to the 'risk' condition (P < 0.05). The prefrontal lobe, occipital lobe, parietal lobe, temporal lobe, limbic lobe, midbrain, and posterior lobe of the cerebellum were activated during decision-making about uncertain rewards. Thus, we observed different levels and regions of activation for different types of reward processing during decision-making. Specifically, when the degree of reward uncertainty increased, the number of activated brain areas increased, including greater activation of brain areas associated with loss.
To support strategic decisions on firms' sharing reward programs (SRPs), a nested Stackelberg game is developed. The sharing behavior among users and the rewarding strategy of firms are modeled. The optimal sharing bonus is worked out and the impact of social relationships among customers is discussed. The results show that the higher the bonus, the more effort the inductor is willing to make to persuade the inductee into buying. In addition, firms should take the social relationship into consideration when setting the optimal sharing bonus. If the social relationship is weak, there is no need to adopt the SRP. Otherwise, there are two ways to reward the inductors. Also, the stronger the social relationship, the smaller the sharing bonus that should be offered to the inductors, and the higher the expected profits. As a result, it is reasonable for firms to implement SRPs on social media where users are familiar with each other.
In the real world, most successes are the result of long-term effort. The reward of success is extremely high, but it must be preceded by a long-term investment process. People who are "myopic" value only short-term rewards and are unwilling to make early-stage investments, so they rarely achieve ultimate success and the corresponding high rewards. Similarly, for a reinforcement learning (RL) model with long-delay rewards, the discount rate determines the strength of the agent's "farsightedness". To enable the trained agent to make a chain of correct choices and finally succeed, this paper first obtains, through mathematical derivation, the feasible region of the discount rate that satisfies the agent's "farsightedness" requirement. Afterwards, to avoid the complicated problem of solving implicit equations when choosing feasible solutions, a simple method is explored and verified by theoretical demonstration and mathematical experiments. Then, a series of RL experiments is designed and implemented to verify the validity of the theory. Finally, the model is extended from the finite process to the infinite process, and the validity of the extended model is verified by theory and experiments. The research not only reveals the significance of the discount rate but also provides a theoretical basis and a practical method for choosing the discount rate in future research.
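The role of the discount rate in the abstract above can be illustrated with a minimal sketch. It compares the discounted return of a "patient" reward sequence (large delayed reward) with a "myopic" one (small immediate reward) and scans a grid of discount rates for the first value at which the patient sequence wins. The grid scan is a simple stand-in chosen for illustration; it is not the derivation or the simplified method of the paper, and the function names and reward sequences are hypothetical.

```python
def discounted_return(rewards, gamma):
    """Sum of gamma**t * r_t over a finite reward sequence."""
    return sum((gamma ** t) * r for t, r in enumerate(rewards))


def smallest_farsighted_gamma(patient_rewards, myopic_rewards, steps=100):
    """Scan gamma over a uniform grid in (0, 1).

    Returns the first grid value at which the patient sequence has the
    strictly larger discounted return, or None if no grid value qualifies.
    """
    for i in range(1, steps):
        gamma = i / steps
        patient = discounted_return(patient_rewards, gamma)
        myopic = discounted_return(myopic_rewards, gamma)
        if patient > myopic:
            return gamma
    return None


if __name__ == "__main__":
    # Reward 10 after three empty steps versus reward 1 right now.
    print(smallest_farsighted_gamma([0, 0, 0, 10], [1]))
```

In this toy case the condition reduces to 10·γ³ > 1, so any discount rate above roughly 0.464 makes the delayed reward preferable; longer delays push that threshold closer to 1.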
In this study, a T-maze-based frustration model in rats was established using sucrose-reward deprivation. The results revealed that rats maintained a 75% preference for the sucrose-reward arm in the reward phase. During the sucrose-deprivation frustration phase, both the preference for the sucrose-deprivation arm (62.5%) and the time spent waiting in the sucrose-deprivation arm decreased. Acute injection of morphine increased the preference in a dose-dependent fashion and prolonged the waiting duration in the sucrose-deprivation arm. These findings indicate that morphine specifically inhibited the frustration response induced by sucrose-reward deprivation. To further elucidate the pharmacological mechanisms involved, the opioid receptor antagonist naloxone was given to model rats prior to the injection of morphine. The results revealed that naloxone administration markedly attenuated the anti-frustration-like effects of 3 mg/kg morphine treatment. These findings suggest that morphine attenuates the frustration-like response to reward deprivation in rats through the opioid receptor.
OBJECTIVE Glutamatergic projections from the prefrontal cortex (PFc) to the nucleus accumbens (NAc) regulate dopamine (DA) release in the NAc. However, it is not clear whether this circuit affects the reward and motivation of heroin addiction. Our study investigates the effects of metabotropic glutamate receptor 2/3 (mGluR2/3) and of the projections from the ventromedial prefrontal cortex (vmPFc) to the NAc shell on reward and motivation in heroin-addicted rats. METHODS First, rats were trained in self-administration for 14 d. On the 15th day, some rats were injected systemically with the mGluR2/3 agonist LY379268 (0.1, 0.3 and 1.0 mg·kg⁻¹, ip), and others were bilaterally microinjected with LY379268 (0.3 and 1.0 g·L⁻¹) at a volume of 0.5 μL into the ventral tegmental area (VTA), NAc core or NAc shell, respectively. All rats then underwent heroin self-administration testing under a fixed ratio 1 (FR1) schedule or a progressive ratio (PR) schedule to observe the effect of LY379268 on heroin reward or motivation. Second, rats were injected with chemogenetic glutamatergic virus (pAAV-CaMKIIa-hM3D(Gq)-mCherry or pAOV-CaMKIIa-hM4D(Gi)-mCherry-3Flag) or negative control virus in the vmPFc and trained in heroin self-administration for 14 d. On the 15th day, rats were bilaterally microinjected with clozapine-N-oxide (CNO, 1 mmol·L⁻¹, 0.5 μL) into the NAc shell, and the effect on heroin reward or motivation was tested. Finally, rats were injected with optogenetic glutamatergic virus (AAV2/9-CaMKⅡ-hChR2-EYFP) or negative control virus in the vmPFc, implanted with a 16-channel photoelectrode in the ipsilateral NAc shell, and trained in heroin self-administration for 14 d. On the 15th day, heroin reward was tested under the FR1 procedure with blue light stimulation at a wavelength of 470 nm, a frequency of 25 Hz and a power of 5 mW. Each stimulation lasted 1 h, with a 1 h interval. Spike changes in NAc shell neurons before and after stimulation were recorded. RESULTS LY379268 dose-dependently attenuated heroin reward or motivation, and the local effective site was mainly the NAc shell. Chemogenetic results showed that activation or inactivation of the projection from the vmPFc to the NAc shell enhanced or attenuated heroin reward and motivation, respectively. Optogenetic stimulation of the same projection also enhanced heroin reward, and tonic neuronal firing in the NAc shell was observed during the light stimulation session. CONCLUSION mGluR2/3 activation in the NAc shell is involved in the inhibition of heroin reward and motivation. Activation of the projection from the PFc to the NAc shell can enhance heroin reward and motivation.
Whereas most research focuses on a single futures contract and lacks comparison across periods, this paper describes the statistical characteristics of the wheat futures reward time series of the Zhengzhou Commodity Exchange over the most recent three years. Besides basic statistical analysis, the paper uses the GARCH and EGARCH models to describe the time series that exhibit the ARCH effect and analyzes the persistence of volatility shocks and the leverage effect. The results show that, compared with the normal distribution, the wheat futures reward series are non-normal, leptokurtic and fat-tailed. The study also finds that two parts of the reward series have no autocorrelation. Among the six correlated series, three present the ARCH effect. Using the autoregressive distributed lag model, the GARCH model and the EGARCH model, the paper demonstrates the persistence of volatility shocks and the leverage effect in the wheat futures reward time series. The results reveal that, on the one hand, the statistical characteristics of the wheat futures reward are on the whole similar to those of mature futures markets abroad. On the other hand, the results reflect some shortcomings of the Chinese futures market, such as immaturity and over-control by the government.
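The notion of volatility-shock persistence mentioned in the abstract above can be illustrated with the textbook GARCH(1,1) recursion, σ²ₜ = ω + α·ε²ₜ₋₁ + β·σ²ₜ₋₁, where α + β measures how slowly a shock to variance decays. The sketch below is a minimal illustration only: it assumes zero-mean returns (so the shock equals the return), starts from the unconditional variance ω / (1 − α − β), and uses made-up parameter values rather than anything estimated in the paper. The EGARCH model used for the leverage effect is not covered here.

```python
def garch11_variance(returns, omega, alpha, beta):
    """Conditional variances of a GARCH(1,1) process for given parameters.

    returns -- sequence of (assumed zero-mean) returns, used as the shocks
    omega, alpha, beta -- GARCH(1,1) parameters; alpha + beta < 1 is required
                          so that the unconditional variance exists
    The i-th output is the variance forecast for returns[i], made before
    returns[i] is observed.
    """
    if not (omega > 0 and alpha >= 0 and beta >= 0 and alpha + beta < 1):
        raise ValueError("need omega > 0, alpha, beta >= 0 and alpha + beta < 1")
    var = omega / (1.0 - alpha - beta)   # start at the unconditional variance
    variances = []
    for r in returns:
        variances.append(var)
        var = omega + alpha * r * r + beta * var
    return variances


if __name__ == "__main__":
    # Illustrative parameters: persistence alpha + beta = 0.9.
    print(garch11_variance([1.0, 2.0, 0.0], omega=0.1, alpha=0.2, beta=0.7))
```

A persistence value close to 1 means that a large return keeps the conditional variance elevated for many periods, which is what "persistence of volatility shocks" refers to.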
Funding (preference rewarding mechanism in the spatial prisoner's dilemma game): the National Natural Science Foundation of China (Grant No. 62062049), the Social Science Project of the Ministry of Education of China (Grant No. 20YJCZH212), and the Natural Science Foundation of Gansu Province, China (Grant No. 20JR5RA390).
Funding (tax-based strong altruistic punishment and reward in the public goods game): the National Natural Science Foundation of China (Grant No. 71961003).
Funding (improved DDQN path planning): the National Natural Science Foundation of China (No. 62063006), the Guangxi Science and Technology Major Program (No. 2022AA05002), the Key Laboratory of AI and Information Processing (Hechi University), Education Department of Guangxi Zhuang Autonomous Region (No. 2022GXZDSY003), and the Central Leading Local Science and Technology Development Fund Project of Wuzhou (No. 202201001).
Abstract: Ⅰ. THE SUGGESTION OF THE STRATEGIC MEASURE Situated at the junction between the vast Eurasian landmass and the south Asian subcontinent, Yunnan Prov-
Funding: Supported by the International Partnership Program of Chinese Academy of Sciences (No. 184131KYSB20200033).
Abstract: Because of the long-horizon problem, a substantial number of visits to the state space is required during the exploration phase of reinforcement learning (RL) to gather valuable information. Additionally, because rewards are sparse, the planning phase of reinforcement learning spends a considerable amount of time on repetitive and unproductive tasks before it adequately accesses sparse reward signals. To address these challenges, this work proposes a space partitioning and reverse merging (SPaRM) framework based on reward-free exploration (RFE). The framework consists of two parts: the space partitioning module and the reverse merging module. The former partitions the entire state space into a specific number of subspaces to expedite the exploration phase. This work establishes its theoretical sample complexity lower bound. The latter starts planning in reverse from near the target and gradually extends to the starting state, as opposed to the conventional practice of starting at the beginning. This brings the sparse rewards at the target into the policy update process early. This work designs two experimental environments: a complex maze and a set of randomly generated maps. Compared with two state-of-the-art (SOTA) algorithms, experimental results validate the effectiveness and superior performance of the proposed algorithm.
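The idea of planning in reverse from the target can be illustrated on a small grid maze. The sketch below is not the SPaRM algorithm; it swaps in a plain breadth-first sweep from the goal as a stand-in, and the grid encoding (0 for free, 1 for wall) and function names are hypothetical.

```python
from collections import deque

MOVES = ((1, 0), (-1, 0), (0, 1), (0, -1))

def reverse_distance_map(grid, goal):
    """Sweep outward from the goal so that goal information reaches nearby
    cells first and the start state last. Cells: 0 = free, 1 = wall."""
    rows, cols = len(grid), len(grid[0])
    dist = {goal: 0}
    queue = deque([goal])
    while queue:
        r, c = queue.popleft()
        for dr, dc in MOVES:
            nxt = (r + dr, c + dc)
            inside = 0 <= nxt[0] < rows and 0 <= nxt[1] < cols
            if inside and grid[nxt[0]][nxt[1]] == 0 and nxt not in dist:
                dist[nxt] = dist[(r, c)] + 1
                queue.append(nxt)
    return dist

def greedy_path(dist, start):
    """Walk from the start by always stepping to the neighbor closest to the goal."""
    if start not in dist:
        return None  # the goal is unreachable from this start
    path = [start]
    while dist[path[-1]] > 0:
        r, c = path[-1]
        neighbors = [(r + dr, c + dc) for dr, dc in MOVES]
        path.append(min((n for n in neighbors if n in dist), key=dist.get))
    return path
```

Because the sweep begins at the target, every reachable cell already carries goal information by the time the start is reached, which mirrors the abstract's point about involving the target's sparse reward early.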
Funding: Supported by National Natural Science Funds of China (71503251); The Agricultural Science and Technology Innovation Program (ASTIP-IAED-2015-01); China forage and grass research system (CARS-35-22).
Abstract: The grassland ecological protection compensation and reward policy is the largest-scale investment, covering the most extensive areas, since the founding of the PRC. It will be the long-term policy for grassland ecological protection. In this study, the policy's effects on grassland productivity, ecological protection, animal husbandry output, and pastoralists' income were analyzed from a macro perspective. The results show that, after implementation of the policy, natural grass production and the theoretical grassland stocking rate increased. The average livestock overloading rate on natural grassland decreased significantly, and comprehensive national grassland vegetation coverage is increasing. In addition, adult cattle numbers and beef yield fluctuated. Sheep numbers, adult sheep, sheep production, and milk production increased to varying degrees. The per capita net income of farmers and pastoralists, livestock income, and the proportion of livestock income were higher than before implementation of the policy.
Funding: National Science Foundation of China (30470553, 30530270, 30670669, 30770700); 973 Program (2005CB522803, 2007CB947703); 863 Program (O7013810, 2006AA02A116); The Major State Basic Research of China (2003CB716600); Chinese-Finnish International Collaboration Project-neuro (30621130076); Program of CASC (KSCX1-YW-R-33, YZ200737); National Key Technologies R & D Program and Yunnan Science and Technique Program (2006PT08-2)
Abstract: The orbitofrontal cortex (OFC) is particularly important for the neural representation of reward value. Previous studies indicated that electroencephalogram (EEG) activity in the OFC was involved in drug administration and withdrawal. The present study investigated EEG activity in the OFC in rats during the development of food reward and craving. Two environments were used separately for control and food-related EEG recordings. In the food-related environment, rats were first trained to eat chocolate peanuts; then they either had no access to this food but could see and smell it (craving trials), or had free access to this food (reward trials). The EEG in the left OFC was recorded during these trials. We showed that, in the food-related environment, the EEG activity peaking in the delta band (2-4 Hz) was significantly correlated with the stimulus, increasing during food reward and decreasing during food craving when compared with that in the control environment. Our data suggest that EEG activity in the OFC can be altered by food reward; moreover, the delta rhythm in this region could be used as an index for monitoring the signal changes underlying this reward.
Abstract: There is no question that learning a foreign language such as English differs from learning other subjects, mainly because it is new to us Chinese and there is not a sufficient language environment. But that does not mean we have no way to learn it and do it well. If asked to identify the most powerful influences on learning, motivation would probably be high on most teachers’ and learners’ lists. It seems only sensible to assume that English learning is most likely to occur when the learners want to learn. That is, when motivation such as interest, curiosity, or a desire to achieve is present, learners will engage in learning. However, how do we teachers motivate our students to like learning and learn well? Here, rewards, both extrinsic and intrinsic, are of great value and play a vital role in English learning.
Funding: Henan Provincial Health Science and Technology Key Projects (201001009); National Science and Technology Infrastructure Program (2006BAI06B08), China
Abstract: Objective To investigate the combined effect of the Demand-control-support (DCS) model and the Effort-reward Imbalance (ERI) model on the risk estimation of depression in humans, compared with their effects when used separately. Methods A total of 3,632 males and 1,706 females from 13 factories and companies in Henan province were recruited in this cross-sectional study. Perceived job stress was evaluated with the Job Content Questionnaire and the Effort-Reward Imbalance Questionnaire (Chinese version). Depressive symptoms were assessed using the Center for Epidemiological Studies Depression Scale (CES-D). Results DC (demands/job control ratio) and ERI were shown to be independently associated with depressive symptoms. The outcomes for low social support and overcommitment were similar. High DC and low social support (SS), high ERI and high overcommitment, and high DC and high ERI posed greater risks of depressive symptoms than each of them did alone. The ERI model and the SS model seem to be effective in estimating the risk of depressive symptoms when used separately. Conclusion The DC had better performance when it was used in combination with low SS. The effect on physical demands was better than on psychological demands. The combination of the DCS and ERI models could improve the risk estimate of depressive symptoms in humans.
Funding: Science and Technology Development Project of Shandong Province, China, No. 2011YD18045; the Natural Science Foundation of Shandong Province, China, No. ZR2012HM049; the Health Care Foundation Program of Shandong Province, China, No. 2007BZ19; the Foundation Program of Technology Bureau of Qingdao, China, No. Kzd-0309-1-1-33-nsh
Abstract: Reward-based decision-making has been found to activate several brain areas, including the ventrolateral prefrontal lobe, orbitofrontal cortex, anterior cingulate cortex, ventral striatum, and mesolimbic dopaminergic system. In this study, we observed brain areas activated under three degrees of uncertainty in a reward-based decision-making task (certain, risky, and ambiguous). The tasks were presented using a brain function audiovisual stimulation system. We conducted brain scans of 15 healthy volunteers using a 3.0T magnetic resonance scanner. We used SPM8 to analyze the location and intensity of activation during the reward-based decision-making task, with respect to the three conditions. We found that the orbitofrontal cortex was activated in the certain reward condition, while the prefrontal cortex, precentral gyrus, occipital visual cortex, inferior parietal lobe, cerebellar posterior lobe, middle temporal gyrus, inferior temporal gyrus, limbic lobe, and midbrain were activated during the 'risk' condition. The prefrontal cortex, temporal pole, inferior temporal gyrus, occipital visual cortex, and cerebellar posterior lobe were activated during ambiguous decision-making. The ventrolateral prefrontal lobe, frontal pole of the prefrontal lobe, orbitofrontal cortex, precentral gyrus, inferior temporal gyrus, fusiform gyrus, supramarginal gyrus, inferior parietal lobule, and cerebellar posterior lobe exhibited greater activation in the 'risk' than in the 'certain' condition (P < 0.05). The frontal pole and dorsolateral region of the prefrontal lobe, as well as the cerebellar posterior lobe, showed significantly greater activation in the 'ambiguous' condition compared to the 'risk' condition (P < 0.05). The prefrontal lobe, occipital lobe, parietal lobe, temporal lobe, limbic lobe, midbrain, and posterior lobe of the cerebellum were activated during decision-making about uncertain rewards. Thus, we observed different levels and regions of activation for different types of reward processing during decision-making. Specifically, when the degree of reward uncertainty increased, the number of activated brain areas increased, including greater activation of brain areas associated with loss.
Funding: The National Social Science Foundation of China (No. 17BGL196); the Postgraduate Research & Practice Innovation Program of Jiangsu Province (No. KYLX15_0193)
Abstract: To support strategic decisions on firms’ sharing reward programs (SRPs), a nested Stackelberg game is developed. The sharing behavior among users and the rewarding strategy of firms are modeled. The optimal sharing bonus is derived and the impact of social relationships among customers is discussed. The results show that the higher the bonus, the more effort the inductor is willing to make to persuade the inductee to buy. In addition, firms should take the social relationship into consideration when setting the optimal sharing bonus. If the social relationship is weak, there is no need to adopt the SRP. Otherwise, there are two ways to reward the inductors. Also, the stronger the social relationship, the smaller the sharing bonus that should be offered to the inductors, and the higher the expected profits. As a result, it is reasonable for firms to implement SRPs on social media where users are familiar with each other.
Funding: Supported by the National Natural Science Foundation of China (71771216, 71701209, 72001214).
Abstract: Most successes in the world are the result of long-term efforts. The reward of success is extremely high, but a long-term investment process is required first. People who are “myopic” value only short-term rewards and are unwilling to make early-stage investments, so they rarely achieve ultimate success and the corresponding high rewards. Similarly, for a reinforcement learning (RL) model with long-delay rewards, the discount rate determines the strength of the agent’s “farsightedness”. To enable the trained agent to make a chain of correct choices and finally succeed, this paper first obtains, through mathematical derivation, the feasible region of the discount rate that satisfies the agent's “farsightedness” requirement. Afterwards, to avoid the complicated problem of solving implicit equations when choosing feasible solutions, a simple method is explored and verified by theoretical demonstration and mathematical experiments. Then, a series of RL experiments is designed and implemented to verify the validity of the theory. Finally, the model is extended from the finite process to the infinite process. The validity of the extended model is verified by theory and experiments. The research not only reveals the significance of the discount rate, but also provides a theoretical basis and a practical method for choosing the discount rate in future research.
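The link between the discount rate and "farsightedness" can be illustrated with a deliberately simplified case. The sketch below is not the paper's derivation of the feasible region; it assumes a single choice between a small immediate reward and one larger reward that arrives after a fixed delay, and all function and parameter names are hypothetical.

```python
def min_discount_for_delayed_reward(small_now, large_later, delay_steps):
    """Smallest discount rate at which the delayed reward is worth as much as
    the immediate one: gamma ** delay_steps * large_later == small_now."""
    return (small_now / large_later) ** (1.0 / delay_steps)

def prefers_delayed(gamma, small_now, large_later, delay_steps):
    """True when the discounted delayed reward beats the immediate reward."""
    return gamma ** delay_steps * large_later > small_now
```

In this toy setting, an agent whose discount rate lies below the threshold behaves "myopically" and takes the immediate reward, and the threshold rises toward 1 as the delay grows.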
Funding: the National Natural Science Foundation of China, No. 30870894; the National Basic Research Program of China, No. 2009CB522000; National Key Technology R&D Program in the 12th Five-Year Plan of China, No. 2011BAK04B08
Abstract: In this study, a T-maze-based frustration model in rats was established using sucrose-reward deprivation. The results revealed that rats maintained a 75% preference for the sucrose-reward arm in the reward phase. During the sucrose-deprivation frustration phase, both the preference for the sucrose-deprivation arm (62.5%) and time spent waiting in the sucrose-deprivation arm decreased. Acute injection of morphine increased the preference in a dose-dependent fashion, and prolonged the waiting duration in the sucrose-deprivation arm. These findings indicate that morphine specifically inhibited the frustration response induced by sucrose reward deprivation. To further elucidate the pharmacological mechanisms involved, the opioid receptor antagonist naloxone was given to model rats prior to the injection of morphine. The results revealed that naloxone administration markedly attenuated the anti-frustration-like effects of 3 mg/kg morphine treatment. These findings suggest that morphine attenuates the frustration-like response to reward deprivation in rats through the opioid receptor.
Funding: National Basic Research Program of China (2015CB553504); National Natural Science Foundation of China (81471350, 81671321); Natural Science Foundation of Ningbo Municipality, Zhejiang Province, China (2017A610214).
Abstract: OBJECTIVE Glutamatergic projections from the prefrontal cortex (PFc) to the nucleus accumbens (NAc) regulate dopamine (DA) release in the NAc. However, it is not clear whether this circuit affects the reward and motivation of heroin addiction. Our study investigates the effects of metabotropic glutamate receptor 2/3 (mGluR2/3) and the projections from the ventromedial prefrontal cortex (vmPFc) to the NAc shell on the reward and motivation of heroin-addicted rats. METHODS First, rats were trained in self-administration for 14 d. On the 15th day, some rats were injected systemically with the mGluR2/3 agonist LY379268 (0.1, 0.3 and 1.0 mg·kg^(-1), ip), and others were bilaterally microinjected with LY379268 (0.3 and 1.0 g·L^(-1)) at a volume of 0.5 μL into the ventral tegmental area (VTA), NAc core or NAc shell, respectively. All rats then underwent heroin self-administration testing under a fixed ratio 1 (FR1) schedule or a progressive ratio (PR) schedule to observe the effect of LY379268 on heroin reward or motivation. Second, rats were injected with a chemogenetic glutamatergic virus (pAAV-CaMKIIa-hM3D(Gq)-mCherry or pAOV-CaMKIIa-hM4D(Gi)-mCherry-3Flag) or a negative control virus in the vmPFc, and trained in heroin self-administration for 14 d. On the 15th day, rats were bilaterally microinjected with clozapine-N-oxide (CNO, 1 mmol·L^(-1), 0.5 μL) into the NAc shell, and the effect on heroin reward or motivation was tested. Finally, rats were injected with an optogenetic glutamatergic virus (AAV2/9-CaMKⅡ-hChR2-EYFP) or a negative control virus in the vmPFc, implanted with a 16-channel photoelectrode in the ipsilateral NAc shell, and trained in heroin self-administration for 14 d. On the 15th day, heroin reward was tested under the FR1 procedure with blue light stimulation at a wavelength of 470 nm, a frequency of 25 Hz and a power of 5 mW. Each stimulation lasted 1 h, with an interval of 1 h. Spike changes in NAc shell neurons before and after stimulation were recorded. RESULTS LY379268 dose-dependently attenuated heroin reward or motivation, and the local effective site was mainly in the NAc shell. Chemogenetic results showed that activation or inactivation of the projection from the vmPFc to the NAc shell enhanced or attenuated heroin reward and motivation, respectively. Optogenetic stimulation of the same projection also enhanced heroin reward, and tonic neuronal firing in the NAc shell was observed during the light stimulation session. CONCLUSION mGluR2/3 activation in the NAc shell is involved in the inhibition of heroin reward and motivation. Activation of the projection from the PFc to the NAc shell can enhance heroin reward and motivation.
Abstract: Whereas existing research focuses mainly on single futures contracts and lacks comparisons across periods, this paper describes the statistical characteristics of the wheat futures reward time series of the Zhengzhou Commodity Exchange over the most recent three years. Besides the basic statistical analysis, the paper uses the GARCH and EGARCH models to describe the time series that exhibit the ARCH effect and analyzes the persistence of volatility shocks and the leverage effect. The results show that, compared with the normal distribution, the wheat futures reward series are non-normal, leptokurtic and fat-tailed. The study also finds that two parts of the reward series have no autocorrelation. Among the six correlative series, three exhibit the ARCH effect. Using the Auto-regressive Distributed Lag Model, the GARCH model and the EGARCH model, the paper demonstrates the persistence of volatility shocks and the leverage effect in the wheat futures reward time series. The results reveal that, on the one hand, the statistical characteristics of wheat futures rewards are, as a whole, similar to those of mature futures markets abroad. On the other hand, the results reflect some shortcomings of the Chinese futures market, such as immaturity and over-control by the government.
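For readers unfamiliar with the persistence of volatility shocks, the standard GARCH(1,1) conditional-variance recursion can be sketched as below. This is the textbook recursion, not the paper's fitted model: the parameter values in any call are the caller's own, and starting the recursion from the sample variance is an assumption.

```python
import numpy as np

def garch11_variance(returns, omega, alpha, beta):
    """Conditional variance path of a GARCH(1,1) model:
    sigma2[t] = omega + alpha * returns[t-1]**2 + beta * sigma2[t-1]."""
    returns = np.asarray(returns, dtype=float)
    sigma2 = np.empty_like(returns)
    sigma2[0] = returns.var()  # assumed starting value: sample variance
    for t in range(1, len(returns)):
        sigma2[t] = omega + alpha * returns[t - 1] ** 2 + beta * sigma2[t - 1]
    return sigma2

def persistence(alpha, beta):
    """alpha + beta; values near 1 mean volatility shocks decay slowly."""
    return alpha + beta
```

The EGARCH model mentioned in the abstract additionally lets negative and positive shocks affect volatility asymmetrically (the leverage effect); that extension is not shown here.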