Abstract: Traditionally, Chinese people place much value on virtue, with a long-held belief that one should never appropriate valuable items lost by others. However, a recent regulation by the government of south China's Guangdong Province
Abstract: The city of Suqian in east China's Jiangsu Province put a new regulation into practice six months ago whereby police authorities reward residents who volunteer to make peace in neighborhood quarrels, mediate in civil disputes or, for that matter, help in putting out fires.
Abstract: Saving people in distress can now bring a good Samaritan big bucks. The government of Guangzhou, Guangdong Province, announced in October that the maximum reward to people who risk their lives to save the lives and property of others, whether civilians or civil servants, would be raised from 50,000 yuan ($6,667) to 300,000 yuan ($40,000). The high-
Abstract: At the end of September, banners carrying the slogan "Catch one thief, get 1,000 yuan" appeared in Gulou District of Fuzhou, capital of southeast China's Fujian Province. According to local officials, the goal of
Abstract: Mobile ad hoc networks have grown in prominence in recent years and are now utilized in a broader range of applications. The main challenges relate to the routing techniques generally employed in them. Mobile ad hoc network management, on the other hand, requires further testing and improvement in terms of security. Traditional routing protocols, such as Ad hoc On-Demand Distance Vector (AODV) and Dynamic Source Routing (DSR), employ the hop count to calculate the distance between two nodes. The main aim of this research work is to determine the optimum method for sending packets while also extending the lifetime of the network. This is achieved by accounting for the residual energy of each network node. This paper also proposes various algorithms for optimal routing based on parameters such as energy, distance, mobility, and the pheromone value. Moreover, an approach based on a reward and penalty system is given to evaluate the efficiency of the proposed algorithms under the impact of these parameters. The simulation results show that the reward-penalty-based approach is quite effective for selecting an optimal routing path when the algorithms are run under the parameters of interest, which helps achieve lower packet drop and node energy consumption while enhancing network efficiency.
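To make the reward/penalty idea concrete, here is a minimal sketch of pheromone-based next-hop selection. All names, weights, and update magnitudes are illustrative assumptions, not the paper's actual parameters: each candidate node carries a pheromone value that is rewarded after a successful forward and penalized after a drop.

```python
def score(node, w_energy=0.4, w_dist=0.3, w_pher=0.3):
    """Composite routing score: prefer high residual energy and pheromone,
    and short distance (inverted so that larger is better)."""
    return (w_energy * node["energy"]
            + w_dist * 1.0 / (1.0 + node["distance"])
            + w_pher * node["pheromone"])

def update_pheromone(node, delivered, reward=0.1, penalty=0.2):
    """Reward a successful delivery, penalize a packet drop (floored at 0)."""
    if delivered:
        node["pheromone"] += reward
    else:
        node["pheromone"] = max(0.0, node["pheromone"] - penalty)

# Choose the best next hop among two hypothetical neighbors, then reinforce it.
candidates = [
    {"id": "A", "energy": 0.9, "distance": 2.0, "pheromone": 0.5},
    {"id": "B", "energy": 0.4, "distance": 1.0, "pheromone": 0.8},
]
best = max(candidates, key=score)   # node "A" wins on energy
update_pheromone(best, delivered=True)
```

Over repeated deliveries, pheromone reinforcement biases the selection toward paths that have actually worked, which is the feedback loop the abstract describes.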
Funding: Supported by the National Natural Science Foundation of China under Grant No. 61573138.
Abstract: The blades of wind turbines located at high latitudes are often covered with ice in late autumn and winter, which affects their capacity for power generation as well as their safety. Accurately identifying the icing of the blades of wind turbines in remote areas is thus important, and a general model is needed to this end. This paper proposes a universal model based on a Deep Neural Network (DNN) that uses data from the Supervisory Control and Data Acquisition (SCADA) system. Two datasets from SCADA are first preprocessed through undersampling, that is, they are labeled, normalized, and balanced. The features of icing of the blades of a turbine identified in previous studies are then used to extract training data from the training dataset. A middle feature is proposed to show how a given feature is correlated with icing on the blade. Performance indicators for the model, including a reward function, are also designed to assess its predictive accuracy. Finally, the most suitable model is used to predict the testing data, and values of the reward function and the predictive accuracy of the model are calculated. The proposed method can be used to relate continuously transferred features to a binary status of icing of the blades of the turbine by using variables of the middle feature. The results show that an integrated indicator system is superior to a single accuracy indicator when evaluating the prediction model.
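The preprocessing steps named above (undersampling to balance the classes, plus normalization) can be sketched as follows. This is a generic illustration of those two operations, not the paper's SCADA pipeline; function names and the fixed random seed are assumptions:

```python
import random

def undersample(samples, labels):
    """Balance a binary dataset by randomly undersampling the majority class."""
    pos = [s for s, y in zip(samples, labels) if y == 1]
    neg = [s for s, y in zip(samples, labels) if y == 0]
    n = min(len(pos), len(neg))
    random.seed(0)  # deterministic for the example only
    return random.sample(pos, n) + random.sample(neg, n), [1] * n + [0] * n

def min_max_normalize(values):
    """Scale a feature column to the [0, 1] range."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

# One icing sample against four normal samples -> balanced to one of each.
X, y = undersample([10, 2, 3, 4, 5], [1, 0, 0, 0, 0])
norm = min_max_normalize([0.0, 5.0, 10.0])
```

After balancing, a classifier no longer gains accuracy simply by predicting the majority "no icing" class, which is why the abstract pairs undersampling with an integrated indicator system rather than raw accuracy.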
Funding: Supported in part by the National Natural Science Foundation of China (62006111, 62073160) and the Natural Science Foundation of Jiangsu Province of China (BK20200330).
Abstract: Goal-conditioned reinforcement learning (RL) is an interesting extension of the traditional RL framework, where the dynamic environment and reward sparsity can cause conventional learning algorithms to fail. Reward shaping is a practical approach to improving sample efficiency by embedding human domain knowledge into the learning process. Existing reward shaping methods for goal-conditioned RL are typically built on distance metrics with a linear and isotropic distribution, which may fail to provide sufficient information about an ever-changing environment of high complexity. This paper proposes a novel magnetic field-based reward shaping (MFRS) method for goal-conditioned RL tasks with dynamic targets and obstacles. Inspired by the physical properties of magnets, we treat the target and obstacles as permanent magnets and establish the reward function according to the intensity values of the magnetic field generated by these magnets. The nonlinear and anisotropic distribution of the magnetic field intensity provides more accessible and conducive information about the optimization landscape, thus yielding a more sophisticated magnetic reward than the distance-based setting. Further, we transform our magnetic reward into the form of potential-based reward shaping by concurrently learning a secondary potential function, to ensure the optimal policy invariance of our method. Experimental results in both simulated and real-world robotic manipulation tasks demonstrate that MFRS outperforms relevant existing methods and effectively improves the sample efficiency of RL algorithms in goal-conditioned tasks with various dynamics of the target and obstacles.
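The core idea, reward rising sharply near the target "magnet" and falling near obstacle "magnets", can be sketched with a point-source intensity model. This is a simplified isotropic stand-in (a true dipole field is anisotropic, as the paper emphasizes); the cubic decay, gains, and function names are assumptions for illustration:

```python
import math

def field_intensity(pos, magnet, strength):
    """Point-magnet intensity decaying with the cube of distance
    (dipole-like falloff; epsilon avoids division by zero at the source)."""
    r = math.dist(pos, magnet)
    return strength / (r ** 3 + 1e-6)

def magnetic_reward(pos, target, obstacles, k_target=1.0, k_obstacle=0.5):
    """Shaped reward: higher near the target magnet, pushed down near
    each obstacle magnet."""
    reward = field_intensity(pos, target, k_target)
    for obs in obstacles:
        reward -= field_intensity(pos, obs, k_obstacle)
    return reward

# The agent at (0.5, 0) earns more shaped reward than at (2, 0),
# since it is closer to the target at the origin.
near = magnetic_reward((0.5, 0.0), (0.0, 0.0), [(5.0, 5.0)])
far = magnetic_reward((2.0, 0.0), (0.0, 0.0), [(5.0, 5.0)])
```

Because target and obstacle positions enter only through distances evaluated each step, the same reward function tracks moving targets and obstacles, which is what makes this family of shaping functions suitable for dynamic goal-conditioned tasks.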
Funding: Supported by the Key Research and Development Program of Shaanxi (2022GY-089) and the Natural Science Basic Research Program of Shaanxi (2022JQ-593).
Abstract: The deep deterministic policy gradient (DDPG) algorithm is an off-policy method that combines two mainstream reinforcement learning approaches, one based on value iteration and one on policy iteration. Using the DDPG algorithm, agents can explore and summarize the environment to achieve autonomous decisions in continuous state and action spaces. In this paper, a cooperative defense with DDPG via swarms of unmanned aerial vehicles (UAVs) is developed and validated, and it shows promising practical value in defense effectiveness. We address the sparse-reward problem of reinforcement learning in a long-term task by building the reward function of the UAV swarm and optimizing the learning process of the artificial neural network based on the DDPG algorithm to reduce oscillation during learning. The experimental results show that the DDPG algorithm can guide the UAV swarm to perform the defense task efficiently, meeting the swarm's requirements for decentralization and autonomy, and promoting the intelligent development of UAV swarms and their decision-making process.
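One common way to mitigate the sparse-reward problem mentioned above is to densify the reward: pay the swarm each step for closing in on the intruder, plus a terminal bonus on interception. This is a generic sketch of that shaping pattern, not the paper's actual reward function; the weight and bonus values are assumptions:

```python
def swarm_reward(dists_before, dists_after, intercepted,
                 w_progress=1.0, bonus=10.0):
    """Dense shaping for a pursuit/defense task: reward the total distance
    closed by all swarm members this step, plus a terminal bonus when the
    intruder is intercepted."""
    progress = sum(before - after
                   for before, after in zip(dists_before, dists_after))
    return w_progress * progress + (bonus if intercepted else 0.0)

# Two UAVs each close 1 unit of distance to the intruder this step.
step_reward = swarm_reward([5.0, 4.0], [4.0, 3.0], intercepted=False)
final_reward = swarm_reward([5.0, 4.0], [4.0, 3.0], intercepted=True)
```

With only the terminal bonus, almost every episode step would return zero reward; the per-step progress term gives the DDPG critic a gradient to follow long before the first successful interception.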
Abstract: As assessment outcomes provide students with a sense of accomplishment that is boosted by a reward system, learning becomes more effective. This research aims to determine the effects of a reward system administered prior to assessment in Mathematics. A quasi-experimental research design was used to examine whether there was a significant difference between the use of a reward system and students' level of performance in Mathematics. Through purposive sampling, the respondents of the study were 80 Grade 9 students belonging to two sections of Gaudencio B. Lontok Memorial Integrated School. Based on similar demographics and pre-test results, a control group and a study group were involved as participants. Data were treated and analyzed using statistical treatments such as the mean and the t-test for independent samples. A significant finding revealed the advantage of using the reward system over the non-reward system in increasing students' level of performance in Mathematics. It is concluded that the use of a reward system is effective in improving assessment outcomes in Mathematics. It is recommended that the reward system be used consistently prior to assessment so that assessment outcomes reflect the intended outcomes in Mathematics.
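The group comparison described above rests on an independent-samples t-test. As a rough illustration of that statistic (not the study's actual computation or data), a pooled-variance Student's t can be computed as:

```python
import math

def independent_t(group_a, group_b):
    """Student's t statistic for two independent samples, using the
    pooled-variance formula (assumes roughly equal variances)."""
    na, nb = len(group_a), len(group_b)
    mean_a = sum(group_a) / na
    mean_b = sum(group_b) / nb
    var_a = sum((x - mean_a) ** 2 for x in group_a) / (na - 1)
    var_b = sum((x - mean_b) ** 2 for x in group_b) / (nb - 1)
    pooled = ((na - 1) * var_a + (nb - 1) * var_b) / (na + nb - 2)
    return (mean_a - mean_b) / math.sqrt(pooled * (1 / na + 1 / nb))

# Toy scores for a control and a study group (purely illustrative).
t_stat = independent_t([1.0, 2.0, 3.0], [2.0, 3.0, 4.0])
```

The resulting t is compared against a t-distribution with n_a + n_b − 2 degrees of freedom to decide whether the difference in mean performance is statistically significant.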
Abstract: I. Remuneration. 1. wage: pay, wages. Generally calculated by the hour, day, or week, applied to blue-collar and semi-skilled workers, and usually paid in cash; the word is also used generically for the concept of wages, and most often appears in the plural. Examples: (1) The postal workers have asked for a wage rise of $5 a week. (2) current wage system. 2. salary: usually "monthly pay" or "annual pay", applied to public officials, company staff, and white-collar employees, and normally paid by check. Example: The union leaders enjoy great prestige and authority and large salaries. 3. stipend: refers specifically to the pay of clergy, teachers, or administrative officials. For example:
Abstract: It is necessary to draw on cognitive principles to improve our language teaching methods and make the learning process more effective. Compared with the traditional teacher-centered teaching method, language teaching guided by cognitive principles can change the roles of teacher and students. Teachers should pay more attention to developing students' autonomous learning ability, increasing their motivation, and helping students find a balance between the effort they invest and the results they obtain. Meaningful learning should be adopted during teaching, and the anticipation of reward also works effectively.
Abstract: There is no question that learning a foreign language like English is different from learning other subjects, mainly because it is new to us Chinese and there is not enough of a supporting environment. But that doesn't mean we have no way to learn it and do it well. If asked to identify the most powerful influences on learning, motivation would probably be high on most teachers' and learners' lists. It seems only sensible to assume that English learning is most likely to occur when the learners want to learn. That is, when motivation such as interest, curiosity, or a desire to achieve is present, the learners will be engaged in learning. However, how do we teachers motivate our students to like learning and learn well? Here, rewards, both extrinsic and intrinsic, are of great value and play a vital role in English learning.
Funding: Funded by the Henan Provincial Health Science and Technology Key Projects (201001009) and the National Science and Technology Infrastructure Program (2006BAI06B08), China.
Abstract: Objective To investigate the joint effect of the Demand-Control-Support (DCS) model and the Effort-Reward Imbalance (ERI) model on the risk estimation of depression in humans, in comparison with their effects when used separately. Methods A total of 3,632 males and 1,706 females from 13 factories and companies in Henan Province were recruited in this cross-sectional study. Perceived job stress was evaluated with the Job Content Questionnaire and the Effort-Reward Imbalance Questionnaire (Chinese version). Depressive symptoms were assessed using the Center for Epidemiological Studies Depression Scale (CES-D). Results The demands/job control ratio (DC) and ERI were shown to be independently associated with depressive symptoms. The outcomes for low social support and overcommitment were similar. High DC and low social support (SS), high ERI and high overcommitment, and high DC and high ERI posed greater risks of depressive symptoms than each of them did alone. The ERI model and the SS model seem to be effective in estimating the risk of depressive symptoms when used separately. Conclusion The DC performed better when used in combination with low SS. The effect on physical demands was better than on psychological demands. The combination of the DCS and ERI models could improve the risk estimate of depressive symptoms in humans.
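The DC and ERI exposures above are ratio measures. As a minimal sketch of how such ratios are conventionally computed (the item counts shown are the standard scale sizes commonly used with the ERI questionnaire, assumed here rather than taken from this paper):

```python
def dc_ratio(demands, control):
    """Demand/control ratio from the Karasek job-strain model;
    values above 1 indicate high job strain."""
    return demands / control

def eri_ratio(effort, reward, n_effort_items=6, n_reward_items=11):
    """Effort-reward imbalance ratio with the conventional correction
    factor for unequal numbers of scale items; values above 1 indicate
    an unfavorable effort-reward balance."""
    return (effort * n_reward_items) / (reward * n_effort_items)

# Illustrative scores only: high demands vs. control, balanced effort/reward.
dc = dc_ratio(demands=4.0, control=2.0)     # high strain
eri = eri_ratio(effort=12.0, reward=22.0)   # exactly balanced
```

Dichotomizing these ratios at 1 (or at a sample-specific cut-off) yields the "high DC" and "high ERI" exposure groups whose combinations the study compares.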
Abstract: The nucleus accumbens (NAc) is a subcortical brain structure known primarily for its roles in pleasure, reward, and addiction. Despite receiving less attention in pain research, the NAc also plays a large role in the mediation of pain and is effective as a source of analgesia. Evidence for this involvement lies in the NAc's cortical connections, functions, pharmacology, and therapeutic targeting. The NAc projects to and receives information from notable pain structures, such as the prefrontal cortex, anterior cingulate cortex, periaqueductal gray, habenula, and thalamus. Additionally, the NAc and other pain-modulating structures share functions involving opioid regulation and motivational and emotional processing, each of which works beyond simply the rewarding experience of pain offset. Pharmacologically, the NAc responds heavily to painful stimuli, owing to its high density of μ-opioid receptors and the activation of several different neurotransmitter systems in the NAc, such as opioids, dopamine, calcitonin gene-related peptide, γ-aminobutyric acid, glutamate, and substance P, each of which has been shown to elicit analgesic effects. In both preclinical and clinical models, deep brain stimulation of the NAc has elicited successful analgesia. The multi-functional NAc is important in motivational behavior, and the motivation for avoiding pain is just as important to survival as the motivation for seeking pleasure. It is possible, then, that the NAc must be involved in both pleasure and pain in order to help determine the motivational salience of positive and negative events.
Funding: Supported by the Science and Technology Development Project of Shandong Province, China, No. 2011YD18045; the Natural Science Foundation of Shandong Province, China, No. ZR2012HM049; the Health Care Foundation Program of Shandong Province, China, No. 2007BZ19; and the Foundation Program of the Technology Bureau of Qingdao, China, No. Kzd-0309-1-1-33-nsh.
Abstract: Reward-based decision-making has been found to activate several brain areas, including the ventrolateral prefrontal lobe, orbitofrontal cortex, anterior cingulate cortex, ventral striatum, and mesolimbic dopaminergic system. In this study, we observed brain areas activated under three degrees of uncertainty in a reward-based decision-making task (certain, risky, and ambiguous). The tasks were presented using a brain function audiovisual stimulation system. We conducted brain scans of 15 healthy volunteers using a 3.0 T magnetic resonance scanner. We used SPM8 to analyze the location and intensity of activation during the reward-based decision-making task with respect to the three conditions. We found that the orbitofrontal cortex was activated in the certain reward condition, while the prefrontal cortex, precentral gyrus, occipital visual cortex, inferior parietal lobe, cerebellar posterior lobe, middle temporal gyrus, inferior temporal gyrus, limbic lobe, and midbrain were activated during the 'risk' condition. The prefrontal cortex, temporal pole, inferior temporal gyrus, occipital visual cortex, and cerebellar posterior lobe were activated during ambiguous decision-making. The ventrolateral prefrontal lobe, frontal pole of the prefrontal lobe, orbitofrontal cortex, precentral gyrus, inferior temporal gyrus, fusiform gyrus, supramarginal gyrus, inferior parietal lobule, and cerebellar posterior lobe exhibited greater activation in the 'risk' than in the 'certain' condition (P < 0.05). The frontal pole and dorsolateral region of the prefrontal lobe, as well as the cerebellar posterior lobe, showed significantly greater activation in the 'ambiguous' condition compared to the 'risk' condition (P < 0.05). The prefrontal lobe, occipital lobe, parietal lobe, temporal lobe, limbic lobe, midbrain, and posterior lobe of the cerebellum were activated during decision-making about uncertain rewards.
Thus, we observed different levels and regions of activation for different types of reward processing during decision-making. Specifically, when the degree of reward uncertainty increased, the number of activated brain areas increased, including greater activation of brain areas associated with loss.
Funding: Partially supported by the National Science Foundation of China (61661025, 61661026) and the Foundation of the Hundred Youth Talents Training Program of Lanzhou Jiaotong University (152022).
Abstract: A network selection optimization algorithm based on the Markov decision process (MDP) is proposed so that mobile terminals can always connect to the best wireless network in a heterogeneous network environment. Considering the different types of service requirements, the MDP model and its reward function are constructed based on the quality of service (QoS) attribute parameters of the mobile users, and the network attribute weights are calculated using the analytic hierarchy process (AHP). The network handoff decision condition is designed according to the different types of user services and the time-varying characteristics of the network, and the MDP model is solved using the genetic algorithm and simulated annealing (GA-SA); thus, users can seamlessly switch to the network with the best long-term expected reward value. Simulation results show that the proposed algorithm has good convergence performance and can guarantee that users with different service types obtain satisfactory expected total reward values with few network handoffs.
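The AHP weighting step referred to above can be sketched with the common geometric-mean (row) approximation of the priority eigenvector. This is a textbook illustration of AHP weight calculation, not the paper's specific attribute set; the 2×2 comparison matrix is an assumed example:

```python
import math

def ahp_weights(pairwise):
    """Approximate AHP priority weights from a pairwise comparison matrix
    using the geometric-mean-of-rows method, normalized to sum to 1."""
    n = len(pairwise)
    geo_means = [math.prod(row) ** (1.0 / n) for row in pairwise]
    total = sum(geo_means)
    return [g / total for g in geo_means]

# Hypothetical judgment: bandwidth is 3x as important as delay.
# Entry [i][j] is the importance of attribute i relative to attribute j.
weights = ahp_weights([
    [1.0, 3.0],
    [1.0 / 3.0, 1.0],
])
```

The resulting weights would then scale each network's QoS attribute values inside the MDP reward function, so that services with different priorities (e.g. bandwidth-sensitive vs. delay-sensitive) rank candidate networks differently.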