Journal Articles
6 articles found
1. Stepwise Method Based on Confidence Bound and Information Incorporation for Identifying the Maximum Tolerable Dose
Authors: 王雪丽, 陶剑, 史宁中. Northeastern Mathematical Journal (CSCD), 2005, No. 1: 117-126 (10 pages).
The primary goal of a phase I clinical trial is to find the maximum tolerable dose of a treatment. In this paper, we propose a new stepwise method based on confidence bound and information incorporation to determine the maximum tolerable dose among given dose levels. On the one hand, in order to avoid severe or even fatal toxicity and to reduce the number of experimental subjects, the new method starts from the lowest dose level and proceeds in a stepwise fashion. On the other hand, to improve the accuracy of the recommendation, the final recommendation of the maximum tolerable dose incorporates information from an additional experimental cohort at the same dose level. Furthermore, empirical simulation results show that the new method has real advantages over the modified continual reassessment method.
Keywords: confidence bound; continual reassessment method; information incorporation; maximum tolerable dose; phase I clinical trials; stepwise method
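The confidence-bound idea behind this kind of dose escalation can be illustrated with a minimal sketch. This is not the authors' actual procedure: the Wilson score interval, the z value, the cohort sizes, and the target toxicity rate of 0.3 are all illustrative assumptions. The rule escalates from the lowest dose and stops at the first dose whose lower confidence bound on the toxicity rate exceeds the target.

```python
import math

def toxicity_lcb(toxicities: int, n: int, z: float = 1.645) -> float:
    """One-sided lower confidence bound for the toxicity probability,
    via the Wilson score interval (an illustrative choice)."""
    if n == 0:
        return 0.0
    p_hat = toxicities / n
    denom = 1 + z * z / n
    centre = p_hat + z * z / (2 * n)
    margin = z * math.sqrt(p_hat * (1 - p_hat) / n + z * z / (4 * n * n))
    return max(0.0, (centre - margin) / denom)

def stepwise_mtd(cohorts, target: float = 0.3) -> int:
    """Escalate from the lowest dose; stop at the first level whose lower
    confidence bound on toxicity exceeds the target rate, and recommend
    the last acceptable level (-1 if even the lowest dose fails).
    `cohorts` is a list of (toxicities, n) pairs, lowest dose first."""
    mtd = -1
    for level, (tox, n) in enumerate(cohorts):
        if toxicity_lcb(tox, n) > target:
            break
        mtd = level
    return mtd
```

With small cohorts the lower bound is conservative, so escalation stops only on strong evidence of excess toxicity, e.g. 3/3 toxicities at a level.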
2. Q-greedyUCB: a New Exploration Policy to Learn Resource-Efficient Scheduling
Authors: Yu Zhao, Joohyun Lee, Wei Chen. China Communications (SCIE, CSCD), 2021, No. 6: 12-23 (12 pages).
This paper proposes a Reinforcement Learning (RL) algorithm to find an optimal scheduling policy that minimizes delay under a given energy constraint in a communication system where the environment, such as traffic arrival rates, is not known in advance and can change over time. The problem is formulated as an infinite-horizon Constrained Markov Decision Process (CMDP). To handle the constrained optimization problem, we first adopt the Lagrangian relaxation technique. We then propose a variant of Q-learning, Q-greedyUCB, which combines ε-greedy and Upper Confidence Bound (UCB) algorithms to solve this constrained MDP. We mathematically prove that the Q-greedyUCB algorithm converges to an optimal solution. Simulation results also show that Q-greedyUCB finds an optimal scheduling strategy and is more efficient than Q-learning with ε-greedy, R-learning, and the Average-payoff RL (ARL) algorithm in terms of cumulative regret. We also show that our algorithm can learn and adapt to changes in the environment, so as to obtain an optimal scheduling strategy under a given power constraint for the new environment.
Keywords: reinforcement learning for average rewards; infinite-horizon Markov decision process; upper confidence bound; queue scheduling
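Q-greedyUCB itself interleaves ε-greedy and UCB under a Lagrangian-relaxed objective; the sketch below shows only the UCB ingredient on a toy two-armed bandit. The arm means, the exploration constant c, and the horizon are made-up illustration values, not anything from the paper.

```python
import math
import random

def ucb_action(q_values, counts, t, c=2.0):
    """Pick argmax_a Q(a) + c*sqrt(ln t / N(a)); untried actions go first."""
    best, best_score = 0, float("-inf")
    for a, (q, n) in enumerate(zip(q_values, counts)):
        score = float("inf") if n == 0 else q + c * math.sqrt(math.log(t) / n)
        if score > best_score:
            best, best_score = a, score
    return best

# Toy demo: two actions with Bernoulli rewards (means are hypothetical).
random.seed(0)
means = [0.2, 0.8]
q = [0.0, 0.0]   # running value estimates
n = [0, 0]       # pull counts
for t in range(1, 2001):
    a = ucb_action(q, n, t)
    r = 1.0 if random.random() < means[a] else 0.0
    n[a] += 1
    q[a] += (r - q[a]) / n[a]  # incremental sample mean
```

The UCB bonus shrinks as an action's count grows, so exploration concentrates on under-tried actions without the constant random noise of pure ε-greedy.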
3. Uniformly Most Powerful Invariant Test and Its Application
Authors: 张双林, 沙秋英, 周文海. Northeastern Mathematical Journal (CSCD), 2001, No. 1: 13-20 (8 pages).
The authors consider the uniformly most powerful invariant test of the testing problems (I) H0: μ′Σ⁻¹μ ≥ C vs. H1: μ′Σ⁻¹μ < C and (II) H00: β′X′Xβ/σ² ≥ C vs. H11: β′X′Xβ/σ² < C, under an m-dimensional normal population N_m(μ, Σ) and the normal linear model (Y, Xβ, σ²I), respectively. Furthermore, an application of the uniformly most powerful invariant test is given.
Keywords: invariant test; uniformly most powerful test; improved estimator; uniformly most accurate confidence bound
4. CONSENSUS FORMATION OF TWO-LEVEL OPINION DYNAMICS (Cited by 1)
Authors: Yilun SHANG. Acta Mathematica Scientia (SCIE, CSCD), 2014, No. 4: 1029-1040 (12 pages).
Opinion dynamics have received significant attention in recent years. This paper proposes a bounded confidence opinion model for a group of agents with two different confidence levels. Each agent in the population is endowed with a confidence interval around her opinion with radius αd or (1 − α)d, where α ∈ (0, 1/2] represents the differentiation of confidence levels. We analytically derive the critical confidence bound d_c = 1/(4α) for the two-level opinion dynamics on Z: above this critical value, a single opinion cluster forms with probability 1 regardless of the ratio p of agents with high/low confidence. Extensive numerical simulations illustrate our theoretical results and reveal a clear impact of p on the collective behavior: the more agents with high confidence, the harder it is to reach agreement. It is also experimentally shown that the sharpness of the threshold d_c increases with α but does not depend on p.
Keywords: social dynamics; bounded confidence; phase transition; Monte Carlo simulation
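The paper studies a pairwise dynamic on Z; the following is a simplified synchronous (Hegselmann-Krause-style) variant, written only to illustrate the two-level confidence radii αd and (1 − α)d and the counting of opinion clusters. The population size, parameter values, and update scheme here are illustrative assumptions, not the paper's model.

```python
import random

def hk_step(opinions, radii):
    """One synchronous update: each agent moves to the mean opinion of
    all agents within its own confidence radius (itself included)."""
    new = []
    for xi, ri in zip(opinions, radii):
        neigh = [xj for xj in opinions if abs(xj - xi) <= ri]
        new.append(sum(neigh) / len(neigh))
    return new

def simulate(n=50, d=0.25, alpha=0.4, p=0.5, steps=60, seed=1):
    """Mixed population: a fraction ~p gets the high radius (1-alpha)*d,
    the rest the low radius alpha*d. Returns the final opinion profile."""
    rng = random.Random(seed)
    opinions = [rng.random() for _ in range(n)]
    radii = [(1 - alpha) * d if rng.random() < p else alpha * d
             for _ in range(n)]
    for _ in range(steps):
        opinions = hk_step(opinions, radii)
    return opinions

def cluster_count(opinions, tol=1e-3):
    """Number of opinion clusters: gaps larger than tol split clusters."""
    xs = sorted(opinions)
    clusters = 1
    for a, b in zip(xs, xs[1:]):
        if b - a > tol:
            clusters += 1
    return clusters
```

Large d yields consensus (one cluster), while small d fragments the population into several clusters, mirroring the phase-transition behavior the paper analyzes.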
5. Bandit neural architecture search based on performance evaluation for operation selection
Authors: ZHANG Jian, GONG Xuan, LIU YuXiao, WANG Wei, WANG Lei, ZHANG BaoChang. Science China (Technological Sciences) (SCIE, EI, CAS, CSCD), 2023, No. 2: 481-488 (8 pages).
Neural architecture search (NAS) plays an important role in many computer vision tasks. However, the high computational cost of forward and backward propagation during the search process restricts its practical application. In this paper, we cast the search process as a multi-armed bandit problem, taking into account the uncertainty caused by the contradiction between the huge search space and the limited number of trials. Bandit NAS optimizes the trade-off between exploitation and exploration for a highly efficient search. Specifically, we sample from a set of operations in each trial, where each operation is weighted by its trial performance plus a bias that allows operations with less training to be selected. We further reduce the search space by abandoning the operation with the lowest potential, significantly reducing the search cost. Experimental results on the CIFAR-10 dataset show that the resulting architecture achieves state-of-the-art accuracy with a search speed approximately two times faster than that of partially connected differentiable architecture search. On ImageNet, we attain a state-of-the-art top-1 accuracy of 75.3% with a search time of 1.8 GPU days.
Keywords: bandit; NAS; DARTS; upper confidence bounds
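The abstract's scheme of "weight each operation by its trial performance plus a bias, then abandon the weakest" can be sketched as a small bandit over candidate operations. The class below is a hypothetical simplification: the operation names, the UCB scoring, and the pruning rule by lowest mean are illustrative, not the paper's exact formulas.

```python
import math

class OperationBandit:
    """Treat candidate operations as bandit arms: score each arm by its
    mean observed performance plus a UCB bonus that favors under-trained
    operations, and prune the arm with the lowest mean performance."""
    def __init__(self, ops):
        self.ops = list(ops)
        self.mean = {o: 0.0 for o in ops}
        self.count = {o: 0 for o in ops}
        self.t = 0

    def select(self, c=1.0):
        self.t += 1
        def score(o):
            n = self.count[o]
            if n == 0:
                return float("inf")  # untried operations get priority
            return self.mean[o] + c * math.sqrt(math.log(self.t) / n)
        return max(self.ops, key=score)

    def update(self, op, perf):
        """Record one trial's performance for the sampled operation."""
        self.count[op] += 1
        self.mean[op] += (perf - self.mean[op]) / self.count[op]

    def prune_weakest(self):
        """Abandon the operation with the lowest mean performance."""
        weakest = min(self.ops, key=lambda o: self.mean[o])
        self.ops.remove(weakest)
        return weakest
```

Shrinking the arm set this way is what cuts the search cost: pruned operations no longer consume forward/backward passes in later trials.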
6. LiFE: Deep Exploration via Linear-Feature Bonus in Continuous Control
Authors: Jiantao Qiu, Yu Wang. Tsinghua Science and Technology (SCIE, EI, CAS, CSCD), 2023, No. 1: 155-166 (12 pages).
Reinforcement Learning (RL) algorithms work well with well-defined rewards, but they fail with sparse/deceptive rewards and require additional exploration strategies. This work introduces a deep exploration method based on an Upper Confidence Bound (UCB) bonus. The proposed method can be plugged into actor-critic algorithms that use deep neural networks as the critic. Building on the regret bound under the linear Markov decision process approximation, we use the feature matrix to calculate the UCB bonus for deep exploration. The proposed method is equivalent to the count-based exploration method in special cases and is suitable for general situations. Our method uses the last d-dimensional feature vector in the critic network and is easy to deploy. We design a simple task, "swim", to demonstrate how the proposed method achieves exploration in sparse/deceptive reward environments. We then perform an empirical evaluation on sparse/deceptive-reward versions of Gym environments and Ackermann robot control tasks. The evaluation results verify that the proposed algorithm performs effective deep exploration in sparse/deceptive reward tasks.
Keywords: Reinforcement Learning (RL); Neural Network (NN); Upper Confidence Bound (UCB)
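A linear-feature UCB bonus of the general kind the abstract describes can be sketched as follows: bonus(φ) = β·sqrt(φᵀΛ⁻¹φ) with Λ = λI + Σφφᵀ over visited features, the inverse maintained incrementally via the Sherman-Morrison formula. The class is a minimal standalone sketch, not the paper's implementation; β, λ, and the feature dimension are placeholder parameters.

```python
import math

class LinearFeatureBonus:
    """UCB-style exploration bonus from feature vectors:
    bonus(phi) = beta * sqrt(phi^T Lambda^{-1} phi), where
    Lambda = lambda*I + sum of phi phi^T over observed features.
    Lambda^{-1} is maintained incrementally via Sherman-Morrison."""
    def __init__(self, dim, beta=1.0, lam=1.0):
        self.beta = beta
        # start from the inverse of lambda * I
        self.inv = [[(1.0 / lam if i == j else 0.0) for j in range(dim)]
                    for i in range(dim)]

    def _mat_vec(self, v):
        return [sum(row[j] * v[j] for j in range(len(v))) for row in self.inv]

    def bonus(self, phi):
        u = self._mat_vec(phi)  # Lambda^{-1} phi (Lambda is symmetric)
        return self.beta * math.sqrt(sum(p * x for p, x in zip(phi, u)))

    def observe(self, phi):
        """Rank-one update Lambda += phi phi^T, applied to the inverse."""
        u = self._mat_vec(phi)
        denom = 1.0 + sum(p * x for p, x in zip(phi, u))
        self.inv = [[self.inv[i][j] - u[i] * u[j] / denom
                     for j in range(len(u))] for i in range(len(u))]
```

With one-hot features the bonus reduces to β/sqrt(λ + N(s, a)), which matches the abstract's remark that the method is equivalent to count-based exploration in special cases.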