We consider the problem of displaying commercial advertisements on web pages in the "cost per click" model. To maximize profit, the advertisement server has to learn how appealing each advertisement is to each type of visitor. Advertisements come with constraints, such as a required number of clicks to draw and a limited lifetime. The problem is thus inherently dynamic, and it intimately combines combinatorial and statistical issues. The events of interest are also very rare: the base probability of a click is on the order of 10^-4. Several approaches are conceivable, ranging from computationally demanding ones (Markov decision processes or stochastic programming) to very fast ones. We introduce NOSEED, an adaptive policy learning algorithm based on a combination of linear programming and multi-armed bandits. We also propose a way to evaluate the extent to which the constraints have to be handled, which is directly related to the computational cost. We investigate the performance of our system through simulations on a realistic model designed with a major commercial web actor.
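To make the bandit ingredient concrete, here is a minimal sketch of a generic UCB1-style index used to pick an advertisement while respecting a click budget. It is not the NOSEED algorithm: the linear programming component is omitted entirely, the budget is handled by a simple eligibility filter, and the names `ucb_index`, `choose_ad`, `stats`, and `remaining_clicks` are hypothetical, introduced only for illustration.

```python
import math


def ucb_index(clicks, displays, total_displays):
    """UCB1-style optimistic estimate of an ad's click probability."""
    if displays == 0:
        # An ad never shown gets an infinite index so it is tried at least once.
        return float("inf")
    mean = clicks / displays
    bonus = math.sqrt(2.0 * math.log(total_displays) / displays)
    return mean + bonus


def choose_ad(stats, remaining_clicks):
    """Pick the eligible ad with the highest index.

    stats: dict mapping ad id -> (clicks, displays) observed so far.
    remaining_clicks: dict mapping ad id -> clicks the ad still has to draw
        (an assumed, simplified stand-in for the constraints in the abstract).
    Returns None when every ad has exhausted its click budget.
    """
    total = max(sum(displays for _, displays in stats.values()), 1)
    eligible = [ad for ad in stats if remaining_clicks[ad] > 0]
    if not eligible:
        return None
    return max(eligible, key=lambda ad: ucb_index(*stats[ad], total))
```

With click probabilities near 10^-4, the exploration bonus dominates the empirical mean for a long time, which may be one reason a plain bandit index alone is unlikely to suffice and is combined with other tools in the work described above.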