Funding: Supported by the National Natural Science Foundation of China (NSFC) [grant numbers 62172377, 61872205], the Shandong Provincial Natural Science Foundation [grant number ZR2019MF018], and the Startup Research Foundation for Distinguished Scholars [No. 202112016].
Abstract: The cloud platform has limited defense resources to fully protect the edge servers used to process crowd sensing data in the Internet of Things. To guarantee the network's overall security, we present a network defense resource allocation scheme based on multi-armed bandits that maximizes the network's overall benefit. Firstly, we propose a method for dynamically setting node defense resource thresholds to obtain the defender (attacker) benefit function of the edge servers (nodes) and its distribution. Secondly, we design a defense resource sharing mechanism for neighboring nodes to obtain the defense capability of each node. Subsequently, we use the decomposability and Lipschitz continuity of the defender's total expected utility to reduce the difference between the utility's discrete and continuous arms, and analyze this difference theoretically. Finally, experimental results show that the method maximizes the defender's total expected utility and reduces the difference between the discrete and continuous arms of the utility.
Funding: Supported by the National Natural Science Foundation of China (NSFC) (62102232, 62122042, 61971269) and the Natural Science Foundation of Shandong Province (ZR2021QF064).
Abstract: As a combination of edge computing and artificial intelligence, edge intelligence has become a promising technique that provides its users with a series of fast, precise, and customized services. In edge intelligence, when learning agents are deployed on the edge side, data aggregation from the end side to the designated edge devices is an important research topic. Considering the varying importance of end devices, this paper studies the weighted data aggregation problem in a single-hop end-to-edge communication network. Firstly, to ensure that all end devices with various weights are treated fairly in data aggregation, a distributed end-to-edge cooperative scheme is proposed. Then, to handle the massive contention on the wireless channel caused by the end devices, a multi-armed bandit (MAB) algorithm is designed to help the end devices find their most appropriate update rates. Different from traditional data aggregation works, combining the MAB gives our algorithm higher efficiency in data aggregation. With a theoretical analysis, we show that the efficiency of our algorithm is asymptotically optimal. Comparative experiments with previous works are also conducted to show the strength of our algorithm.
Funding: This work was supported in part by the Zhejiang Lab under Grant 20210AB02, in part by the Sichuan International Science and Technology Innovation Cooperation / Hong Kong, Macao and Taiwan Science and Technology Innovation Cooperation Project under Grant 2019YFH0163, and in part by the Key Research and Development Project of Sichuan Provincial Department of Science and Technology under Grant 2018JZ0071.
Abstract: To overcome the high latency of traditional cloud computing and the limited processing capacity of Internet of Things (IoT) users, Multi-access Edge Computing (MEC) migrates computing and storage capabilities from the remote data center to the edge of the network, providing users with computation services quickly and directly. In this paper, we investigate the impact of the randomness caused by the movement of the IoT user on offloading decisions, where the connection between the IoT user and the MEC servers is uncertain. This uncertainty is the main obstacle to assigning tasks accurately; consequently, if the assigned task does not match the real connection time well, a migration (the connection time is not enough to finish processing) is caused. To address the impact of this uncertainty, we formulate the offloading decision as an optimization problem considering transmission, computation, and migration. With the help of Stochastic Programming (SP), we use a posteriori recourse to compensate for inaccurate predictions. Meanwhile, considering that multiple candidate MEC servers can be selected simultaneously in heterogeneous networks due to overlapping coverage, we also introduce Multi-Armed Bandit (MAB) theory for MEC selection. Extensive simulations validate the improvement and effectiveness of the proposed SP-based Multi-armed bandit Method (SMM) for offloading in terms of reward, cost, energy consumption, and delay. The results show that SMM achieves about a 20% improvement over a traditional offloading method that does not consider the randomness, and it also outperforms existing SP/MAB-based offloading methods.
Abstract: The process of making decisions is something humans do inherently and routinely, to the extent that it appears commonplace. However, to achieve good overall performance, decisions must take into account both the outcomes of past decisions and the opportunities of future ones. Reinforcement learning, which is fundamental to sequential decision-making, consists of the following components: (1) a set of decision epochs; (2) a set of environment states; (3) a set of available actions to transition between states; and (4) state-action dependent immediate rewards for each action. At each decision epoch, the environment state provides the decision maker with a set of available actions from which to choose. As a result of selecting a particular action in that state, the environment generates an immediate reward for the decision maker and shifts to a different state and decision epoch. The ultimate goal of the decision maker is to maximize the total reward after a sequence of time steps. This paper focuses on an archetypal example of reinforcement learning, the stochastic multi-armed bandit problem. After introducing the dilemma, I briefly cover the most common methods used to solve it, namely the UCB and ε_n-greedy algorithms. I also introduce my own greedy implementation, the strict-greedy algorithm, which follows the greedy pattern in algorithm design more tightly, and show that it runs comparably to the two accepted algorithms.
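The UCB and ε-greedy strategies named in the abstract above can be sketched in a few lines. This is only an illustrative stochastic-bandit implementation, not the paper's own code; the `run` harness and the Gaussian reward model are assumptions for demonstration.

```python
import math
import random

def ucb1(counts, values, t):
    """Pick the arm maximizing mean + sqrt(2 ln t / n_i); untried arms first."""
    for i, n in enumerate(counts):
        if n == 0:
            return i
    return max(range(len(counts)),
               key=lambda i: values[i] + math.sqrt(2 * math.log(t) / counts[i]))

def epsilon_greedy(counts, values, epsilon):
    """With probability epsilon explore a random arm, else exploit the best."""
    if random.random() < epsilon:
        return random.randrange(len(values))
    return max(range(len(values)), key=lambda i: values[i])

def run(bandit_means, steps, policy):
    """Play `steps` rounds against Gaussian arms and return the total reward."""
    counts = [0] * len(bandit_means)
    values = [0.0] * len(bandit_means)
    total = 0.0
    for t in range(1, steps + 1):
        arm = policy(counts, values, t)
        reward = random.gauss(bandit_means[arm], 1.0)
        counts[arm] += 1
        values[arm] += (reward - values[arm]) / counts[arm]  # incremental mean
        total += reward
    return total
```

A fixed-ε policy can be plugged into the harness as `lambda c, v, t: epsilon_greedy(c, v, 0.1)`; the εₙ-greedy variant the abstract mentions would instead decay ε with t.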
Abstract: Artificial intelligence has permeated all aspects of our lives today. However, to make AI behave like real AI, the critical bottleneck lies in the speed of computing. Quantum computers employ the peculiar and unique properties of quantum states, such as superposition, entanglement, and interference, to process information in ways that classical computers cannot. As a new paradigm of computation, quantum computers are capable of performing tasks intractable for classical processors, thus providing a quantum leap in AI research and making the development of real AI a possibility. In this regard, quantum machine learning not only enhances the classical machine learning approach but, more importantly, provides an avenue to explore new machine learning models that have no classical counterparts. Qubit-based quantum computers cannot naturally represent the continuous variables commonly used in machine learning, since the measurement outputs of qubit-based circuits are generally discrete. Therefore, a continuous-variable (CV) quantum architecture based on a photonic quantum computing model is selected for our study. In this work, we employ machine learning and optimization to create photonic quantum circuits that can solve the contextual multi-armed bandit problem, a problem in the domain of reinforcement learning, which demonstrates that quantum reinforcement learning algorithms can be learned by a quantum device.
Funding: Supported by the Air Force Office of Scientific Research under Grant FA9550-19-1-0283 and Grant FA9550-22-1-0244, and by the National Science Foundation under Grant DMS2053489.
Abstract: In this paper, we consider the contextual multi-armed bandit problem for linear payoffs under a risk-averse criterion. At each round, contexts are revealed for each arm, and the decision maker chooses one arm to pull and receives the corresponding reward. In particular, we consider mean-variance as the risk criterion, and the best arm is the one with the largest mean-variance reward. We apply the Thompson sampling algorithm to the disjoint model, and provide a comprehensive regret analysis for a variant of the proposed algorithm. For T rounds, K actions, and d-dimensional feature vectors, we prove a regret bound of O((1 + ρ + 1/ρ) d ln T · ln(K/δ) · √(dKT^(1+2ε)) · (ln(K/δ))^(1/ε)) that holds with probability 1 − δ under the mean-variance criterion with risk tolerance ρ, for any 0 < ε < 1/2 and 0 < δ < 1. The empirical performance of our proposed algorithms is demonstrated via a portfolio selection problem.
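For intuition, the posterior-sampling principle behind Thompson sampling can be illustrated with the simplest Beta–Bernoulli case: sample a plausible mean for each arm from its posterior, then pull the arm whose sample is largest. This is only a simplified sketch; the paper's algorithm works with linear payoffs and scores arms by a mean-variance criterion rather than a plain posterior draw.

```python
import random

class ThompsonBernoulli:
    """Beta-Bernoulli Thompson sampling over binary rewards."""
    def __init__(self, n_arms):
        self.alpha = [1] * n_arms   # Beta posterior: 1 + observed successes
        self.beta = [1] * n_arms    # Beta posterior: 1 + observed failures

    def choose(self):
        # One posterior sample per arm; play the arm with the largest draw.
        samples = [random.betavariate(a, b)
                   for a, b in zip(self.alpha, self.beta)]
        return max(range(len(samples)), key=samples.__getitem__)

    def update(self, arm, reward):
        # Conjugate update: a Bernoulli outcome just bumps one Beta parameter.
        if reward:
            self.alpha[arm] += 1
        else:
            self.beta[arm] += 1
```

Arms that have paid off concentrate their posterior near 1 and win most draws, while rarely-pulled arms keep wide posteriors and still get occasional exploration.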
Funding: Supported in part by the Shanghai Pujiang Program under Grant No. 21PJ1402600, in part by the Natural Science Foundation of Chongqing, China under Grant No. CSTB2022NSCQ-MSX0375, in part by the Song Shan Laboratory Foundation under Grant No. YYJC022022007, in part by the Zhejiang Provincial Natural Science Foundation of China under Grant LGJ22F010001, in part by the National Key Research and Development Program of China under Grant 2020YFA0711301, and in part by the National Natural Science Foundation of China under Grant 61922049.
Abstract: In this paper, we investigate the minimization of the age of information (AoI), a metric that measures information freshness, at the network edge with unreliable wireless communications. In particular, we consider a set of users transmitting status updates, which are collected by the users randomly over time, to an edge server through unreliable orthogonal channels. This begs a natural question: with random status update arrivals and obscure channel conditions, can we devise an intelligent scheduling policy that matches the users and channels to stabilize the queues of all users while minimizing the average AoI? To give an adequate answer, we define a bipartite graph and formulate a dynamic edge activation problem with stability constraints. Then, we propose an online matching-while-learning algorithm (MatL) and discuss its implementation for wireless scheduling. Finally, simulation results demonstrate that MatL reliably learns the channel states and manages the users' buffers for fresher information at the edge.
Funding: Supported by the National Natural Science Foundation of China (No. 51907026), the Natural Science Foundation of Jiangsu (No. BK20190361), the Jiangsu Provincial Key Laboratory of Smart Grid Technology and Equipment, and the Global Energy Interconnection Research Institute (No. SGGR0000WLJS1900107).
Abstract: As the penetration of renewable energy continues to increase, stochastic and intermittent generation resources gradually replace conventional generators, bringing significant challenges to stabilizing power system frequency. Thus, aggregating demand-side resources for frequency regulation has attracted attention from both academia and industry. In practice, however, conventional aggregation approaches suffer from the random and uncertain behaviors of users, such as opting out of control signals. A risk-averse multi-armed bandit learning approach is adopted to learn the behaviors of the users, and a novel aggregation strategy is developed for residential heating, ventilation, and air conditioning (HVAC) systems to provide reliable secondary frequency regulation. Compared with the conventional approach, simulation results show that the risk-averse multi-armed bandit learning approach performs better in secondary frequency regulation, with fewer users being selected and fewer opting out of the control. Besides, the proposed approach is more robust to random and changing user behaviors.
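A mean-variance arm score of the kind used by risk-averse bandit methods can be sketched as follows. The `choose_users` interface, the response encoding (1 if a user followed the control signal, 0 if it opted out), and the parameter `rho` are hypothetical illustrations, not the paper's implementation.

```python
def mean_variance_score(rewards, rho):
    """Empirical mean minus rho-weighted empirical variance:
    a risk-averse score that penalizes erratic arms."""
    m = sum(rewards) / len(rewards)
    var = sum((r - m) ** 2 for r in rewards) / len(rewards)
    return m - rho * var

def choose_users(history, rho, k):
    """Pick the k users (arms) with the best risk-adjusted score.
    `history` maps user id -> list of observed binary responses."""
    ranked = sorted(history,
                    key=lambda u: mean_variance_score(history[u], rho),
                    reverse=True)
    return ranked[:k]
```

With `rho > 0`, a user who always complies outranks one with the same average compliance but erratic behavior, which mirrors why a risk-averse learner ends up selecting fewer users that opt out.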
Funding: Supported by the Natural Science Foundation Project of Shanghai (20ZR1423200) and the Innovation Program of Shanghai Municipal Education Commission (2021-01-07-00-10-E00121).
Abstract: With the development of maritime informatization and the increased generation of marine data, the demand for efficient and reliable maritime communication surges. However, the harsh and dynamic marine communication environment can distort the transmitted signal, which significantly weakens communication performance. Therefore, for a maritime wireless communication system, channel estimation is often required to track a channel that suffers from the impacts of changing factors. Since there is no universal maritime communication channel model and the channel varies dynamically, the channel estimation method needs to make decisions dynamically without prior knowledge of the channel distribution. This paper studies the radio channel estimation problem of wireless communications over the sea surface. To improve estimation accuracy, it formulates the uncertainty of channel state information (CSI) as a multi-armed bandit (MAB) problem and proposes a dynamic channel estimation algorithm that explores the globally changing channel information and asymptotically minimizes the estimation error. With the aid of the MAB formulation, the estimation is not only dynamic with respect to channel variation but also requires no knowledge of the channel distribution. Simulation results show that the proposed algorithm achieves higher estimation accuracy than matching pursuit (MP)-based and fractional Fourier transform (FrFT)-based methods.
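One way an MAB formulation can drive channel estimation, as in the abstract above, is to treat each candidate estimator as an arm and reward it with the negative of its observed error. The sketch below uses plain UCB; the estimator interface, the per-slot error feedback, and the function names are assumptions for illustration, not the paper's algorithm.

```python
import math

def ucb_pick(counts, avg_reward, t):
    """UCB index over candidate estimators; try each once first."""
    for i, n in enumerate(counts):
        if n == 0:
            return i
    return max(range(len(counts)),
               key=lambda i: avg_reward[i] + math.sqrt(2 * math.log(t) / counts[i]))

def estimate_channel(estimators, true_channel_seq):
    """Each slot, pick one estimator (arm), observe its error against the
    true channel value for that slot, and feed back reward = -error."""
    k = len(estimators)
    counts, avg = [0] * k, [0.0] * k
    errors = []
    for t, h in enumerate(true_channel_seq, start=1):
        i = ucb_pick(counts, avg, t)
        err = abs(estimators[i](h) - h)        # estimation error this slot
        counts[i] += 1
        avg[i] += (-err - avg[i]) / counts[i]  # running mean of reward
        errors.append(err)
    return errors
```

Because the bandit only needs the realized error as feedback, no model of the channel distribution is assumed, which is the property the abstract emphasizes.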