期刊文献+
共找到18篇文章
< 1 >
每页显示 20 50 100
Training a Quantum Neural Network to Solve the Contextual Multi-Armed Bandit Problem
1
作者 Wei Hu James Hu 《Natural Science》 2019年第1期17-27,共11页
Artificial intelligence has permeated all aspects of our lives today. However, to make AI behave like real AI, the critical bottleneck lies in the speed of computing. Quantum computers employ the peculiar and unique p... Artificial intelligence has permeated all aspects of our lives today. However, to make AI behave like real AI, the critical bottleneck lies in the speed of computing. Quantum computers employ the peculiar and unique properties of quantum states such as superposition, entanglement, and interference to process information in ways that classical computers cannot. As a new paradigm of computation, quantum computers are capable of performing tasks intractable for classical processors, thus providing a quantum leap in AI research and making the development of real AI a possibility. In this regard, quantum machine learning not only enhances the classical machine learning approach but more importantly it provides an avenue to explore new machine learning models that have no classical counterparts. The qubit-based quantum computers cannot naturally represent the continuous variables commonly used in machine learning, since the measurement outputs of qubit-based circuits are generally discrete. Therefore, a continuous-variable (CV) quantum architecture based on a photonic quantum computing model is selected for our study. In this work, we employ machine learning and optimization to create photonic quantum circuits that can solve the contextual multi-armed bandit problem, a problem in the domain of reinforcement learning, which demonstrates that quantum reinforcement learning algorithms can be learned by a quantum device. 展开更多
关键词 Continuous-Variable QUANTUM COMPUTERS QUANTUM Machine LEARNING QUANTUM Reinforcement LEARNING contextual multi-armed bandit PROBLEM
下载PDF
Risk-averse Contextual Multi-armed Bandit Problem with Linear Payoffs
2
作者 Yifan Lin Yuhao Wang Enlu Zhou 《Journal of Systems Science and Systems Engineering》 SCIE EI CSCD 2023年第3期267-288,共22页
In this paper we consider the contextual multi-armed bandit problem for linear payoffs under a risk-averse criterion.At each round,contexts are revealed for each arm,and the decision maker chooses one arm to pull and ... In this paper we consider the contextual multi-armed bandit problem for linear payoffs under a risk-averse criterion.At each round,contexts are revealed for each arm,and the decision maker chooses one arm to pull and receives the corresponding reward.In particular,we consider mean-variance as the risk criterion,and the best arm is the one with the largest mean-variance reward.We apply the Thompson sampling algorithm for the disjoint model,and provide a comprehensive regret analysis for a variant of the proposed algorithm.For T rounds,K actions,and d-dimensional feature vectors,we prove a regret bound of O((1+ρ+1/ρ)d In T ln K/δ√dKT^(1+2∈)ln K/δ1/e)that holds with probability 1-δunder the mean-variance criterion with risk tolerance p,for any 0<ε<1/2,0<δ<1.The empirical performance of our proposed algorithms is demonstrated via a portfolio selection problem. 展开更多
关键词 multi-armed bandit CONTEXT RISK-AVERSE Thompson sampling
原文传递
Distributed Weighted Data Aggregation Algorithm in End-to-Edge Communication Networks Based on Multi-armed Bandit 被引量:1
3
作者 Yifei ZOU Senmao QI +1 位作者 Cong'an XU Dongxiao YU 《计算机科学》 CSCD 北大核心 2023年第2期13-22,共10页
As a combination of edge computing and artificial intelligence,edge intelligence has become a promising technique and provided its users with a series of fast,precise,and customized services.In edge intelligence,when ... As a combination of edge computing and artificial intelligence,edge intelligence has become a promising technique and provided its users with a series of fast,precise,and customized services.In edge intelligence,when learning agents are deployed on the edge side,the data aggregation from the end side to the designated edge devices is an important research topic.Considering the various importance of end devices,this paper studies the weighted data aggregation problem in a single hop end-to-edge communication network.Firstly,to make sure all the end devices with various weights are fairly treated in data aggregation,a distributed end-to-edge cooperative scheme is proposed.Then,to handle the massive contention on the wireless channel caused by end devices,a multi-armed bandit(MAB)algorithm is designed to help the end devices find their most appropriate update rates.Diffe-rent from the traditional data aggregation works,combining the MAB enables our algorithm a higher efficiency in data aggregation.With a theoretical analysis,we show that the efficiency of our algorithm is asymptotically optimal.Comparative experiments with previous works are also conducted to show the strength of our algorithm. 展开更多
关键词 Weighted data aggregation End-to-edge communication multi-armed bandit Edge intelligence
下载PDF
Stochastic programming based multi-arm bandit offloading strategy for internet of things
4
作者 Bin Cao Tingyong Wu Xiang Bai 《Digital Communications and Networks》 SCIE CSCD 2023年第5期1200-1211,共12页
In order to solve the high latency of traditional cloud computing and the processing capacity limitation of Internet of Things(IoT)users,Multi-access Edge Computing(MEC)migrates computing and storage capabilities from... In order to solve the high latency of traditional cloud computing and the processing capacity limitation of Internet of Things(IoT)users,Multi-access Edge Computing(MEC)migrates computing and storage capabilities from the remote data center to the edge of network,providing users with computation services quickly and directly.In this paper,we investigate the impact of the randomness caused by the movement of the IoT user on decision-making for offloading,where the connection between the IoT user and the MEC servers is uncertain.This uncertainty would be the main obstacle to assign the task accurately.Consequently,if the assigned task cannot match well with the real connection time,a migration(connection time is not enough to process)would be caused.In order to address the impact of this uncertainty,we formulate the offloading decision as an optimization problem considering the transmission,computation and migration.With the help of Stochastic Programming(SP),we use the posteriori recourse to compensate for inaccurate predictions.Meanwhile,in heterogeneous networks,considering multiple candidate MEC servers could be selected simultaneously due to overlapping,we also introduce the Multi-Arm Bandit(MAB)theory for MEC selection.The extensive simulations validate the improvement and effectiveness of the proposed SP-based Multi-arm bandit Method(SMM)for offloading in terms of reward,cost,energy consumption and delay.The results showthat SMMcan achieve about 20%improvement compared with the traditional offloading method that does not consider the randomness,and it also outperforms the existing SP/MAB based method for offloading. 展开更多
关键词 Multi-access computing Internet of things OFFLOADING Stochastic programming multi-arm bandit
下载PDF
Strict greedy design paradigm applied to the stochastic multi-armed bandit problem
5
作者 Joey Hong 《机床与液压》 北大核心 2015年第6期1-6,共6页
The process of making decisions is something humans do inherently and routinely,to the extent that it appears commonplace. However,in order to achieve good overall performance,decisions must take into account both the... The process of making decisions is something humans do inherently and routinely,to the extent that it appears commonplace. However,in order to achieve good overall performance,decisions must take into account both the outcomes of past decisions and opportunities of future ones. Reinforcement learning,which is fundamental to sequential decision-making,consists of the following components: 1 A set of decisions epochs; 2 A set of environment states; 3 A set of available actions to transition states; 4 State-action dependent immediate rewards for each action.At each decision,the environment state provides the decision maker with a set of available actions from which to choose. As a result of selecting a particular action in the state,the environment generates an immediate reward for the decision maker and shifts to a different state and decision. The ultimate goal for the decision maker is to maximize the total reward after a sequence of time steps.This paper will focus on an archetypal example of reinforcement learning,the stochastic multi-armed bandit problem. After introducing the dilemma,I will briefly cover the most common methods used to solve it,namely the UCB and εn- greedy algorithms. I will also introduce my own greedy implementation,the strict-greedy algorithm,which more tightly follows the greedy pattern in algorithm design,and show that it runs comparably to the two accepted algorithms. 展开更多
关键词 Greedy algorithms Allocation strategy Stochastic multi-armed bandit problem
下载PDF
Starlet:Network defense resource allocation with multi-armed bandits for cloud-edge crowd sensing in IoT
6
作者 Hui Xia Ning Huang +2 位作者 Xuecai Feng Rui Zhang Chao Liu 《Digital Communications and Networks》 SCIE 2024年第3期586-596,共11页
The cloud platform has limited defense resources to fully protect the edge servers used to process crowd sensing data in Internet of Things.To guarantee the network's overall security,we present a network defense ... The cloud platform has limited defense resources to fully protect the edge servers used to process crowd sensing data in Internet of Things.To guarantee the network's overall security,we present a network defense resource allocation with multi-armed bandits to maximize the network's overall benefit.Firstly,we propose the method for dynamic setting of node defense resource thresholds to obtain the defender(attacker)benefit function of edge servers(nodes)and distribution.Secondly,we design a defense resource sharing mechanism for neighboring nodes to obtain the defense capability of nodes.Subsequently,we use the decomposability and Lipschitz conti-nuity of the defender's total expected utility to reduce the difference between the utility's discrete and continuous arms and analyze the difference theoretically.Finally,experimental results show that the method maximizes the defender's total expected utility and reduces the difference between the discrete and continuous arms of the utility. 展开更多
关键词 Internet of things Defense resource sharing multi-armed bandits Defense resource allocation
下载PDF
基于上下文赌博机的自适应实时车间调度
7
作者 陈鸣 王闯 许政 《计算机系统应用》 2024年第3期281-287,共7页
传统的多Agent车间调度方法使用单一调度规则,忽略了生产环境变化对调度规则适用性的影响,导致调度结果欠佳.本文针对该问题提出一种自适应实时车间调度方法,通过上下文赌博机对工件调度过程进行类比建模.经过若干回合学习的上下文赌博... 传统的多Agent车间调度方法使用单一调度规则,忽略了生产环境变化对调度规则适用性的影响,导致调度结果欠佳.本文针对该问题提出一种自适应实时车间调度方法,通过上下文赌博机对工件调度过程进行类比建模.经过若干回合学习的上下文赌博机模型能够依据生产环境制定调度决策,获得优异的调度结果.最后,通过仿真实验验证了提出方法的有效性. 展开更多
关键词 多AGENT系统 上下文赌博机 车间调度 自适应调度规则
下载PDF
Matching while Learning: Wireless Scheduling for Age of Information Optimization at the Edge 被引量:2
8
作者 Kun Guo Hao Yang +2 位作者 Peng Yang Wei Feng Tony Q.S.Quek 《China Communications》 SCIE CSCD 2023年第3期347-360,共14页
In this paper,we investigate the minimization of age of information(AoI),a metric that measures the information freshness,at the network edge with unreliable wireless communications.Particularly,we consider a set of u... In this paper,we investigate the minimization of age of information(AoI),a metric that measures the information freshness,at the network edge with unreliable wireless communications.Particularly,we consider a set of users transmitting status updates,which are collected by the user randomly over time,to an edge server through unreliable orthogonal channels.It begs a natural question:with random status update arrivals and obscure channel conditions,can we devise an intelligent scheduling policy that matches the users and channels to stabilize the queues of all users while minimizing the average AoI?To give an adequate answer,we define a bipartite graph and formulate a dynamic edge activation problem with stability constraints.Then,we propose an online matching while learning algorithm(MatL)and discuss its implementation for wireless scheduling.Finally,simulation results demonstrate that the MatL is reliable to learn the channel states and manage the users’buffers for fresher information at the edge. 展开更多
关键词 information freshness Lyapunov opti-mization multi-armed bandit wireless scheduling
下载PDF
Residential HVAC Aggregation Based on Risk-averse Multi-armed Bandit Learning for Secondary Frequency Regulation 被引量:7
9
作者 Xinyi Chen Qinran Hu +3 位作者 Qingxin Shi Xiangjun Quan Zaijun Wu Fangxing Li 《Journal of Modern Power Systems and Clean Energy》 SCIE EI CSCD 2020年第6期1160-1167,共8页
As the penetration of renewable energy continues to increase,stochastic and intermittent generation resources gradually replace the conventional generators,bringing significant challenges in stabilizing power system f... As the penetration of renewable energy continues to increase,stochastic and intermittent generation resources gradually replace the conventional generators,bringing significant challenges in stabilizing power system frequency.Thus,aggregating demand-side resources for frequency regulation attracts attentions from both academia and industry.However,in practice,conventional aggregation approaches suffer from random and uncertain behaviors of the users such as opting out control signals.The risk-averse multi-armed bandit learning approach is adopted to learn the behaviors of the users and a novel aggregation strategy is developed for residential heating,ventilation,and air conditioning(HVAC)to provide reliable secondary frequency regulation.Compared with the conventional approach,the simulation results show that the risk-averse multiarmed bandit learning approach performs better in secondary frequency regulation with fewer users being selected and opting out of the control.Besides,the proposed approach is more robust to random and changing behaviors of the users. 展开更多
关键词 HEATING ventilation and air conditioning(HVAC) load control multi-armed bandit online learning secondary frequency regulation
原文传递
Channel estimation based on multi-armed approach for maritime OFDM wireless communications
10
作者 Zhang Qianqian Xu Yanli 《The Journal of China Universities of Posts and Telecommunications》 EI CSCD 2023年第4期75-85,120,共12页
With the development of maritime informatization and the increased generation of marine data,the demands of efficient and reliable maritime communication surge.However,harsh and dynamic marine communication environmen... With the development of maritime informatization and the increased generation of marine data,the demands of efficient and reliable maritime communication surge.However,harsh and dynamic marine communication environmentcan distort transmission signal,which significantly weaken the communication performance.Therefore,for maritime wireless communication system,the channel estimation is often required to detect the channel suffered from the impacts of changing factors.Since there is no universal maritime communication channel model and channel varies dynamically,channel estimation method needs to make decision dynamically without pre-knowledge of channel distribution.This paper studies the radio channel estimation problem of wireless communications over the sea surface.To improve the estimation accuracy,this paper utilizes multi-armed bandit(MAB)problem to deal with the uncertainty of channel state information(CSI),then proposes a dynamic channel estimation algorithm to explore the global changing channel information,and asymptotically minimize the estimation error.By the aid of MAB,the estimation is not only dynamic according to channel variation,but also does not need to know the channel distribution.Simulation results show that the proposed algorithm can achieve higher estimation accuracy compared to matching pursuit(MP)-based and fractional Fourier transform(FrFT)-based methods. 展开更多
关键词 MARITIME WIRELESS COMMUNICATIONS channel estimation multi-armed bandit
原文传递
一种自适应的多臂赌博机算法 被引量:8
11
作者 章晓芳 周倩 +1 位作者 梁斌 徐进 《计算机研究与发展》 EI CSCD 北大核心 2019年第3期643-654,共12页
多臂赌博机问题是强化学习中研究探索和利用两者平衡的经典问题,其中,随机多臂赌博机问题是最经典的一类多臂赌博机问题,是众多新型多臂赌博机问题的基础.针对现有多臂赌博机算法未能充分使用环境反馈信息以及泛化能力较弱的问题,提出... 多臂赌博机问题是强化学习中研究探索和利用两者平衡的经典问题,其中,随机多臂赌博机问题是最经典的一类多臂赌博机问题,是众多新型多臂赌博机问题的基础.针对现有多臂赌博机算法未能充分使用环境反馈信息以及泛化能力较弱的问题,提出一种自适应的多臂赌博机算法.该算法利用当前估计值最小的动作被选择的次数来调整探索和利用的概率(chosen number of arm with minimal estimation, CNAME),有效缓解了探索和利用不平衡的问题.同时,该算法不依赖于上下文信息,在不同场景的多臂赌博机问题中有更好的泛化能力.通过理论分析给出了该算法的悔值(regret)上界,并通过不同场景的实验结果表明:CNAME算法可以高效地获得较高的奖赏和较低的悔值,并且具有更好的泛化能力. 展开更多
关键词 强化学习 多臂赌博机 探索和利用 自适应 上下文相关
下载PDF
一种核的上下文多臂赌博机推荐算法 被引量:2
12
作者 王鼎 门昌骞 王文剑 《智能系统学报》 CSCD 北大核心 2022年第3期625-633,共9页
个性化推荐服务在当今互联网时代越来越重要,但是传统推荐算法不适应一些高度变化场景。将线性上下文多臂赌博机算法(linear upper confidence bound,LinUCB)应用于个性化推荐可以有效改善传统推荐算法存在的问题,但遗憾的是准确率并不... 个性化推荐服务在当今互联网时代越来越重要,但是传统推荐算法不适应一些高度变化场景。将线性上下文多臂赌博机算法(linear upper confidence bound,LinUCB)应用于个性化推荐可以有效改善传统推荐算法存在的问题,但遗憾的是准确率并不是很高。本文针对LinUCB算法推荐准确率不高这一问题,提出了一种改进算法K-UCB(kernel upper confidence bound)。该算法突破了LinUCB算法中不合理的线性假设前提,利用核方法拟合预测收益与上下文间的非线性关系,得到了一种新的在非线性数据下计算预测收益置信区间上界的方法,以解决推荐过程中的探索–利用困境。实验表明,本文提出的K-UCB算法相比其他基于多臂赌博机推荐算法有更高的点击率(click-through rate,CTR),能更好地适应变化场景下个性化推荐的需求。 展开更多
关键词 个性化推荐 变化场景 多臂赌博机 线性上下文多臂赌博机 核方法 点击率 非线性 探索–利用困境
下载PDF
在线学习方法综述:汤普森抽样和其他方法 被引量:6
13
作者 何斯迈 金羽佳 +1 位作者 王华 葛冬冬 《运筹学学报》 CSCD 北大核心 2017年第4期84-102,共19页
本文尝试对在线学习领域的最新研究成果、相关主要理论和算法进行综述.在线学习的内容非常广博,本文希望能够为读者介绍其中一些基本的算法和想法,从最经典的理论模型和算法设计开始,对在线学习的发展情况作一个一般性的介绍.首先,以经... 本文尝试对在线学习领域的最新研究成果、相关主要理论和算法进行综述.在线学习的内容非常广博,本文希望能够为读者介绍其中一些基本的算法和想法,从最经典的理论模型和算法设计开始,对在线学习的发展情况作一个一般性的介绍.首先,以经典的在线优化模型——多摇臂赌博机问题为例,引入了汤普森抽样算法和信心上界算法,分析、展示了它们的基本思路和最新成果,并进一步讨论了汤普森抽样算法在更复杂的在线学习问题中的变式和应用.本文同时对在线凸优化算法做了初步探讨,它也是解决多摇臂赌博机问题和其他许多在线学习的应用问题时一种强有力的工具. 展开更多
关键词 在线学习 多摇臂赌博机 汤普森抽样 信心上界算法 情境多摇臂赌博机 在线凸优化
下载PDF
Age of Transmission-Optimal Scheduling for State Update of Multi-Antenna Cellular Internet of Things 被引量:1
14
作者 Song Li Min Li +1 位作者 Ruirui Chen Yanjing Sun 《China Communications》 SCIE CSCD 2022年第4期302-314,共13页
Timely information updates are critical for real-time monitoring and control applications in the Internet of Things(IoT). In this paper, we consider a multi-antenna cellular IoT for state update where a base station(B... Timely information updates are critical for real-time monitoring and control applications in the Internet of Things(IoT). In this paper, we consider a multi-antenna cellular IoT for state update where a base station(BS) collects information from randomly distributed IoT nodes through time-varying channel.Specifically, multiple IoT nodes are allowed to transmit their state update simultaneously in a spatial multiplex manner. Inspired by age of information(AoI),we introduce a novel concept of age of transmission(AoT) for the sceneries in which BS cannot obtain the generation time of the packets waiting to be transmitted. The deadline-constrained AoT-optimal scheduling problem is formulated as a restless multi-armed bandit(RMAB) problem. Firstly, we prove the indexability of the scheduling problem and derive the closed-form of the Whittle index. Then, the interference graph and complementary graph are constructed to illustrate the interference between two nodes. The complete subgraphs are detected in the complementary graph to avoid inter-node interference. Next, an AoT-optimal scheduling strategy based on the Whittle index and complete subgraph detection is proposed.Finally, numerous simulations are conducted to verify the performance of the proposed strategy. 展开更多
关键词 age of transmission information freshness cellular IoT restless multi-armed bandit Whittle index
下载PDF
Optimal index shooting policy for layered missile defense system 被引量:1
15
作者 LI Longyue FAN Chengli +2 位作者 XING Qinghua XU Hailong ZHAO Huizhen 《Journal of Systems Engineering and Electronics》 SCIE EI CSCD 2020年第1期118-129,共12页
In order to cope with the increasing threat of the ballistic missile(BM)in a shorter reaction time,the shooting policy of the layered defense system needs to be optimized.The main decisionmaking problem of shooting op... In order to cope with the increasing threat of the ballistic missile(BM)in a shorter reaction time,the shooting policy of the layered defense system needs to be optimized.The main decisionmaking problem of shooting optimization is how to choose the next BM which needs to be shot according to the previous engagements and results,thus maximizing the expected return of BMs killed or minimizing the cost of BMs penetration.Motivated by this,this study aims to determine an optimal shooting policy for a two-layer missile defense(TLMD)system.This paper considers a scenario in which the TLMD system wishes to shoot at a collection of BMs one at a time,and to maximize the return obtained from BMs killed before the system demise.To provide a policy analysis tool,this paper develops a general model for shooting decision-making,the shooting engagements can be described as a discounted reward Markov decision process.The index shooting policy is a strategy that can effectively balance the shooting returns and the risk that the defense mission fails,and the goal is to maximize the return obtained from BMs killed before the system demise.The numerical results show that the index policy is better than a range of competitors,especially the mean returns and the mean killing BM number. 展开更多
关键词 Gittins index shooting policy layered missile defense multi-armed bandits problem Markov decision process
下载PDF
面向LinUCB算法的数据投毒攻击方法
16
作者 姜伟龙 何琨 《中国科学:信息科学》 CSCD 北大核心 2024年第7期1569-1587,共19页
LinUCB算法是求解上下文多臂老虎机问题的一种典型算法,被广泛应用于新闻投放、产品推荐、医疗资源分配等场景中.目前对该算法的安全性研究略显薄弱,这就要求研究者进一步加深对该算法的攻击方式的研究,以作出具有针对性乃至泛用性的防... LinUCB算法是求解上下文多臂老虎机问题的一种典型算法,被广泛应用于新闻投放、产品推荐、医疗资源分配等场景中.目前对该算法的安全性研究略显薄弱,这就要求研究者进一步加深对该算法的攻击方式的研究,以作出具有针对性乃至泛用性的防御措施.本文提出了两种通过添加虚假数据的方式对LinUCB算法进行离线数据投毒攻击的攻击方案,即TCA方案(target context attack)与OCA方案(optimized context attack).前者是基于训练数据与目标上下文的相似性来生成投毒数据的;后者是建模一个优化问题,通过求解该问题来构造投毒数据,是前者的优化版本.实验测试表明,仅需添加少量投毒数据作为攻击成本即可实现对攻击目标的100%攻击成功率. 展开更多
关键词 上下文多臂老虎机 LinUCB算法 数据投毒攻击 白盒攻击 优化问题
原文传递
Age-of-Information-Aware Federated Learning
17
作者 徐殷 肖明军 +3 位作者 吴晨 吴杰 周津锐 孙贺 《Journal of Computer Science & Technology》 SCIE EI CSCD 2024年第3期637-653,共17页
Federated learning(FL)is an emerging privacy-preserving distributed computing paradigm,enabling numerous clients to collaboratively train machine learning models without the necessity of transmitting clients’private ... Federated learning(FL)is an emerging privacy-preserving distributed computing paradigm,enabling numerous clients to collaboratively train machine learning models without the necessity of transmitting clients’private datasets to the central server.Unlike most existing research where the local datasets of clients are assumed to be unchanged over time throughout the whole FL process,our study addresses such scenarios in this paper where clients’datasets need to be updated periodically,and the server can incentivize clients to employ as fresh as possible datasets for local model training.Our primary objective is to design a client selection strategy to minimize the loss of the global model for FL loss within a constrained budget.To this end,we introduce the concept of“Age of Information”(AoI)to quantitatively assess the freshness of local datasets and conduct a theoretical analysis of the convergence bound in our AoI-aware FL system.Based on the convergence bound,we further formulate our problem as a restless multi-armed bandit(RMAB)problem.Next,we relax the RMAB problem and apply the Lagrangian Dual approach to decouple it into multiple subproblems.Finally,we propose a Whittle’s Index Based Client Selection(WICS)algorithm to determine the set of selected clients.In addition,comprehensive simulations substantiate that the proposed algorithm can effectively reduce training loss and enhance the learning accuracy compared with some state-of-the-art methods. 展开更多
关键词 federated learning Age of Information restless multi-armed bandit Whittle’s index
原文传递
A Novel Cooperative Multi-Stage Hyper-Heuristic for Combination Optimization Problems 被引量:10
18
作者 Fuqing Zhao Shilu Di +2 位作者 Jie Cao Jianxin Tang Jonrinaldi 《Complex System Modeling and Simulation》 2021年第2期91-108,共18页
A hyper-heuristic algorithm is a general solution framework that adaptively selects the optimizer to address complex problems.A classical hyper-heuristic framework consists of two levels,including the high-level heuri... A hyper-heuristic algorithm is a general solution framework that adaptively selects the optimizer to address complex problems.A classical hyper-heuristic framework consists of two levels,including the high-level heuristic and a set of low-level heuristics.The low-level heuristics to be used in the optimization process are chosen by the high-level tactics in the hyper-heuristic.In this study,a Cooperative Multi-Stage Hyper-Heuristic(CMS-HH)algorithm is proposed to address certain combinatorial optimization problems.In the CMS-HH,a genetic algorithm is introduced to perturb the initial solution to increase the diversity of the solution.In the search phase,an online learning mechanism based on the multi-armed bandits and relay hybridization technology are proposed to improve the quality of the solution.In addition,a multi-point search is introduced to cooperatively search with a single-point search when the state of the solution does not change in continuous time.The performance of the CMS-HH algorithm is assessed in six specific combinatorial optimization problems,including Boolean satisfiability problems,one-dimensional packing problems,permutation flow-shop scheduling problems,personnel scheduling problems,traveling salesman problems,and vehicle routing problems.The experimental results demonstrate the efficiency and significance of the proposed CMS-HH algorithm. 展开更多
关键词 hyper-heuristic algorithm multi-armed bandits(MAB) relay hybridization technology combinatorial optimization
原文传递
上一页 1 下一页 到第
使用帮助 返回顶部