24 articles found
Starlet: Network defense resource allocation with multi-armed bandits for cloud-edge crowd sensing in IoT
1
Authors: Hui Xia, Ning Huang, Xuecai Feng, Rui Zhang, Chao Liu. 《Digital Communications and Networks》 (SCIE, CSCD), 2024, Issue 3, pp. 586-596 (11 pages)
The cloud platform has too few defense resources to fully protect the edge servers that process crowd sensing data in the Internet of Things. To guarantee the network's overall security, we present a network defense resource allocation with multi-armed bandits to maximize the network's overall benefit. Firstly, we propose a method for dynamically setting node defense resource thresholds to obtain the defender (attacker) benefit function of edge servers (nodes) and distribution. Secondly, we design a defense resource sharing mechanism for neighboring nodes to obtain the defense capability of nodes. Subsequently, we use the decomposability and Lipschitz continuity of the defender's total expected utility to reduce the difference between the utility's discrete and continuous arms and analyze the difference theoretically. Finally, experimental results show that the method maximizes the defender's total expected utility and reduces the difference between the discrete and continuous arms of the utility.
Keywords: Internet of things; Defense resource sharing; multi-armed bandits; Defense resource allocation
Distributed Weighted Data Aggregation Algorithm in End-to-Edge Communication Networks Based on Multi-armed Bandit (Cited by: 1)
2
Authors: Yifei ZOU, Senmao QI, Cong'an XU, Dongxiao YU. 《计算机科学》 (CSCD, PKU Core), 2023, Issue 2, pp. 13-22 (10 pages)
As a combination of edge computing and artificial intelligence, edge intelligence has become a promising technique and provides its users with a series of fast, precise, and customized services. In edge intelligence, when learning agents are deployed on the edge side, data aggregation from the end side to the designated edge devices is an important research topic. Considering the varying importance of end devices, this paper studies the weighted data aggregation problem in a single-hop end-to-edge communication network. Firstly, to make sure all the end devices with various weights are fairly treated in data aggregation, a distributed end-to-edge cooperative scheme is proposed. Then, to handle the massive contention on the wireless channel caused by end devices, a multi-armed bandit (MAB) algorithm is designed to help the end devices find their most appropriate update rates. Different from traditional data aggregation works, incorporating the MAB gives our algorithm a higher efficiency in data aggregation. With a theoretical analysis, we show that the efficiency of our algorithm is asymptotically optimal. Comparative experiments with previous works are also conducted to show the strength of our algorithm.
Keywords: Weighted data aggregation; End-to-edge communication; multi-armed bandit; Edge intelligence
Strict greedy design paradigm applied to the stochastic multi-armed bandit problem
3
Author: Joey Hong. 《机床与液压》 (PKU Core), 2015, Issue 6, pp. 1-6 (6 pages)
The process of making decisions is something humans do inherently and routinely, to the extent that it appears commonplace. However, in order to achieve good overall performance, decisions must take into account both the outcomes of past decisions and the opportunities of future ones. Reinforcement learning, which is fundamental to sequential decision-making, consists of the following components: (1) a set of decision epochs; (2) a set of environment states; (3) a set of available actions to transition states; (4) state-action dependent immediate rewards for each action. At each decision, the environment state provides the decision maker with a set of available actions from which to choose. As a result of selecting a particular action in the state, the environment generates an immediate reward for the decision maker and shifts to a different state and decision. The ultimate goal for the decision maker is to maximize the total reward after a sequence of time steps. This paper will focus on an archetypal example of reinforcement learning, the stochastic multi-armed bandit problem. After introducing the dilemma, I will briefly cover the most common methods used to solve it, namely the UCB and εn-greedy algorithms. I will also introduce my own greedy implementation, the strict-greedy algorithm, which more tightly follows the greedy pattern in algorithm design, and show that it runs comparably to the two accepted algorithms.
Keywords: Greedy algorithms; Allocation strategy; Stochastic multi-armed bandit problem
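For readers unfamiliar with the two baseline methods named in this abstract, the following is a minimal sketch of UCB1 and a decaying epsilon-greedy policy on Bernoulli arms. It is not the paper's code and does not implement its strict-greedy algorithm; the arm probabilities, the helper names (`run_bandit`, `ucb1`, `eps_greedy`), and the exploration schedule `min(1, c*K/t)` are assumptions chosen for illustration.

```python
import math
import random


def pull(p, rng):
    """Bernoulli reward: 1.0 with probability p, else 0.0."""
    return 1.0 if rng.random() < p else 0.0


def run_bandit(probs, horizon, choose, seed=0):
    """Play `horizon` rounds; `choose(t, counts, means, rng)` returns an arm."""
    rng = random.Random(seed)
    counts = [0] * len(probs)
    means = [0.0] * len(probs)
    total = 0.0
    for t in range(1, horizon + 1):
        arm = choose(t, counts, means, rng)
        reward = pull(probs[arm], rng)
        counts[arm] += 1
        means[arm] += (reward - means[arm]) / counts[arm]  # running average
        total += reward
    return total, counts


def ucb1(t, counts, means, rng):
    """UCB1: try each arm once, then maximize mean + sqrt(2 ln t / n)."""
    for arm, n in enumerate(counts):
        if n == 0:
            return arm
    return max(range(len(counts)),
               key=lambda a: means[a] + math.sqrt(2.0 * math.log(t) / counts[a]))


def eps_greedy(t, counts, means, rng, c=5.0):
    """Decaying epsilon-greedy; the schedule min(1, c*K/t) is an assumption."""
    k = len(counts)
    if rng.random() < min(1.0, c * k / t):
        return rng.randrange(k)  # explore uniformly at random
    return max(range(k), key=lambda a: means[a])  # exploit best estimate


if __name__ == "__main__":
    probs = [0.2, 0.5, 0.8]  # hypothetical arm success probabilities
    for name, policy in [("UCB1", ucb1), ("eps-greedy", eps_greedy)]:
        total, counts = run_bandit(probs, 5000, policy)
        print(name, "reward:", total, "pulls per arm:", counts)
```

Both policies share one simulation loop so that only the arm-selection rule differs, which makes a side-by-side comparison of pull counts straightforward.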
Training a Quantum Neural Network to Solve the Contextual Multi-Armed Bandit Problem
4
Authors: Wei Hu, James Hu. 《Natural Science》, 2019, Issue 1, pp. 17-27 (11 pages)
Artificial intelligence has permeated all aspects of our lives today. However, to make AI behave like real AI, the critical bottleneck lies in the speed of computing. Quantum computers employ the peculiar and unique properties of quantum states such as superposition, entanglement, and interference to process information in ways that classical computers cannot. As a new paradigm of computation, quantum computers are capable of performing tasks intractable for classical processors, thus providing a quantum leap in AI research and making the development of real AI a possibility. In this regard, quantum machine learning not only enhances the classical machine learning approach but, more importantly, provides an avenue to explore new machine learning models that have no classical counterparts. Qubit-based quantum computers cannot naturally represent the continuous variables commonly used in machine learning, since the measurement outputs of qubit-based circuits are generally discrete. Therefore, a continuous-variable (CV) quantum architecture based on a photonic quantum computing model is selected for our study. In this work, we employ machine learning and optimization to create photonic quantum circuits that can solve the contextual multi-armed bandit problem, a problem in the domain of reinforcement learning, which demonstrates that quantum reinforcement learning algorithms can be learned by a quantum device.
Keywords: Continuous-variable quantum computers; Quantum machine learning; Quantum reinforcement learning; Contextual multi-armed bandit problem
Stochastic programming based multi-arm bandit offloading strategy for internet of things
5
Authors: Bin Cao, Tingyong Wu, Xiang Bai. 《Digital Communications and Networks》 (SCIE, CSCD), 2023, Issue 5, pp. 1200-1211 (12 pages)
In order to solve the high latency of traditional cloud computing and the processing capacity limitation of Internet of Things (IoT) users, Multi-access Edge Computing (MEC) migrates computing and storage capabilities from the remote data center to the edge of the network, providing users with computation services quickly and directly. In this paper, we investigate the impact of the randomness caused by the movement of the IoT user on decision-making for offloading, where the connection between the IoT user and the MEC servers is uncertain. This uncertainty is the main obstacle to assigning the task accurately. Consequently, if the assigned task cannot match well with the real connection time, a migration is caused (the connection time is not enough to finish processing). In order to address the impact of this uncertainty, we formulate the offloading decision as an optimization problem considering transmission, computation, and migration. With the help of Stochastic Programming (SP), we use the posteriori recourse to compensate for inaccurate predictions. Meanwhile, in heterogeneous networks, considering that multiple candidate MEC servers could be selected simultaneously due to overlapping, we also introduce the Multi-Arm Bandit (MAB) theory for MEC selection. Extensive simulations validate the improvement and effectiveness of the proposed SP-based Multi-arm bandit Method (SMM) for offloading in terms of reward, cost, energy consumption, and delay. The results show that SMM can achieve about 20% improvement compared with the traditional offloading method that does not consider the randomness, and it also outperforms the existing SP/MAB-based method for offloading.
Keywords: Multi-access computing; Internet of things; Offloading; Stochastic programming; multi-arm bandit
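MOOB: An improved recommendation algorithm based on the bandit model (Cited by: 1)
6
Authors: 帖军, 孙荣苑, 孙翀, 郑禄. 《中南民族大学学报(自然科学版)》 (CAS), 2018, Issue 1, pp. 114-119 (6 pages)
A multi-objective optimization recommendation algorithm based on the upper confidence bound (UCB) algorithm is proposed. While preserving prediction accuracy, the algorithm effectively avoids the Matthew effect and improves the recommender system's ability to surface long-tail items. The algorithm was tested and evaluated on the Yahoo news recommendation dataset. The experimental results show that, with high prediction accuracy, the multi-objective optimization recommendation algorithm effectively addresses long-tail item discovery, avoids the Matthew effect, and improves the precision and breadth of the recommender system.
Keywords: bandit model; Matthew effect; long-tail phenomenon; multi-objective optimization; coverage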
Residential HVAC Aggregation Based on Risk-averse Multi-armed Bandit Learning for Secondary Frequency Regulation (Cited by: 7)
7
Authors: Xinyi Chen, Qinran Hu, Qingxin Shi, Xiangjun Quan, Zaijun Wu, Fangxing Li. 《Journal of Modern Power Systems and Clean Energy》 (SCIE, EI, CSCD), 2020, Issue 6, pp. 1160-1167 (8 pages)
As the penetration of renewable energy continues to increase, stochastic and intermittent generation resources gradually replace conventional generators, bringing significant challenges in stabilizing power system frequency. Thus, aggregating demand-side resources for frequency regulation attracts attention from both academia and industry. However, in practice, conventional aggregation approaches suffer from random and uncertain user behaviors such as opting out of control signals. The risk-averse multi-armed bandit learning approach is adopted to learn the behaviors of the users, and a novel aggregation strategy is developed for residential heating, ventilation, and air conditioning (HVAC) to provide reliable secondary frequency regulation. Compared with the conventional approach, the simulation results show that the risk-averse multi-armed bandit learning approach performs better in secondary frequency regulation, with fewer users being selected and opting out of the control. Besides, the proposed approach is more robust to random and changing behaviors of the users.
Keywords: Heating, ventilation, and air conditioning (HVAC); load control; multi-armed bandit; online learning; secondary frequency regulation
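Research on a replica selection algorithm for heterogeneous ICN nodes
8
Authors: 高雷, 朱小勇. 《网络新媒体技术》, 2024, Issue 4, pp. 26-34 (9 pages)
Information-centric networking (ICN) is a revolutionary network architecture that breaks the end-to-end transmission limits of traditional TCP/IP networks and improves content distribution efficiency. ICN builds a network-wide caching system that caches data content inside the network as multiple redundant replicas, so that users can fetch content from nearby. Unlike traditional Internet caching systems, ICN caching is ubiquitous and runs on network infrastructure devices, so heterogeneity of service resources is widespread. In this environment, selecting an appropriate replica node becomes an important research problem. This paper first abstracts, models, and analyzes heterogeneous ICN nodes with the M/M/1 queueing model, then formulates the selection of heterogeneous replica nodes as a multi-armed bandit problem, and then introduces the UCB1 algorithm to explore and learn the optimal decision. Simulation results show that the algorithm has clear advantages in improving cache service reliability and shortening content retrieval delay: it brings service reliability to 99.15% and shortens the average content retrieval delay by up to 8.63%.
Keywords: information-centric networking; in-network caching; replica selection; M/M/1 queueing model; multi-armed bandit problem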
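As a small illustration of the queueing model named in this abstract, the sketch below computes the standard M/M/1 mean time in system, 1/(μ − λ), and picks the replica with the smallest value. The node names, rates, and the helper `best_replica` are hypothetical; the paper's own node model and its UCB1-based learning step are not reproduced here. In the learning setting the abstract describes, these rates would be unknown and would have to be estimated from observed delays.

```python
def mm1_mean_delay(arrival_rate, service_rate):
    """Mean time in system (waiting + service) for a stable M/M/1 queue."""
    if arrival_rate < 0 or service_rate <= arrival_rate:
        raise ValueError("need 0 <= arrival_rate < service_rate for stability")
    return 1.0 / (service_rate - arrival_rate)


def best_replica(nodes):
    """Return the name of the node with the smallest mean delay.

    `nodes` maps a node name to (arrival_rate, service_rate); the names and
    rates are illustrative assumptions, not values from the paper.
    """
    return min(nodes, key=lambda name: mm1_mean_delay(*nodes[name]))


if __name__ == "__main__":
    nodes = {"edge-a": (2.0, 5.0), "edge-b": (1.0, 10.0)}
    for name, (lam, mu) in nodes.items():
        print(name, "mean delay:", mm1_mean_delay(lam, mu))
    print("selected:", best_replica(nodes))
```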
Risk-averse Contextual Multi-armed Bandit Problem with Linear Payoffs
9
Authors: Yifan Lin, Yuhao Wang, Enlu Zhou. 《Journal of Systems Science and Systems Engineering》 (SCIE, EI, CSCD), 2023, Issue 3, pp. 267-288 (22 pages)
In this paper we consider the contextual multi-armed bandit problem for linear payoffs under a risk-averse criterion. At each round, contexts are revealed for each arm, and the decision maker chooses one arm to pull and receives the corresponding reward. In particular, we consider mean-variance as the risk criterion, and the best arm is the one with the largest mean-variance reward. We apply the Thompson sampling algorithm for the disjoint model, and provide a comprehensive regret analysis for a variant of the proposed algorithm. For $T$ rounds, $K$ actions, and $d$-dimensional feature vectors, we prove a regret bound of $O\left(\left(1+\rho+\frac{1}{\rho}\right) d \ln T \ln\frac{K}{\delta} \sqrt{d K T^{1+2\epsilon} \ln\frac{K}{\delta}\,\frac{1}{\epsilon}}\right)$ that holds with probability $1-\delta$ under the mean-variance criterion with risk tolerance $\rho$, for any $0<\epsilon<\frac{1}{2}$, $0<\delta<1$. The empirical performance of our proposed algorithms is demonstrated via a portfolio selection problem.
Keywords: multi-armed bandit; context; risk-averse; Thompson sampling
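A multi-user spectrum access mechanism based on the restless multi-armed bandit model in cognitive wireless networks (Cited by: 5)
10
Authors: 朱江, 韩超, 杨浩磊, 彭著勋. 《计算机应用》 (CSCD, PKU Core), 2014, Issue 10, pp. 2782-2786 (5 pages)
To address how to coordinate multiple cognitive users that opportunistically access multiple idle frequency-domain channels, a dynamic spectrum access mechanism based on the restless multi-armed bandit (RMAB) model is proposed. First, taking into account the channel sensing errors of cognitive users in practical environments, a Whittle index algorithm that handles sensing errors effectively is derived. The algorithm assigns each channel a belief value accumulated from historical experience, weighs the immediate and future rewards of choosing each channel under its current belief value, and selects the channel to sense and access. Second, for the collisions that arise when multiple cognitive users access the same channel, a coordination mechanism based on multi-bid auction is proposed, which resolves conflicts among cognitive users through multi-bid auctions. Simulation results show that, in the same environment, cognitive users obtain higher throughput under the proposed spectrum access mechanism than under mechanisms that do not handle sensing errors or do not use multi-bid auctions.
Keywords: multi-user multi-channel; restless multi-armed bandit model; multi-bid auction; Whittle index algorithm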
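An improved Epsilon-greedy algorithm for the new-user cold-start problem (Cited by: 1)
11
Authors: 王素琴, 张洋, 蒋浩, 朱登明. 《计算机工程》 (CAS, CSCD, PKU Core), 2018, Issue 11, pp. 172-177 (6 pages)
When addressing the new-user cold-start problem, a fixed Epsilon parameter makes the traditional Epsilon-greedy algorithm converge slowly. To this end, an improved Epsilon-greedy algorithm is proposed. It uses an immune feedback model to adjust the Epsilon parameter dynamically so that the algorithm converges quickly. The algorithm is validated experimentally with Monte Carlo simulation. The results show that it can make effective recommendations for users even when they have had few interactions with the recommender system, and that its recommendation quality is better than that of the traditional Epsilon-greedy, Softmax, and UCB algorithms.
Keywords: recommender system; cold start; Epsilon-greedy algorithm; immune feedback model; bandit algorithm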
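Channel selection based on the multi-armed bandit model (Cited by: 4)
12
Authors: 朱江, 陈红翠, 熊加毫. 《电讯技术》 (PKU Core), 2015, Issue 10, pp. 1094-1100 (7 pages)
In opportunistic spectrum access systems, to solve the channel selection problem when no prior knowledge of the channel environment is available, an improved UCB (Upper Confidence Bound) index selection policy based on the multi-armed bandit (MAB) model is proposed. The policy introduces the reward variance into the confidence term of the UCB index to adjust exploration of the unknown channel environment and thereby reduce exploration cost. Theoretical analysis proves that the policy converges quickly and that its learning regret grows slowly, approximately logarithmically in the number of time slots. Simulation results show that, compared with the original UCB policy and the greedy algorithm, the proposed policy selects channels with better availability more adaptively, effectively reduces learning regret and speeds up its convergence, and thus improves system throughput.
Keywords: cognitive radio; opportunistic spectrum access; channel selection; multi-armed bandit model; UCB index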
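To make the idea of a variance-aware confidence term concrete, here is a sketch of the well-known UCB1-Tuned index next to plain UCB1. This is a swapped-in, published variant used purely for illustration; the abstract does not state the paper's exact index, so nothing below should be read as its formula. The function and argument names are assumptions.

```python
import math


def ucb1_index(mean, n, t):
    """Plain UCB1 index: empirical mean plus sqrt(2 ln t / n)."""
    return mean + math.sqrt(2.0 * math.log(t) / n)


def ucb1_tuned_index(mean, mean_sq, n, t):
    """UCB1-Tuned index: the bonus shrinks for arms with low reward variance.

    mean    -- empirical mean reward of the arm (rewards assumed in [0, 1])
    mean_sq -- empirical mean of squared rewards
    n       -- number of times the arm was played (n >= 1)
    t       -- current round (t >= 1)
    """
    variance = max(0.0, mean_sq - mean * mean)
    variance_bound = variance + math.sqrt(2.0 * math.log(t) / n)
    # 1/4 is the largest possible variance of a reward in [0, 1].
    return mean + math.sqrt((math.log(t) / n) * min(0.25, variance_bound))


if __name__ == "__main__":
    # Two hypothetical channels with the same mean but different variance.
    print("steady channel:", ucb1_tuned_index(0.5, 0.25, 1000, 1000))
    print("noisy channel: ", ucb1_tuned_index(0.5, 0.50, 1000, 1000))
    print("plain UCB1:    ", ucb1_index(0.5, 1000, 1000))
```

With equal means and play counts, the steadier channel receives the smaller exploration bonus, which is the cost-saving effect the abstract attributes to using reward variance.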
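A channel resource allocation algorithm for RFID multi-reader systems based on the Whittle index (Cited by: 5)
13
Authors: 石静, 郑嘉利, 袁源, 王哲, 李丽. 《计算机科学》 (CSCD, PKU Core), 2019, Issue 10, pp. 122-127 (6 pages)
To address the allocation of tags and channel resources in multi-tag, multi-reader radio frequency identification (RFID) systems, a multi-reader channel resource allocation algorithm based on the Whittle index is proposed. A restless multi-armed bandit (RMAB) model is built for the RFID multi-reader channel allocation problem and solved with the Whittle index algorithm. Based on each channel's earlier busy and idle states, the algorithm assigns the channel's idle probability to it as a belief value and computes the channel's Whittle index from its current belief value. A tag selects the channel with the largest index as the channel it may sense and access, and then dynamically updates the channel's belief value according to whether data transmission succeeds in each time slot. Tag collisions that may occur during channel allocation are handled by waiting one time slot and then reselecting an access channel based on the identification feedback. The proposed algorithm is compared with the typical DiCa and Gentle algorithms in two respects: first, how system throughput varies with the number of tags to be identified when the number of readers is fixed; second, how system throughput varies with the number of readers when the number of tags to be identified is fixed. Simulation results show that the proposed algorithm achieves higher system throughput than DiCa and Gentle in both cases: on average, throughput improves by 150.34% and 23.98%, respectively, with the number of readers fixed, and by 205.01% and 43.37%, respectively, with the number of tags fixed. As the numbers of readers and tags to be identified grow, the throughput advantage of the proposed algorithm becomes more pronounced. The proposed algorithm can therefore allocate limited channel resources dynamically and sensibly, and effectively improve the identification efficiency of RFID multi-reader systems.
Keywords: radio frequency identification; multi-tag multi-reader; restless multi-armed bandit model; Whittle index algorithm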
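The belief-value bookkeeping described in this abstract can be sketched as follows, assuming each channel is a two-state (busy/idle) Markov chain with known transition probabilities and that a successful transmission reveals an idle channel. The sketch selects the channel with the highest belief (a myopic rule) rather than computing a Whittle index, so it is a simplified stand-in, not the paper's algorithm; `p01`, `p11`, and all function names are assumptions.

```python
def predict(belief, p01, p11):
    """One-step prediction of P(idle) for a channel that was not observed.

    p01 -- probability that a busy channel becomes idle in the next slot
    p11 -- probability that an idle channel stays idle in the next slot
    """
    return belief * p11 + (1.0 - belief) * p01


def update_beliefs(beliefs, chosen, success, p01, p11):
    """Return next-slot beliefs after using channel `chosen` in this slot."""
    updated = []
    for i, belief in enumerate(beliefs):
        if i == chosen:
            # Assumption: success means the channel was idle, failure busy.
            updated.append(p11 if success else p01)
        else:
            updated.append(predict(belief, p01, p11))
    return updated


def pick_channel(beliefs):
    """Myopic choice: the channel currently believed most likely to be idle."""
    return max(range(len(beliefs)), key=lambda i: beliefs[i])


if __name__ == "__main__":
    beliefs = [0.5, 0.9]  # hypothetical initial idle probabilities
    chosen = pick_channel(beliefs)
    beliefs = update_beliefs(beliefs, chosen, success=False, p01=0.2, p11=0.8)
    print("chosen:", chosen, "next beliefs:", beliefs)
```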
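A survey of online influence maximization (Cited by: 10)
14
Authors: 孔芳, 李奇之, 李帅. 《计算机科学》 (CSCD, PKU Core), 2020, Issue 5, pp. 7-13 (7 pages)
Influence maximization is the task of selecting seed nodes under a given influence propagation model so that information spreads as widely as possible. The problem has a wide range of applications, including recommender systems, viral marketing, information diffusion, and link prediction. In practice, the node-to-node propagation probabilities of the propagation model are usually unknown, whereas online learning algorithms can learn the unknown parameters autonomously through interaction and gradually approach the optimal solution. This paper first discusses the definition of the influence maximization problem, introduces commonly used influence propagation models, and summarizes common offline influence maximization algorithms. It then introduces a classic online learning framework, the multi-armed bandit problem, analyzes the state of research on online influence maximization, and compares the performance of common online influence maximization algorithms on real social networks through experiments. Finally, it summarizes the challenges facing the topic and looks ahead to future research directions.
Keywords: influence propagation model; influence maximization; social network; online learning algorithm; multi-armed bandit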
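An unknown worker utility estimation method for spatial crowdsourcing task matching (Cited by: 1)
15
Authors: 王亦敬, 陈荣, 郭世凯, 于茜, 张程伟. 《郑州大学学报(理学版)》 (PKU Core), 2021, Issue 3, pp. 65-71 (7 pages)
To improve task completion quality when worker utility is unknown, a utility-aware highest-score matching model is proposed. The model has two stages: stage one uses a multi-armed bandit model to compute worker utility values; stage two performs assignment with three methods whose scoring rules have been modified, namely the utility-aware basic method (U-Basic), the utility-aware minimum location entropy method (U-LLEP), and the utility-aware close-distance priority method (U-CDP). Experiments on the real-world MovieLens and Gowalla datasets show that, compared with the CDP and LLEP methods that do not use utility, the proposed methods improve several evaluation metrics considerably.
Keywords: spatial crowdsourcing; task matching; multi-armed bandit model; weighted bipartite graph matching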
Matching while Learning: Wireless Scheduling for Age of Information Optimization at the Edge (Cited by: 3)
16
Authors: Kun Guo, Hao Yang, Peng Yang, Wei Feng, Tony Q. S. Quek. 《China Communications》 (SCIE, CSCD), 2023, Issue 3, pp. 347-360 (14 pages)
In this paper, we investigate the minimization of age of information (AoI), a metric that measures information freshness, at the network edge with unreliable wireless communications. Particularly, we consider a set of users transmitting status updates, which are collected by the users randomly over time, to an edge server through unreliable orthogonal channels. This raises a natural question: with random status update arrivals and obscure channel conditions, can we devise an intelligent scheduling policy that matches the users and channels to stabilize the queues of all users while minimizing the average AoI? To give an adequate answer, we define a bipartite graph and formulate a dynamic edge activation problem with stability constraints. Then, we propose an online matching while learning algorithm (MatL) and discuss its implementation for wireless scheduling. Finally, simulation results demonstrate that MatL reliably learns the channel states and manages the users' buffers for fresher information at the edge.
Keywords: information freshness; Lyapunov optimization; multi-armed bandit; wireless scheduling
Age of Transmission-Optimal Scheduling for State Update of Multi-Antenna Cellular Internet of Things (Cited by: 1)
17
Authors: Song Li, Min Li, Ruirui Chen, Yanjing Sun. 《China Communications》 (SCIE, CSCD), 2022, Issue 4, pp. 302-314 (13 pages)
Timely information updates are critical for real-time monitoring and control applications in the Internet of Things (IoT). In this paper, we consider a multi-antenna cellular IoT for state update where a base station (BS) collects information from randomly distributed IoT nodes through a time-varying channel. Specifically, multiple IoT nodes are allowed to transmit their state updates simultaneously in a spatial multiplexing manner. Inspired by age of information (AoI), we introduce a novel concept of age of transmission (AoT) for scenarios in which the BS cannot obtain the generation time of the packets waiting to be transmitted. The deadline-constrained AoT-optimal scheduling problem is formulated as a restless multi-armed bandit (RMAB) problem. Firstly, we prove the indexability of the scheduling problem and derive the closed form of the Whittle index. Then, the interference graph and complementary graph are constructed to illustrate the interference between two nodes. The complete subgraphs are detected in the complementary graph to avoid inter-node interference. Next, an AoT-optimal scheduling strategy based on the Whittle index and complete subgraph detection is proposed. Finally, numerous simulations are conducted to verify the performance of the proposed strategy.
Keywords: age of transmission; information freshness; cellular IoT; restless multi-armed bandit; Whittle index
Optimal index shooting policy for layered missile defense system (Cited by: 1)
18
Authors: LI Longyue, FAN Chengli, XING Qinghua, XU Hailong, ZHAO Huizhen. 《Journal of Systems Engineering and Electronics》 (SCIE, EI, CSCD), 2020, Issue 1, pp. 118-129 (12 pages)
In order to cope with the increasing threat of the ballistic missile (BM) in a shorter reaction time, the shooting policy of the layered defense system needs to be optimized. The main decision-making problem of shooting optimization is how to choose the next BM to shoot according to the previous engagements and results, thus maximizing the expected return of BMs killed or minimizing the cost of BM penetration. Motivated by this, this study aims to determine an optimal shooting policy for a two-layer missile defense (TLMD) system. This paper considers a scenario in which the TLMD system wishes to shoot at a collection of BMs one at a time, and to maximize the return obtained from BMs killed before the system's demise. To provide a policy analysis tool, this paper develops a general model for shooting decision-making in which the shooting engagements are described as a discounted-reward Markov decision process. The index shooting policy is a strategy that effectively balances the shooting returns against the risk that the defense mission fails; its goal is to maximize the return obtained from BMs killed before the system's demise. The numerical results show that the index policy is better than a range of competitors, especially in mean returns and mean number of BMs killed.
Keywords: Gittins index; shooting policy; layered missile defense; multi-armed bandits problem; Markov decision process
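A multi-channel selection algorithm based on the MAB model in high-speed railway scenarios
19
Authors: 朱豪, 彭艺, 张申, 李启骞. 《吉林大学学报(理学版)》 (CAS, PKU Core), 2021, Issue 2, pp. 365-371 (7 pages)
To address the multi-channel distribution of the handover zone during handover of high-speed trains, a channel selection algorithm based on the multi-armed bandit (MAB) model is proposed. First, building on the upper confidence bound (UCB) algorithm, a channel idle difference factor is introduced so that the algorithm converges quickly to the optimal channel. Second, the satisfactory communication probability (SCP) is introduced to measure the communication quality of the moving train, and its relationship to the bit error rate during handover is analyzed. Finally, the optimal channel selection ratio, successful transmission rate, and cumulative access loss are used as criteria to analyze the algorithm's performance. Simulation results show that the algorithm's cumulative access loss is about 18.5% lower than that of the original UCB algorithm; compared with the random selection algorithm and the original UCB algorithm, the successful transmission rate improves by about 30.2% and 3.3%, and the optimal selection ratio by about 88.3% and 13.5%, respectively.
Keywords: handover; multi-armed bandit model; upper confidence bound algorithm; satisfactory communication probability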
Channel estimation based on multi-armed approach for maritime OFDM wireless communications
20
Authors: Zhang Qianqian, Xu Yanli. 《The Journal of China Universities of Posts and Telecommunications》 (EI, CSCD), 2023, Issue 4, pp. 75-85, 120 (12 pages)
With the development of maritime informatization and the increased generation of marine data, the demand for efficient and reliable maritime communication is surging. However, the harsh and dynamic marine communication environment can distort the transmitted signal, which significantly weakens communication performance. Therefore, maritime wireless communication systems often require channel estimation to track a channel affected by changing factors. Since there is no universal maritime communication channel model and the channel varies dynamically, the channel estimation method needs to make decisions dynamically without prior knowledge of the channel distribution. This paper studies the radio channel estimation problem of wireless communications over the sea surface. To improve the estimation accuracy, this paper uses the multi-armed bandit (MAB) problem to deal with the uncertainty of channel state information (CSI), then proposes a dynamic channel estimation algorithm to explore the globally changing channel information and asymptotically minimize the estimation error. With the aid of MAB, the estimation not only adapts dynamically to channel variation but also does not need to know the channel distribution. Simulation results show that the proposed algorithm can achieve higher estimation accuracy than matching pursuit (MP)-based and fractional Fourier transform (FrFT)-based methods.
Keywords: maritime wireless communications; channel estimation; multi-armed bandit