33 articles found
Starlet: Network defense resource allocation with multi-armed bandits for cloud-edge crowd sensing in IoT
1
Authors: Hui Xia, Ning Huang, Xuecai Feng, Rui Zhang, Chao Liu. 《Digital Communications and Networks》, SCIE, CSCD, 2024, No. 3, pp. 586-596 (11 pages)
The cloud platform has limited defense resources and cannot fully protect the edge servers used to process crowd sensing data in the Internet of Things. To guarantee the network's overall security, we present a network defense resource allocation method with multi-armed bandits to maximize the network's overall benefit. First, we propose a method for dynamically setting node defense resource thresholds to obtain the defender (attacker) benefit function of edge servers (nodes) and distribution. Second, we design a defense resource sharing mechanism for neighboring nodes to obtain the defense capability of nodes. Subsequently, we use the decomposability and Lipschitz continuity of the defender's total expected utility to reduce the difference between the utility's discrete and continuous arms, and analyze the difference theoretically. Finally, experimental results show that the method maximizes the defender's total expected utility and reduces the difference between the discrete and continuous arms of the utility.
Keywords: Internet of Things; defense resource sharing; multi-armed bandits; defense resource allocation
Distributed Weighted Data Aggregation Algorithm in End-to-Edge Communication Networks Based on Multi-armed Bandit (Cited by: 1)
2
Authors: Yifei ZOU, Senmao QI, Cong'an XU, Dongxiao YU. 《计算机科学》, CSCD, PKU Core, 2023, No. 2, pp. 13-22 (10 pages)
As a combination of edge computing and artificial intelligence, edge intelligence has become a promising technique and provides its users with a series of fast, precise, and customized services. In edge intelligence, when learning agents are deployed on the edge side, data aggregation from the end side to the designated edge devices is an important research topic. Considering the varying importance of end devices, this paper studies the weighted data aggregation problem in a single-hop end-to-edge communication network. First, to make sure all end devices with various weights are treated fairly in data aggregation, a distributed end-to-edge cooperative scheme is proposed. Then, to handle the massive contention on the wireless channel caused by end devices, a multi-armed bandit (MAB) algorithm is designed to help the end devices find their most appropriate update rates. Unlike traditional data aggregation work, incorporating the MAB gives our algorithm higher efficiency in data aggregation. With a theoretical analysis, we show that the efficiency of our algorithm is asymptotically optimal. Comparative experiments with previous works are also conducted to show the strength of our algorithm.
Keywords: weighted data aggregation; end-to-edge communication; multi-armed bandit; edge intelligence
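Neural Bandits Recommendation Algorithm Fused with Collaborative Filtering (Cited by: 2)
3
Authors: 张婷婷, 欧阳丹彤, 孙成林, 白洪涛. 《吉林大学学报(理学版)》, CAS, PKU Core, 2024, No. 1, pp. 92-99 (8 pages)
To address the limits that data sparsity and cold start impose on collaborative filtering, and the fact that existing collaborative multi-armed bandit algorithms are unsuited to nonlinear reward functions, a neural bandits recommendation algorithm fused with collaborative filtering, COEENet, is proposed. First, a dual neural network structure is used to learn the expected reward and the potential gain; second, the collaborative effect of neighbors is taken into account; finally, a decision maker is constructed to make the final decision. Experimental results show that the method outperforms four baseline algorithms in cumulative regret and achieves good recommendation performance.
Keywords: collaborative filtering; multi-armed bandit algorithm; recommender system; cold start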
Stochastic programming based multi-arm bandit offloading strategy for internet of things
4
Authors: Bin Cao, Tingyong Wu, Xiang Bai. 《Digital Communications and Networks》, SCIE, CSCD, 2023, No. 5, pp. 1200-1211 (12 pages)
To address the high latency of traditional cloud computing and the limited processing capacity of Internet of Things (IoT) users, Multi-access Edge Computing (MEC) migrates computing and storage capabilities from the remote data center to the edge of the network, providing users with computation services quickly and directly. In this paper, we investigate how the randomness caused by the movement of the IoT user affects offloading decisions when the connection between the IoT user and the MEC servers is uncertain. This uncertainty is the main obstacle to assigning tasks accurately. Consequently, if the assigned task does not match the real connection time, a migration is triggered (the connection time is not long enough to finish processing). To address the impact of this uncertainty, we formulate the offloading decision as an optimization problem that considers transmission, computation, and migration. With the help of Stochastic Programming (SP), we use a posteriori recourse to compensate for inaccurate predictions. Meanwhile, in heterogeneous networks, since multiple candidate MEC servers may be selectable simultaneously because of overlapping, we also introduce Multi-Arm Bandit (MAB) theory for MEC selection. Extensive simulations validate the improvement and effectiveness of the proposed SP-based Multi-arm bandit Method (SMM) for offloading in terms of reward, cost, energy consumption, and delay. The results show that SMM achieves about a 20% improvement over the traditional offloading method that does not consider the randomness, and that it also outperforms the existing SP/MAB-based offloading method.
Keywords: multi-access computing; Internet of Things; offloading; stochastic programming; multi-arm bandit
Strict greedy design paradigm applied to the stochastic multi-armed bandit problem
5
Author: Joey Hong. 《机床与液压》, PKU Core, 2015, No. 6, pp. 1-6 (6 pages)
The process of making decisions is something humans do inherently and routinely, to the extent that it appears commonplace. However, in order to achieve good overall performance, decisions must take into account both the outcomes of past decisions and the opportunities of future ones. Reinforcement learning, which is fundamental to sequential decision-making, consists of the following components: (1) a set of decision epochs; (2) a set of environment states; (3) a set of available actions to transition states; (4) state-action dependent immediate rewards for each action. At each decision, the environment state provides the decision maker with a set of available actions from which to choose. As a result of selecting a particular action in the state, the environment generates an immediate reward for the decision maker and shifts to a different state and decision. The ultimate goal for the decision maker is to maximize the total reward after a sequence of time steps. This paper will focus on an archetypal example of reinforcement learning, the stochastic multi-armed bandit problem. After introducing the dilemma, I will briefly cover the most common methods used to solve it, namely the UCB and ε_n-greedy algorithms. I will also introduce my own greedy implementation, the strict-greedy algorithm, which more tightly follows the greedy pattern in algorithm design, and show that it runs comparably to the two accepted algorithms.
Keywords: greedy algorithms; allocation strategy; stochastic multi-armed bandit problem
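As a reading aid for the two baseline methods named in the abstract above, the following is a minimal sketch of UCB1 and ε_n-greedy on simulated Bernoulli arms. It is not the paper's code, and it does not attempt the strict-greedy algorithm, which the abstract does not specify. The arm means, the horizon, and the constants `c` and `d` in the exploration schedule ε_n = min(1, cK/(d²n)) are illustrative assumptions.

```python
import math
import random


def pull(means, arm, rng):
    """Simulated environment (assumption): Bernoulli reward for the chosen arm."""
    return 1.0 if rng.random() < means[arm] else 0.0


def ucb1(means, horizon, rng):
    """UCB1: play the arm with the largest empirical mean plus confidence bonus."""
    k = len(means)
    counts = [0] * k
    sums = [0.0] * k
    for t in range(1, horizon + 1):
        if t <= k:
            arm = t - 1  # play every arm once so each estimate is defined
        else:
            arm = max(
                range(k),
                key=lambda i: sums[i] / counts[i]
                + math.sqrt(2.0 * math.log(t) / counts[i]),
            )
        reward = pull(means, arm, rng)
        counts[arm] += 1
        sums[arm] += reward
    return counts


def eps_n_greedy(means, horizon, rng, c=5.0, d=0.3):
    """epsilon_n-greedy: explore with probability eps_n = min(1, c*K / (d^2 * n))."""
    k = len(means)
    counts = [0] * k
    sums = [0.0] * k
    for n in range(1, horizon + 1):
        eps = min(1.0, c * k / (d * d * n))
        if rng.random() < eps:
            arm = rng.randrange(k)  # explore uniformly at random
        else:
            # exploit: unplayed arms are treated as maximally attractive
            arm = max(
                range(k),
                key=lambda i: sums[i] / counts[i] if counts[i] else float("inf"),
            )
        reward = pull(means, arm, rng)
        counts[arm] += 1
        sums[arm] += reward
    return counts


if __name__ == "__main__":
    arm_means = [0.2, 0.5, 0.8]  # hypothetical arms
    print("UCB1 pulls per arm:   ", ucb1(arm_means, 5000, random.Random(0)))
    print("eps_n-greedy pulls:   ", eps_n_greedy(arm_means, 5000, random.Random(1)))
```

Both functions return the number of pulls per arm, so the share of pulls given to the best arm can be compared directly across the two strategies.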
Training a Quantum Neural Network to Solve the Contextual Multi-Armed Bandit Problem
6
Authors: Wei Hu, James Hu. 《Natural Science》, 2019, No. 1, pp. 17-27 (11 pages)
Artificial intelligence has permeated all aspects of our lives today. However, to make AI behave like real AI, the critical bottleneck lies in the speed of computing. Quantum computers employ the peculiar and unique properties of quantum states such as superposition, entanglement, and interference to process information in ways that classical computers cannot. As a new paradigm of computation, quantum computers are capable of performing tasks intractable for classical processors, thus providing a quantum leap in AI research and making the development of real AI a possibility. In this regard, quantum machine learning not only enhances the classical machine learning approach but, more importantly, provides an avenue to explore new machine learning models that have no classical counterparts. Qubit-based quantum computers cannot naturally represent the continuous variables commonly used in machine learning, since the measurement outputs of qubit-based circuits are generally discrete. Therefore, a continuous-variable (CV) quantum architecture based on a photonic quantum computing model is selected for our study. In this work, we employ machine learning and optimization to create photonic quantum circuits that can solve the contextual multi-armed bandit problem, a problem in the domain of reinforcement learning, which demonstrates that quantum reinforcement learning algorithms can be learned by a quantum device.
Keywords: continuous-variable quantum computers; quantum machine learning; quantum reinforcement learning; contextual multi-armed bandit problem
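Adaptive Quantized Distributed Online Mirror Descent Algorithm with Bandit Feedback
7
Authors: 谢俊如, 高文华, 谢奕彬. 《控制理论与应用》, EI, CAS, CSCD, PKU Core, 2023, No. 10, pp. 1774-1782 (9 pages)
Online distributed optimization of multi-agent systems is commonly used for optimization problems in dynamic environments, where nodes must transmit data streams in real time. In many cases, each node cannot obtain full information about its individual objective function (including gradient information), and information transmission between nodes is subject to communication constraints. Given the advantages of the mirror descent algorithm, in the sense of non-Euclidean projection, for high-dimensional data and large-scale online learning, this paper uses the values of the individual objective function at two points to estimate the missing gradient information, designs an adaptive quantizer based on the properties of the mirror descent algorithm, and proposes an adaptive quantized distributed online mirror descent algorithm with bandit feedback. The relationship between the quantization error bound and the regret bound is then analyzed; with suitably chosen parameters, the regret bound of the proposed algorithm is O(√T). Finally, numerical simulations verify the effectiveness of the algorithm and the theoretical results.
Keywords: mirror descent algorithm; multi-agent systems; optimization; quantization; bandit feedback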
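The abstract above estimates a missing gradient from function values at two points. The sketch below shows one common form of such an estimator, the symmetric difference along a random unit direction, g = (d / 2δ)(f(x + δu) − f(x − δu))u. Whether the paper uses exactly this form is an assumption; the function name, the smoothing radius `delta`, and the test function are hypothetical.

```python
import math
import random


def two_point_gradient(f, x, delta, rng):
    """Estimate the gradient of f at x from two function evaluations.

    Assumed form: g = d / (2*delta) * (f(x + delta*u) - f(x - delta*u)) * u,
    with u drawn uniformly from the unit sphere in d dimensions.
    """
    d = len(x)
    # A normalised Gaussian vector is uniformly distributed on the unit sphere.
    raw = [rng.gauss(0.0, 1.0) for _ in range(d)]
    norm = math.sqrt(sum(v * v for v in raw))
    u = [v / norm for v in raw]
    f_plus = f([xi + delta * ui for xi, ui in zip(x, u)])
    f_minus = f([xi - delta * ui for xi, ui in zip(x, u)])
    scale = d * (f_plus - f_minus) / (2.0 * delta)
    return [scale * ui for ui in u]


def averaged_estimate(f, x, delta, samples, rng):
    """Average many single-sample estimates to reduce their variance."""
    total = [0.0] * len(x)
    for _ in range(samples):
        g = two_point_gradient(f, x, delta, rng)
        total = [t + gi for t, gi in zip(total, g)]
    return [t / samples for t in total]


if __name__ == "__main__":
    def quadratic(z):
        return sum(zi * zi for zi in z)  # true gradient is 2*z

    point = [1.0, -2.0, 0.5]
    print(averaged_estimate(quadratic, point, 0.01, 20000, random.Random(0)))
```

A single estimate is noisy, which is why the usage example averages many of them; an online algorithm would instead feed each single-sample estimate into its update step.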
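Intelligent Client Selection Algorithm for Federated Learning with Class Imbalance
8
Authors: 朱素霞, 王云梦, 颜培森, 孙广路. 《哈尔滨理工大学学报》, CAS, PKU Core, 2024, No. 2, pp. 33-42 (10 pages)
In federated learning scenarios, when data across client devices are non-independent and identically distributed (non-IID), or even class-imbalanced, the optimization objectives of the clients' local models deviate from the global optimization objective, which poses a major challenge to the performance of the global model. To address the challenge posed by this data heterogeneity, actively selecting a suitable subset of clients to balance the data distribution helps improve model performance. An intelligent client selection algorithm for federated learning with class imbalance, FedSIMT, is therefore designed. Without any auxiliary dataset, and under the privacy premise that clients' local data remain invisible to the server, the algorithm uses the Tanimoto coefficient to measure the difference between the local data distribution and the target distribution, and adopts the combinatorial multi-armed bandit model from reinforcement learning to balance exploitation and exploration in client selection, improving the accuracy and convergence speed of the global model under different types of data heterogeneity. Experimental results show that the algorithm is effective.
Keywords: federated learning; class imbalance; client selection algorithm; multi-armed bandit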
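The Tanimoto coefficient mentioned in the abstract above has a standard vector form, T(a, b) = a·b / (‖a‖² + ‖b‖² − a·b). The sketch below applies it to two class-frequency vectors. How FedSIMT actually builds these vectors or uses the score is not stated in the abstract, so the uniform target distribution and the example client distribution are purely illustrative assumptions.

```python
def tanimoto(a, b):
    """Tanimoto coefficient of two equal-length, real-valued vectors.

    Returns 1.0 for identical non-zero vectors and 0.0 for orthogonal ones.
    """
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    denom = sum(x * x for x in a) + sum(y * y for y in b) - dot
    if denom == 0.0:
        raise ValueError("coefficient is undefined for two zero vectors")
    return dot / denom


if __name__ == "__main__":
    target = [0.25, 0.25, 0.25, 0.25]   # hypothetical balanced class distribution
    client = [0.70, 0.10, 0.10, 0.10]   # hypothetical skewed local distribution
    print(tanimoto(client, target))     # lower values mean a larger mismatch
```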
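Solving the E&E Problem in Recommender Systems with Bandit Algorithms (Cited by: 1)
9
Author: 高海宾. 《韶关学院学报》, 2017, No. 9, pp. 22-26 (5 pages)
The E&E problem is widespread in the development and application of current recommender systems. The author describes how the E&E problem arises in recommender systems and how it is classified, proposes using bandit algorithms to solve it, focuses on the mathematical model of bandit algorithms and a bandit algorithm model built on the UCB strategy, implements the core simulation program in MATLAB, and points out the strengths and weaknesses of this algorithm model.
Keywords: bandit algorithm; recommender system; E&E problem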
Risk-averse Contextual Multi-armed Bandit Problem with Linear Payoffs
10
Authors: Yifan Lin, Yuhao Wang, Enlu Zhou. 《Journal of Systems Science and Systems Engineering》, SCIE, EI, CSCD, 2023, No. 3, pp. 267-288 (22 pages)
In this paper we consider the contextual multi-armed bandit problem for linear payoffs under a risk-averse criterion. At each round, contexts are revealed for each arm, and the decision maker chooses one arm to pull and receives the corresponding reward. In particular, we consider mean-variance as the risk criterion, and the best arm is the one with the largest mean-variance reward. We apply the Thompson sampling algorithm for the disjoint model, and provide a comprehensive regret analysis for a variant of the proposed algorithm. For $T$ rounds, $K$ actions, and $d$-dimensional feature vectors, we prove a regret bound of $O\left((1+\rho+\frac{1}{\rho})\, d \ln T \ln\frac{K}{\delta} \sqrt{d K T^{1+2\epsilon} \ln\frac{K}{\delta}\,\frac{1}{\epsilon}}\right)$ that holds with probability $1-\delta$ under the mean-variance criterion with risk tolerance $\rho$, for any $0<\epsilon<\frac{1}{2}$, $0<\delta<1$. The empirical performance of our proposed algorithms is demonstrated via a portfolio selection problem.
Keywords: multi-armed bandit; context; risk-averse; Thompson sampling
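Microblog Recommendation Model Combining User Clustering and Bandits Algorithms
11
Authors: 何羽丰, 徐建民, 张彬. 《小型微型计算机系统》, CSCD, PKU Core, 2022, No. 10, pp. 2122-2130 (9 pages)
To address the new-user cold-start and data sparsity problems in microblog recommender systems, a microblog recommendation model is proposed. The model builds complete user classes through clustering of important users and classification of ordinary users, represents an ordinary user's interests by the interests of the class, uses a bandits algorithm to generate microblog recommendation lists for ordinary users in a complete user class, and updates the historical data of the user's complete user class according to the user's feedback on the recommendation list. It thereby handles new-user cold start reasonably, reduces data sparsity, achieves fairly accurate microblog recommendation, and offers a new approach to building microblog recommendation models. Experimental results show that the model recommends posts of interest to users, and that its recommendation performance improves on existing random-exploration, confidence-interval, and probability-matching algorithms by at least 5.62%, 5.43%, and 33.37%, respectively.
Keywords: microblog recommendation; user clustering; bandits algorithm; cold start; data sparsity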
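Online Distributed Mirror Descent Algorithm with Bandit Feedback
12
Authors: 朱小梅, 李觉友. 《西南大学学报(自然科学版)》, CAS, CSCD, PKU Core, 2022, No. 1, pp. 99-107 (9 pages)
To address a class of online distributed optimization problems in which gradient information of the loss function is hard to obtain, an online distributed mirror descent algorithm with bandit feedback (ODMD-B) is proposed. First, the online distributed mirror descent (ODMD) algorithm is extended to the gradient-free setting, and a new method that estimates the gradient using only function values, namely bandit feedback, is proposed; its key idea is to approximate gradient information with loss function values, which overcomes the difficulty of gradients that are hard to obtain or costly to compute. Then, a convergence analysis of the algorithm is given. The results show that the convergence rate of the algorithm is O(T), where T is the number of iterations. Finally, numerical simulations are carried out on a portfolio selection model. The experimental results show that the convergence rate of ODMD-B is close to that of the existing ODMD algorithm. Compared with ODMD, the advantage of the proposed algorithm is that it uses only function values, which are cheap to compute, making it better suited to optimization problems where gradient information is hard to obtain.
Keywords: online learning; distributed optimization; mirror descent algorithm; bandit feedback; regret bound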
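Distributed Online Dual Averaging Algorithm with Bandit Feedback
13
Author: 朱小梅. 《四川轻化工大学学报(自然科学版)》, CAS, 2020, No. 3, pp. 87-93 (7 pages)
To solve distributed online optimization problems in which gradient information is hard to obtain, a distributed online dual averaging algorithm with bandit feedback (DODA-B) is proposed. First, the algorithm improves on the original gradient feedback by proposing a new gradient estimate, namely bandit feedback, which uses function values to approximate the gradient of the original loss function and overcomes problems such as the heavy computation required for gradients of complex functions. Then, a convergence analysis of the algorithm is given; the results show that the regret bound converges at rate $O(T^{\max\{k,1-k\}})$, where $T$ is the maximum number of iterations. Finally, numerical simulations are carried out on a special case of a sensor network; the results show that the convergence rate of the proposed algorithm is close to that of the existing distributed online dual averaging (DODA) algorithm. Compared with DODA, the advantage of the proposed algorithm is that it considers only function values, making it better suited to practical problems where gradient information is hard to obtain.
Keywords: distributed online optimization; dual averaging algorithm; bandit feedback; regret bound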
Matching while Learning: Wireless Scheduling for Age of Information Optimization at the Edge (Cited by: 2)
14
Authors: Kun Guo, Hao Yang, Peng Yang, Wei Feng, Tony Q.S. Quek. 《China Communications》, SCIE, CSCD, 2023, No. 3, pp. 347-360 (14 pages)
In this paper, we investigate the minimization of age of information (AoI), a metric that measures information freshness, at the network edge with unreliable wireless communications. In particular, we consider a set of users transmitting status updates, which are collected by the user randomly over time, to an edge server through unreliable orthogonal channels. This raises a natural question: with random status update arrivals and obscure channel conditions, can we devise an intelligent scheduling policy that matches the users and channels to stabilize the queues of all users while minimizing the average AoI? To answer it, we define a bipartite graph and formulate a dynamic edge activation problem with stability constraints. Then, we propose an online matching while learning algorithm (MatL) and discuss its implementation for wireless scheduling. Finally, simulation results demonstrate that MatL reliably learns the channel states and manages the users' buffers for fresher information at the edge.
Keywords: information freshness; Lyapunov optimization; multi-armed bandit; wireless scheduling
Residential HVAC Aggregation Based on Risk-averse Multi-armed Bandit Learning for Secondary Frequency Regulation (Cited by: 7)
15
Authors: Xinyi Chen, Qinran Hu, Qingxin Shi, Xiangjun Quan, Zaijun Wu, Fangxing Li. 《Journal of Modern Power Systems and Clean Energy》, SCIE, EI, CSCD, 2020, No. 6, pp. 1160-1167 (8 pages)
As the penetration of renewable energy continues to increase, stochastic and intermittent generation resources gradually replace conventional generators, bringing significant challenges in stabilizing power system frequency. Thus, aggregating demand-side resources for frequency regulation attracts attention from both academia and industry. However, in practice, conventional aggregation approaches suffer from random and uncertain user behaviors such as opting out of control signals. The risk-averse multi-armed bandit learning approach is adopted to learn the behaviors of the users, and a novel aggregation strategy is developed for residential heating, ventilation, and air conditioning (HVAC) to provide reliable secondary frequency regulation. Simulation results show that, compared with the conventional approach, the risk-averse multi-armed bandit learning approach performs better in secondary frequency regulation, with fewer users being selected and fewer opting out of the control. In addition, the proposed approach is more robust to random and changing user behaviors.
Keywords: heating, ventilation, and air conditioning (HVAC); load control; multi-armed bandit; online learning; secondary frequency regulation
Channel estimation based on multi-armed approach for maritime OFDM wireless communications
16
Authors: Zhang Qianqian, Xu Yanli. 《The Journal of China Universities of Posts and Telecommunications》, EI, CSCD, 2023, No. 4, pp. 75-85, 120 (12 pages)
With the development of maritime informatization and the increased generation of marine data, the demand for efficient and reliable maritime communication is surging. However, the harsh and dynamic marine communication environment can distort the transmitted signal, which significantly weakens communication performance. Therefore, a maritime wireless communication system often requires channel estimation to detect a channel affected by changing factors. Since there is no universal maritime communication channel model and the channel varies dynamically, the channel estimation method needs to make decisions dynamically without prior knowledge of the channel distribution. This paper studies the radio channel estimation problem of wireless communications over the sea surface. To improve the estimation accuracy, this paper uses the multi-armed bandit (MAB) problem to deal with the uncertainty of channel state information (CSI), then proposes a dynamic channel estimation algorithm to explore the globally changing channel information and asymptotically minimize the estimation error. With the aid of MAB, the estimation not only adapts dynamically to channel variation but also does not need to know the channel distribution. Simulation results show that the proposed algorithm achieves higher estimation accuracy than matching pursuit (MP)-based and fractional Fourier transform (FrFT)-based methods.
Keywords: maritime wireless communications; channel estimation; multi-armed bandit
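Research on Perceptron Learning Algorithms (Cited by: 8)
17
Authors: 刘建伟, 申芳林, 罗雄麟. 《计算机工程》, CAS, CSCD, PKU Core, 2010, No. 7, pp. 190-192 (3 pages)
Perceptron learning algorithms and their variants are introduced, pseudocode for the various perceptron algorithms is given, and the advantages of each algorithm are pointed out. Error bound theorems for the perceptron algorithm in the linearly separable and linearly non-separable cases are given, the error bound theory of the various perceptron learning algorithms is discussed, and the error bound of each algorithm is given. Applications of perceptron learning algorithms in online optimization, reinforcement learning, and bandit algorithms are introduced, and open problems are discussed.
Keywords: perceptron; mistake bound; bandit algorithm; reinforcement learning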
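Adaptive Adjustment Technique for WLAN Nodes Based on Reinforcement Learning (Cited by: 2)
18
Authors: 陈禹, 赵静雅, 朱庆华, 刘涌. 《计算机工程与设计》, PKU Core, 2019, No. 9, pp. 2422-2427 (6 pages)
To improve the throughput of WLAN wireless network nodes, rate adaptation in IEEE 802.11 wireless networks is studied. Based on a reinforcement learning strategy, a rate adaptation algorithm formulated as a multi-armed bandit problem is proposed. The algorithm is compared with several previously proposed algorithms under a stable channel, a gradually changing channel, and a randomly changing channel in turn; the changes in throughput are observed and the strengths and weaknesses of the proposed algorithm are analyzed. Simulation results show that when the channel environment is stable or fairly stable, the algorithm outperforms the other adaptive algorithms.
Keywords: WLAN (wireless local area network); rate adaptation algorithm; reinforcement learning; multi-armed bandit; IEEE 802.11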
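Multi-user Spectrum Access Mechanism Based on the Restless Multi-armed Bandit Model in Cognitive Radio Networks (Cited by: 5)
19
Authors: 朱江, 韩超, 杨浩磊, 彭著勋. 《计算机应用》, CSCD, PKU Core, 2014, No. 10, pp. 2782-2786 (5 pages)
To address how to coordinate multiple cognitive users opportunistically accessing multiple idle frequency-domain channels, a dynamic spectrum access mechanism based on the restless multi-armed bandit (RMAB) model is proposed. First, taking into account the channel sensing errors of cognitive users in real environments, a Whittle index algorithm that handles sensing errors effectively is derived. The algorithm assigns each channel a belief value accumulated from historical experience and, weighing the immediate and future rewards of selecting each channel under the current belief value, selects the channels to sense and access. Second, for the conflicts that arise when multiple cognitive users access the same channel, a coordination mechanism based on multi-bid auction is proposed, which resolves the conflicts among cognitive users through multi-bid auction. Simulation results show that, in the same environment, the proposed spectrum access mechanism gives cognitive users higher throughput than mechanisms that do not handle sensing errors or do not use multi-bid auction.
Keywords: multi-user multi-channel; restless multi-armed bandit model; multi-bid auction; Whittle index algorithm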
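An Improved Epsilon-greedy Algorithm for the New-User Cold-Start Problem (Cited by: 1)
20
Authors: 王素琴, 张洋, 蒋浩, 朱登明. 《计算机工程》, CAS, CSCD, PKU Core, 2018, No. 11, pp. 172-177 (6 pages)
When solving the new-user cold-start problem, a fixed Epsilon parameter makes the traditional Epsilon-greedy algorithm converge slowly. An improved Epsilon-greedy algorithm is therefore proposed, which uses an immune feedback model to adjust the Epsilon parameter dynamically so that the algorithm converges quickly. The algorithm is verified experimentally using Monte Carlo simulation; the results show that it can make effective recommendations when the user has had few interactions with the recommender system, and that its recommendation performance is better than that of the traditional Epsilon-greedy, Softmax, and UCB algorithms.
Keywords: recommender system; cold start; Epsilon-greedy algorithm; immune feedback model; bandit algorithm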