
A Participant Recruitment Strategy for Sparse Mobile Crowdsensing Based on Reinforcement Learning

Direct Participant Recruitment Strategy in Sparse Mobile Crowdsensing
Abstract: Sparse Mobile Crowdsensing (Sparse MCS) selects a small subset of sub-areas for data collection and infers the data of the remaining sub-areas from the collected data. Compared with Mobile Crowdsensing (MCS) without data inference, Sparse MCS saves sensing cost while ensuring the quality of the global data. However, existing research on Sparse MCS focuses only on selecting the small subset of sub-areas with the highest value; it does not consider whether the recruited participants can actually collect the data of the required sub-areas, and it ignores the value of the other data those participants collect. To overcome these limitations of sub-area selection, this paper starts from the perspective of the participants and concentrates on the contribution of the data collected by each participant to the entire sensing task. The total contribution of the data a participant can collect becomes the basis for deciding whether to recruit that participant, and accordingly a new approach to participant selection under Sparse MCS is proposed. Given that each person's daily movement trajectory is largely stable, while the data collected by different people along their respective trajectories have different values, this paper exploits this regularity and diversity to study how to directly recruit participants who can collect high-value data. Furthermore, the participant selection problem considered here is not limited to data collection in the next cycle; instead, some participants are recruited directly to carry out the collection task over the next multiple cycles. This multi-cycle participant selection problem can be modeled as a dynamic decision-making problem. Since heuristic strategies may fall into local optima, this paper uses reinforcement learning to solve the participant selection problem: the participant selection system is treated as the agent, and the state, action, and reward of the reinforcement learning model are designed in detail. The state incorporates factors such as the historical selection of participants, the data collection status of the sub-areas, and the date; the action is the index of the user to recruit; and the reward is derived from the final data inference error. To avoid an excessively large action space, the action is defined as selecting only one participant at a time until the maximum number of participants is reached, rather than selecting a whole group of participants at once; the difference between the two action modes is discussed in detail. To cope with the explosion of the state space, the deep reinforcement learning algorithm Deep Q-Network (DQN) is used to train the Q-function, aiming to determine which participants are the best to recruit in a given state. The framework is validated on a real dataset of two months of air quality in Beijing together with the movement trajectories of more than one hundred users. Compared with several baseline policies, the proposed participant recruitment strategy achieves higher data inference accuracy under a limited number of users.
Authors: TU Chun-Yu (涂淳钰), YU Zhi-Yong (於志勇), HAN Lei (韩磊), ZHU Wei-Ping (朱伟平), HUANG Fang-Wan (黄昉菀), GUO Wen-Zhong (郭文忠), WANG Le-Ye (王乐业) (College of Mathematics and Computer Science, Fuzhou University, Fuzhou 350108; Fujian Key Laboratory of Network Computing and Intelligent Information Processing, Fuzhou University, Fuzhou 350108; School of Computer Science, Northwestern Polytechnical University, Xi'an 710072; Key Lab of High Confidence Software Technologies, Peking University, Beijing 100871; School of Computer Science, Peking University, Beijing 100871)
Source: Chinese Journal of Computers (《计算机学报》), 2022, Issue 7, pp. 1539-1556 (18 pages); indexed in EI, CAS, CSCD, and the Peking University Core Journals list.
Funding: Supported by the National Natural Science Foundation of China (61772136, 61972008), the Fujian Provincial Science Fund for Distinguished Young Scholars (2018J07005), the Fujian Provincial Guiding Project (2020H0008), and the Fujian Provincial Engineering Research Center for Big Data Analysis and Processing together with the PKU-Baidu Fund (2019BD005).
Keywords: crowdsensing; user recruitment; compressive sensing; particle swarm optimization; reinforcement learning
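The abstract notes that Sparse MCS senses only a few sub-areas and infers the rest, and the keywords point to compressive sensing as the inference tool. The snippet below is a minimal, illustrative sketch of that idea using low-rank matrix completion (iterative SVD imputation); the toy data, the `complete_matrix` helper, the sensing ratio, and the chosen rank are assumptions for illustration, not the paper's actual inference pipeline.

```python
# Toy low-rank matrix completion standing in for the compressive-sensing
# inference of unsensed sub-areas. Data matrix, sensing ratio and rank are
# assumptions for illustration only.
import numpy as np

def complete_matrix(partial, mask, rank=3, iters=200):
    """Iterative SVD imputation: fill missing cells with the mean of the
    sensed cells, then repeatedly project onto a rank-`rank` matrix while
    keeping the sensed entries fixed."""
    filled = np.where(mask, partial, partial[mask].mean())
    for _ in range(iters):
        u, s, vt = np.linalg.svd(filled, full_matrices=False)
        low_rank = (u[:, :rank] * s[:rank]) @ vt[:rank]
        filled = np.where(mask, partial, low_rank)
    return filled

# Toy example: a (sensing cycles x sub-areas) air-quality matrix in which
# only ~20% of the cells were actually sensed.
rng = np.random.default_rng(0)
truth = rng.normal(size=(60, 2)) @ rng.normal(size=(2, 40))  # low-rank ground truth
mask = rng.random(truth.shape) < 0.2
estimate = complete_matrix(np.where(mask, truth, 0.0), mask)
print("inference MAE on unsensed cells:", np.abs(estimate - truth)[~mask].mean())
```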
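The abstract also outlines the reinforcement-learning formulation: the recruitment system is the agent; the state encodes the historical selection of participants, sub-area coverage, and the date; the action recruits one participant at a time until the quota is reached; the reward reflects the data inference error; and DQN trains the Q-function. Below is a hedged sketch of such a loop in PyTorch. `N_USERS`, `BUDGET`, `STATE_DIM`, the state encoding, and the `inference_error` stub are hypothetical placeholders, not the authors' implementation.

```python
# Hypothetical sketch of a DQN-based one-user-at-a-time recruitment loop.
import random
from collections import deque

import numpy as np
import torch
import torch.nn as nn

N_USERS, BUDGET = 100, 10     # assumed candidate pool and recruitment quota
STATE_DIM = N_USERS + 8       # assumed encoding: recruitment flags + date features

class QNet(nn.Module):
    """Maps the recruitment state to a Q-value for recruiting each candidate."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(STATE_DIM, 256), nn.ReLU(),
                                 nn.Linear(256, N_USERS))

    def forward(self, s):
        return self.net(s)

def inference_error(recruited):
    """Stand-in for the inference error over all sub-areas after the recruited
    users upload their trajectory data; NOT the paper's evaluator."""
    return 1.0 / (1 + len(recruited)) + 0.01 * random.random()

def encode_state(recruited, date_feat):
    flags = np.zeros(N_USERS, dtype=np.float32)
    flags[list(recruited)] = 1.0
    return np.concatenate([flags, date_feat])

qnet = QNet()
opt = torch.optim.Adam(qnet.parameters(), lr=1e-3)
replay = deque(maxlen=10_000)
gamma, eps = 0.9, 0.2

for episode in range(200):                     # one episode = filling one quota
    recruited = set()
    date_feat = np.random.rand(8).astype(np.float32)   # placeholder date encoding
    state, prev_err = encode_state(recruited, date_feat), inference_error(set())
    while len(recruited) < BUDGET:
        # Action = recruit ONE more user, which keeps the action space linear
        # in the number of candidates instead of combinatorial.
        if random.random() < eps:
            action = random.choice([u for u in range(N_USERS) if u not in recruited])
        else:
            with torch.no_grad():
                q = qnet(torch.as_tensor(state))
            if recruited:
                q[list(recruited)] = -float("inf")   # never pick the same user twice
            action = int(q.argmax())
        recruited.add(action)
        next_state = encode_state(recruited, date_feat)
        err = inference_error(recruited)
        reward = prev_err - err                  # reward = reduction of inference error
        done = len(recruited) == BUDGET
        replay.append((state, action, reward, next_state, done))
        state, prev_err = next_state, err

        if len(replay) >= 64:                    # one-step TD update on a mini-batch
            s, a, r, s2, d = map(np.array, zip(*random.sample(replay, 64)))
            q_sa = qnet(torch.as_tensor(s)).gather(
                1, torch.as_tensor(a).view(-1, 1)).squeeze(1)
            with torch.no_grad():
                target = torch.as_tensor(r, dtype=torch.float32) + gamma * (
                    1.0 - torch.as_tensor(d, dtype=torch.float32)
                ) * qnet(torch.as_tensor(s2)).max(1).values
            loss = nn.functional.mse_loss(q_sa, target)
            opt.zero_grad(); loss.backward(); opt.step()
```

Masking already-recruited users keeps each step to a single-user decision, which corresponds to the incremental action mode the abstract contrasts with selecting a whole group of participants at once.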
