
A Participant Recruitment Strategy for Sparse Mobile Crowdsensing Based on Reinforcement Learning

Direct Participant Recruitment Strategy in Sparse Mobile Crowdsensing
Abstract: Sparse Mobile Crowdsensing (Sparse MCS) selects a small subset of sub-areas for data collection and infers the data of the remaining sub-areas from the collected data. Compared with Mobile Crowdsensing (MCS) without data inference, Sparse MCS saves sensing cost while ensuring the quality of the global data. However, existing research on Sparse MCS focuses only on selecting the small subset of sub-areas with the highest value; it does not consider whether the recruited participants can actually collect the data of the required sub-areas, and it ignores the value of the other data those participants collect. To overcome these limitations of sub-area selection, this paper starts from the perspective of the participants and concentrates on the contribution of the data collected by each participant to the entire sensing task. The total contribution of the data a participant can collect becomes the basis for deciding whether to recruit that participant, and accordingly a new approach to participant selection under Sparse MCS is proposed. Given that each person's daily movement trajectory is largely stable, while the data collected by different people along their respective trajectories have different values, this paper exploits this regularity and diversity to study how to directly recruit participants who can collect high-value data. Furthermore, the participant selection problem considered here is not limited to data collection in the next cycle; instead, some participants are recruited directly to carry out the collection task over the next multiple cycles. This multi-cycle participant selection problem can be modeled as a dynamic decision-making problem. Since heuristic strategies may fall into local optima, this paper uses reinforcement learning to solve the participant selection problem: the participant selection system is treated as the agent, and the state, action, and reward of the reinforcement learning model are designed in detail. The state incorporates factors such as the historical selection of participants, the data collection status of the sub-areas, and the date; the action is the index of the user to recruit; and the reward is derived from the final data inference error. To avoid an excessively large action space, the action is defined as selecting only one participant at a time until the maximum number of participants is reached, rather than selecting a whole group of participants at once; the difference between the two action modes is discussed in detail. To cope with the explosion of the state space, the deep reinforcement learning algorithm Deep Q-Network (DQN) is used to train the Q-function, aiming to determine which participants are the best to recruit in a given state. The framework is validated on a real dataset of two months of air quality in Beijing together with the movement trajectories of more than one hundred users. Compared with several baseline policies, the proposed participant recruitment strategy achieves higher data inference accuracy under a limited number of users.
Authors: TU Chun-Yu (涂淳钰), YU Zhi-Yong (於志勇), HAN Lei (韩磊), ZHU Wei-Ping (朱伟平), HUANG Fang-Wan (黄昉菀), GUO Wen-Zhong (郭文忠), WANG Le-Ye (王乐业) (College of Mathematics and Computer Science, Fuzhou University, Fuzhou 350108; Fujian Key Laboratory of Network Computing and Intelligent Information Processing, Fuzhou University, Fuzhou 350108; School of Computer Science, Northwestern Polytechnical University, Xi'an 710072; Key Lab of High Confidence Software Technologies, Peking University, Beijing 100871; School of Computer Science, Peking University, Beijing 100871)
Source: Chinese Journal of Computers (《计算机学报》), 2022, Issue 7, pp. 1539-1556 (18 pages); indexed in EI, CAS, CSCD, and the Peking University Core Journals list.
Funding: Supported by the National Natural Science Foundation of China (61772136, 61972008), the Fujian Provincial Science Fund for Distinguished Young Scholars (2018J07005), the Fujian Provincial Guiding Project (2020H0008), and the Fujian Provincial Engineering Research Center for Big Data Analysis and Processing together with the PKU-Baidu Fund (2019BD005).
Keywords: crowdsensing; user recruitment; compressive sensing; particle swarm optimization; reinforcement learning
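The abstract notes that Sparse MCS senses only a few sub-areas and infers the rest, and the keywords point to compressive sensing as the inference tool. The snippet below is a minimal, illustrative sketch of that idea using low-rank matrix completion (iterative SVD imputation); the toy data, the `complete_matrix` helper, the sensing ratio, and the chosen rank are assumptions for illustration, not the paper's actual inference pipeline.

```python
# Toy low-rank matrix completion standing in for the compressive-sensing
# inference of unsensed sub-areas. Data matrix, sensing ratio and rank are
# assumptions for illustration only.
import numpy as np

def complete_matrix(partial, mask, rank=3, iters=200):
    """Iterative SVD imputation: fill missing cells with the mean of the
    sensed cells, then repeatedly project onto a rank-`rank` matrix while
    keeping the sensed entries fixed."""
    filled = np.where(mask, partial, partial[mask].mean())
    for _ in range(iters):
        u, s, vt = np.linalg.svd(filled, full_matrices=False)
        low_rank = (u[:, :rank] * s[:rank]) @ vt[:rank]
        filled = np.where(mask, partial, low_rank)
    return filled

# Toy example: a (sensing cycles x sub-areas) air-quality matrix in which
# only ~20% of the cells were actually sensed.
rng = np.random.default_rng(0)
truth = rng.normal(size=(60, 2)) @ rng.normal(size=(2, 40))  # low-rank ground truth
mask = rng.random(truth.shape) < 0.2
estimate = complete_matrix(np.where(mask, truth, 0.0), mask)
print("inference MAE on unsensed cells:", np.abs(estimate - truth)[~mask].mean())
```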
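The abstract also outlines the reinforcement-learning formulation: the recruitment system is the agent; the state encodes the historical selection of participants, sub-area coverage, and the date; the action recruits one participant at a time until the quota is reached; the reward reflects the data inference error; and DQN trains the Q-function. Below is a hedged sketch of such a loop in PyTorch. `N_USERS`, `BUDGET`, `STATE_DIM`, the state encoding, and the `inference_error` stub are hypothetical placeholders, not the authors' implementation.

```python
# Hypothetical sketch of a DQN-based one-user-at-a-time recruitment loop.
import random
from collections import deque

import numpy as np
import torch
import torch.nn as nn

N_USERS, BUDGET = 100, 10     # assumed candidate pool and recruitment quota
STATE_DIM = N_USERS + 8       # assumed encoding: recruitment flags + date features

class QNet(nn.Module):
    """Maps the recruitment state to a Q-value for recruiting each candidate."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(STATE_DIM, 256), nn.ReLU(),
                                 nn.Linear(256, N_USERS))

    def forward(self, s):
        return self.net(s)

def inference_error(recruited):
    """Stand-in for the inference error over all sub-areas after the recruited
    users upload their trajectory data; NOT the paper's evaluator."""
    return 1.0 / (1 + len(recruited)) + 0.01 * random.random()

def encode_state(recruited, date_feat):
    flags = np.zeros(N_USERS, dtype=np.float32)
    flags[list(recruited)] = 1.0
    return np.concatenate([flags, date_feat])

qnet = QNet()
opt = torch.optim.Adam(qnet.parameters(), lr=1e-3)
replay = deque(maxlen=10_000)
gamma, eps = 0.9, 0.2

for episode in range(200):                     # one episode = filling one quota
    recruited = set()
    date_feat = np.random.rand(8).astype(np.float32)   # placeholder date encoding
    state, prev_err = encode_state(recruited, date_feat), inference_error(set())
    while len(recruited) < BUDGET:
        # Action = recruit ONE more user, which keeps the action space linear
        # in the number of candidates instead of combinatorial.
        if random.random() < eps:
            action = random.choice([u for u in range(N_USERS) if u not in recruited])
        else:
            with torch.no_grad():
                q = qnet(torch.as_tensor(state))
            if recruited:
                q[list(recruited)] = -float("inf")   # never pick the same user twice
            action = int(q.argmax())
        recruited.add(action)
        next_state = encode_state(recruited, date_feat)
        err = inference_error(recruited)
        reward = prev_err - err                  # reward = reduction of inference error
        done = len(recruited) == BUDGET
        replay.append((state, action, reward, next_state, done))
        state, prev_err = next_state, err

        if len(replay) >= 64:                    # one-step TD update on a mini-batch
            s, a, r, s2, d = map(np.array, zip(*random.sample(replay, 64)))
            q_sa = qnet(torch.as_tensor(s)).gather(
                1, torch.as_tensor(a).view(-1, 1)).squeeze(1)
            with torch.no_grad():
                target = torch.as_tensor(r, dtype=torch.float32) + gamma * (
                    1.0 - torch.as_tensor(d, dtype=torch.float32)
                ) * qnet(torch.as_tensor(s2)).max(1).values
            loss = nn.functional.mse_loss(q_sa, target)
            opt.zero_grad(); loss.backward(); opt.step()
```

Masking already-recruited users keeps each step to a single-user decision, which corresponds to the incremental action mode the abstract contrasts with selecting a whole group of participants at once.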
