Abstract
Existing cooperative learning algorithms suffer from frequent communication and high energy consumption. To address these problems, this paper builds a task model on target tracking applications and proposes a task scheduling algorithm for sensor nodes based on Q-learning and TD error (QT). Specifically, the task scheduling problem for sensor nodes is mapped to a learning problem that Q-learning can solve, a collaboration mechanism between neighbouring nodes is established, and basic learning elements such as the delayed reward and the state space are defined. In the collaboration mechanism, each sensor node uses individual and group TD errors to change its learning speed dynamically, balancing its own interests against those of the group. In addition, QT applies the Metropolis criterion to raise the exploration probability in the early learning stage, which optimizes task selection. The experimental results show that QT can schedule tasks dynamically according to the current environment and that, compared with other task scheduling algorithms, QT improves unit performance by 17.26% with reasonable energy consumption.
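The two generic building blocks named in the abstract, a Q-learning update driven by the TD error and Metropolis-style exploration, can be illustrated with a minimal Python sketch. This is not the paper's QT algorithm: the abstract does not give QT's reward, state space, or the rule that combines individual and group TD errors, so none of these is reproduced here. The function names, the tabular `q` layout (a list of rows, one per state), and the parameters `alpha`, `gamma`, and `temperature` are all assumptions made for illustration.

```python
import math
import random


def td_error(q, state, action, reward, next_state, gamma=0.9):
    """Standard one-step Q-learning TD error (textbook form, not QT-specific)."""
    return reward + gamma * max(q[next_state]) - q[state][action]


def q_update(q, state, action, reward, next_state, alpha=0.1, gamma=0.9):
    """Apply one tabular Q-learning update in place and return the TD error."""
    delta = td_error(q, state, action, reward, next_state, gamma)
    q[state][action] += alpha * delta
    return delta


def metropolis_select(q_row, temperature, rng=random):
    """Pick a task index with a Metropolis-style acceptance test.

    A random candidate replaces the greedy task with probability
    exp((Q_candidate - Q_greedy) / temperature), so a high temperature
    (assumed here to be used early in learning) means more exploration.
    `temperature` must be positive.
    """
    greedy = max(range(len(q_row)), key=lambda a: q_row[a])
    candidate = rng.randrange(len(q_row))
    accept_prob = math.exp((q_row[candidate] - q_row[greedy]) / temperature)
    return candidate if rng.random() < accept_prob else greedy
```

In a sketch like this, the temperature would typically be lowered as learning proceeds so that selection becomes increasingly greedy; the cooling schedule is likewise an assumption and is not taken from the source.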
Source
《合肥工业大学学报(自然科学版)》
CAS
Peking University Core Journal (北大核心)
2017, No. 4, pp. 470-475, 521 (7 pages in total)
Journal of Hefei University of Technology:Natural Science
Funding
National Natural Science Foundation of China (61370088, 61502142)
International Science and Technology Cooperation Program of China (2014DFB10060)