Building Energy Efficiency Method Based on Multi-Thread Parallel Reinforcement Learning (基于多线程并行强化学习的建筑节能方法)

Abstract: This paper proposes a building energy conservation method based on parallel reinforcement learning. The method combines multi-threading with experience replay to form a multi-threaded parallel reinforcement learning framework. Its novelty is the introduction of a bisimulation metric into the experience replay process: the distance between samples is computed, and samples with low similarity are selected to build a diverse sample pool. The agent then learns from samples drawn from this pool, which effectively avoids wasting learning resources. Experiments on a simulated room model compare the method with the Q-Learning algorithm and with the classical PID control method. The results show that the proposed parallel algorithm learns and converges faster, finds the optimal policy sooner, and runs more efficiently.
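The sample-selection idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the class `DiverseSamplePool`, the threshold `min_distance`, and the helper `build_pool` are hypothetical names, and plain Euclidean distance stands in for the bisimulation metric, whose exact form is not given here. Several threads offer samples to a shared pool, and a sample is admitted only if it is far enough from every sample already stored.

```python
import math
import threading


def sample_distance(a, b):
    """Euclidean distance between two flattened samples.

    Assumption: this is only a stand-in for the paper's bisimulation metric.
    """
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


class DiverseSamplePool:
    """Thread-safe pool that admits only samples dissimilar to stored ones."""

    def __init__(self, min_distance):
        self.min_distance = min_distance  # hypothetical similarity threshold
        self._samples = []
        self._lock = threading.Lock()

    def offer(self, sample):
        """Store `sample` if it is at least `min_distance` from every stored sample."""
        with self._lock:
            for stored in self._samples:
                if sample_distance(sample, stored) < self.min_distance:
                    return False  # too similar to an existing sample, skip it
            self._samples.append(sample)
            return True

    def snapshot(self):
        """Return a copy of the stored samples."""
        with self._lock:
            return list(self._samples)


def build_pool(batches, min_distance):
    """Run one worker thread per batch; each worker offers its samples to a shared pool."""
    pool = DiverseSamplePool(min_distance)

    def worker(batch):
        for sample in batch:
            pool.offer(sample)

    threads = [threading.Thread(target=worker, args=(b,)) for b in batches]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return pool
```

A learner would then draw its training samples from `pool.snapshot()` instead of from the raw experience stream. The single lock keeps the admit-or-reject check atomic, at the cost of serializing insertions; a real system might choose a finer-grained scheme.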

Authors: CHEN Jianping; KANG Yiyi; HU Lingyao; LU You; WU Hongjie; FU Qiming

Affiliations: College of Electronics and Information Engineering, Suzhou University of Science and Technology, Suzhou, Jiangsu 215009, China; Jiangsu Province Key Laboratory of Intelligent Building Energy Efficiency, Suzhou University of Science and Technology, Suzhou, Jiangsu 215009, China; Suzhou Key Laboratory of Mobile Network Technology and Application, Suzhou University of Science and Technology, Suzhou, Jiangsu 215009, China

Source: Computer Engineering and Applications (《计算机工程与应用》), indexed in CSCD and the Peking University Core Journals list, 2019, Issue 15, pp. 219-227 (9 pages)

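Funding: National Natural Science Foundation of China (No.61502329, No.61772357, No.61750110519, No.61772355, No.61702055, No.61672371, No.61602334); Natural Science Foundation of Jiangsu Province (No.BK20140283); Jiangsu Province Key Research and Development Program (No.BE2017663); Natural Science Research Project of Jiangsu Higher Education Institutions (No.13KJB520020); Suzhou Applied Basic Research Program, Industrial Part (No.SYG201422)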

Keywords: reinforcement learning; parallel reinforcement learning; experience replay; multi-threading technology; building energy conservation

Classification: TP391 [Automation and Computer Technology: Computer Application Technology]
