6 articles found
Principles, Algorithms, and Applications of Reinforcement Learning (cited by: 19)
1
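Authors: Huang Bingqiang (黄炳强), Cao Guangyi (曹广益), Wang Zhanquan (王占全). Journal of Hebei University of Technology (《河北工业大学学报》), CAS, 2006, No. 6, pp. 34-38 (5 pages)
Reinforcement learning (RL) grew out of animal learning theory. It requires no prior knowledge: the learner acquires knowledge through continual interaction with its environment and selects actions autonomously. Because of this capacity for autonomous learning, it has received wide attention in behavior learning for autonomous robots. This paper reviews the basic principles of reinforcement learning and its main algorithms, including TD algorithms, Q-learning, and R-learning, and closes with applications of reinforcement learning and current research hot topics in multi-robot systems.
Keywords: reinforcement learning; TD algorithm; Q-learning; R-learning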
A Zeroth-Level Classifier System Based on an Average-Reward Reinforcement Learning Algorithm (cited by: 1)
2
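Authors: Zang Zhaoxiang (臧兆祥), Li Zhao (李昭), Wang Junying (王俊英), Dan Zhiping (但志平). Computer Engineering and Applications (《计算机工程与应用》), CSCD, Peking University Core, 2016, No. 21, pp. 14-20, 48 (8 pages)
The zeroth-level classifier system (ZCS), a genetics-based machine learning technique, has shown practical value on multi-step learning problems. The standard ZCS, however, uses discounted-reward reinforcement learning, which makes it hard to adapt to a broader range of applications. Building on the existing ZCS framework, this paper proposes a classifier system that uses average-reward reinforcement learning (the R-learning algorithm), replacing the discounted-reward method in ZCS with R-learning. As a result, ZCS can be applied to problems that require optimizing average reward, and it can also solve larger multi-step learning problems that depend on long action chains. Experiments show that on multi-step learning problems the system yields satisfactory solutions and performs better at maintaining long action chains and at overcoming over-generalization.
Keywords: average reward; reinforcement learning; R-learning algorithm; learning classifier system (LCS); zeroth-level classifier system (ZCS); multi-step learning problem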
Incremental Multi Step R Learning
3
Authors: Hu Guanghua (胡光华), Wu Cangpu (吴沧浦). Journal of Beijing Institute of Technology, EI, CAS, 1999, No. 3, pp. 245-250 (6 pages)
Aim: To investigate model-free multi-step average-reward reinforcement learning algorithms. Methods: By combining R-learning algorithms with temporal difference learning (TD(λ) learning) algorithms for average reward problems, a novel incremental algorithm, called R(λ) learning, was proposed. Results and Conclusion: The proposed algorithm is a natural extension of Q(λ) learning, the multi-step discounted-reward reinforcement learning algorithm, to the average reward case. Simulation results show that R(λ) learning with intermediate λ values significantly improves performance over simple R learning.
Keywords: reinforcement learning; average reward; R learning; Markov decision processes; temporal difference learning
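For orientation, the one-step R-learning update that several of the listed papers build on can be sketched as below. This is a minimal sketch of the commonly cited tabular form (action values updated with the reward minus an average-reward estimate, and that estimate adjusted only after greedy actions). It is not the R(λ) algorithm of the paper above, and the names `r_learning_step`, `R`, `rho`, `alpha`, and `beta` are illustrative assumptions.

```python
def r_learning_step(R, rho, s, a, r, s_next, alpha=0.1, beta=0.01):
    """Apply one tabular R-learning update and return the new rho.

    R      -- list of lists, R[state][action] relative action values (mutated)
    rho    -- current estimate of the average reward per step
    alpha  -- step size for the action values (assumed default)
    beta   -- step size for the average-reward estimate (assumed default)
    """
    best_next = max(R[s_next])
    # Value update: the reward is measured relative to the average reward.
    R[s][a] += alpha * (r - rho + best_next - R[s][a])
    best_here = max(R[s])
    # The average-reward estimate moves only when the action taken is greedy.
    if R[s][a] == best_here:
        rho += beta * (r - rho + best_next - best_here)
    return rho


if __name__ == "__main__":
    # Toy check: one state, one action, constant reward 1.0.
    R = [[0.0]]
    rho = 0.0
    for _ in range(5000):
        rho = r_learning_step(R, rho, 0, 0, 1.0, 0)
    print(round(rho, 3))
```

In the toy loop every step yields reward 1.0, so the estimate `rho` approaches 1.0, the true average reward per step.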
Hierarchical annotation method for metal corrosion detection of power equipment
4
Authors: Zhang Baili, Cao Yong, Zhang Pei, Zhang Zhao, He Yina, Zhong Mingjun. Journal of Southeast University (English Edition), EI, CAS, 2021, No. 4, pp. 350-355 (6 pages)
To solve the ambiguity and uncertainty in the labeling process of power equipment corrosion datasets, a novel hierarchical annotation method (HAM) is proposed. Firstly, large boxes are used to label a large area covering the range of corrosion, provided that the area is visually continuous and adjacent to corrosion that cannot be clearly divided. Secondly, in each labeling box established in the first step, regions with distinct corrosion and relative independence are labeled to form a second layer of nested boxes. Finally, a series of comparative experiments are conducted with other common annotation methods to validate the effectiveness of HAM. The experimental results show that, with the help of HAM, the recall of YOLOv5 increases from 50.79% to 59.41%; the recall of Faster R-CNN+VGG16 increases from 66.50% to 78.94%; the recall of Faster R-CNN+Res101 increases from 78.32% to 84.61%. Therefore, HAM can effectively improve the detection ability of mainstream models in detecting metal corrosion.
Keywords: deep learning; Faster R-CNN; YOLOv5; object detection; hierarchical annotation
The Application of TPR method in English Classroom of primary school
5
Author: Yuan Xinhua. International English Education Research, 2014, No. 8, pp. 57-60 (4 pages)
This study is based on the conclusion, demonstrated in Asher's studies, that pairing oral practice with actions is considerably effective. TPR is presented as an appropriate and effective teaching method that promotes the acquisition of comprehensible input in a natural way; it is a good way to learn a second language, not just for children but for adults as well. It also helps teachers, who can use it in their classes to make the learning environment active and dynamic. It can thus help teachers solve many problems in English class, help young children learn English, and make English learning interesting for them, so that they enjoy English class. This gives them a good start for learning English in the future.
Keywords: Total Physical Response strategy; teaching children English; elementary English education; language; action
An Average-Reward Reinforcement Learning Algorithm Combined with Tile Coding
6
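Authors: Wang Weiwei (王巍巍), Chen Xingguo (陈兴国), Gao Yang (高阳). Pattern Recognition and Artificial Intelligence (《模式识别与人工智能》), EI, CSCD, Peking University Core, 2008, No. 4, pp. 446-452 (7 pages)
Average-reward reinforcement learning is an important undiscounted optimality framework in reinforcement learning, and most existing work has been carried out in discrete domains. This paper attempts to combine average-reward reinforcement learning algorithms with function approximation to handle continuous state spaces and, in line with the change of state domain, modifies the conditions under which the parameters of R-learning and G-learning are updated. It also studies the performance of G-learning with function approximation and its sensitivity to various parameters. Experimental results and analysis are given at the end. The results show that the solutions of R-learning and G-learning tend to diverge when ε is small, and they also demonstrate the effectiveness of the feature extraction method tile coding, which can serve as a reference baseline for other feature extraction methods.
Keywords: reinforcement learning; Markov decision process (MDP); R-learning; G-learning; average reward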
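As a rough illustration of the tile coding feature extraction mentioned in this abstract, the sketch below maps a scalar state onto one active tile per tiling, using several uniformly offset tilings. It is a generic one-dimensional version written for illustration, not the configuration used in the paper, and the function name `tile_indices` and its parameters are assumptions.

```python
def tile_indices(x, low, high, n_tilings=4, tiles_per_dim=8):
    """Return one active tile index per tiling for a scalar x in [low, high].

    Each tiling partitions the range into tiles of equal width and is shifted
    by a fraction of a tile, so nearby inputs share most of their active tiles.
    """
    width = (high - low) / tiles_per_dim
    active = []
    for t in range(n_tilings):
        offset = t * width / n_tilings            # shift of tiling t
        idx = int((x - low + offset) / width)     # tile hit within tiling t
        idx = min(idx, tiles_per_dim)             # each tiling holds tiles_per_dim + 1 tiles
        active.append(t * (tiles_per_dim + 1) + idx)
    return active


if __name__ == "__main__":
    print(tile_indices(0.50, 0.0, 1.0))
    print(tile_indices(0.55, 0.0, 1.0))
```

With a linear function approximator, the value estimate for a state is the sum of the weights at its active indices, which is what lets tabular-style updates such as R-learning extend to continuous state spaces.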