Foremost-Policy Reinforcement Learning Based ART2 Neural Network

Original title: 基于最先策略增强学习的ART2神经网络. Cited by: 3

Abstract: This paper proposes FPRL-ART2 (Foremost-Policy Reinforcement Learning based ART2 neural network) and presents its learning algorithm. To support online (real-time) learning, FPRL-ART2 selects, from the mapping between states and action values, the first action that earns a reward, rather than the action with the optimal action value as in 1-step Q-Learning. The ART2 neural network stores the classified patterns, and reinforcement learning strengthens or weakens the stored weights of those patterns. FPRL-ART2 is applied to collision avoidance for a mobile robot. Simulation experiments show that introducing FPRL-ART2 reduces the number of collisions between the robot and obstacles and gives good collision-avoidance behavior.
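As a rough illustration of the selection rule described in the abstract, the sketch below contrasts a greedy choice over action values (as in 1-step Q-Learning) with a foremost-policy choice that keeps the first action that earned a reward in a given state. It is not the authors' implementation: the class `ForemostPolicy`, the function `greedy_action`, the dictionary-based state table, the random fallback for unseen states, and the treatment of any positive reward as "rewarded" are all assumptions made for illustration. The ART2 pattern storage and weight update are omitted.

```python
import random


def greedy_action(q_row):
    """Greedy choice as in 1-step Q-Learning: the action with the largest value.

    q_row is a hypothetical dict mapping action -> action value for one state.
    """
    return max(q_row, key=q_row.get)


class ForemostPolicy:
    """Hypothetical sketch: remember the FIRST rewarded action for each state."""

    def __init__(self, actions, seed=0):
        self.actions = list(actions)
        self.first_rewarded = {}          # state -> first action that was rewarded
        self.rng = random.Random(seed)    # seeded only to keep the sketch repeatable

    def select(self, state):
        # Reuse the first rewarded action if one is known for this state.
        if state in self.first_rewarded:
            return self.first_rewarded[state]
        # Assumption: otherwise explore by picking an action at random.
        return self.rng.choice(self.actions)

    def update(self, state, action, reward):
        # Assumption: any positive reward counts as "rewarded".
        # Later rewards, even larger ones, do not replace the first rewarded action.
        if reward > 0 and state not in self.first_rewarded:
            self.first_rewarded[state] = action


policy = ForemostPolicy(["left", "right", "forward"])
policy.update("obstacle_ahead", "left", 1.0)    # first rewarded action is kept
policy.update("obstacle_ahead", "right", 5.0)   # larger reward, but arrives later
chosen_foremost = policy.select("obstacle_ahead")
chosen_greedy = greedy_action({"left": 1.0, "right": 5.0})
```

In this toy run the foremost policy keeps `left` because it was rewarded first, while the greedy rule would pick `right` because its value is larger. The trade-off suggested by the abstract is that the foremost rule needs no search for the optimal action value, which suits online learning.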

Authors: Fan Jian (樊建), Wu Gengfeng (吴耿锋)

Affiliation: School of Computer Engineering and Science, Shanghai University

Source: Pattern Recognition and Artificial Intelligence (《模式识别与人工智能》), 2006, No. 3, pp. 428-432 (5 pages). Indexed in: EI, CSCD, Peking University Core Journals.

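Funding: Shanghai Science and Technology Development Fund (No.015115042); Shanghai Municipal Education Commission Key Discipline Construction Project, Phase 4 (No.B682)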

Keywords: Reinforcement Learning, ART2 Neural Network, Foremost-Policy, Collision Avoidance

Classification: TP183 [Automation and Computer Technology: Control Theory and Control Engineering]
