A New Algorithm for Discovery and Learning of Predictive State Representations in Dynamical Systems Without Reset

Cited by: 2


Abstract: A new algorithm is proposed for discovering and learning predictive state representations (PSRs) of dynamical systems without reset. After proving that any landmark of the system can serve as its initial state, the algorithm uses the discovered landmarks to identify the history at every time step of a continuous data stream, and then estimates the probability of any test given any history with Monte Carlo methods. This resolves the difficulty of obtaining the conditional probabilities of tests given histories in systems without reset, and thereby allows the PSR of such a system to be discovered and learned. Experimental results show that the PSRs obtained by the proposed algorithm predict markedly more accurately than those obtained by the suffix-history algorithm, which supports the effectiveness of the proposed algorithm.
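The two steps named in the abstract (identifying histories from landmarks, then Monte Carlo estimation of test probabilities) can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes that a landmark is an observation that can be treated as a restart point, that the data are one unbroken list of (action, observation) pairs, and that a history is represented as the most recent landmark plus the pairs seen since. The function names `label_histories` and `estimate_test_probability` are hypothetical.

```python
def label_histories(stream, landmarks):
    """Return, for each time step, the history reached after that step.

    A history is (landmark_obs, tail), where tail is the tuple of
    (action, observation) pairs seen since the most recent landmark.
    Steps before the first landmark get None (history unknown).
    """
    labels = []
    current = None
    for action, obs in stream:
        if obs in landmarks:
            # Assumption: a landmark acts as an initial state, so restart here.
            current = (obs, ())
        elif current is not None:
            current = (current[0], current[1] + ((action, obs),))
        labels.append(current)
    return labels


def estimate_test_probability(stream, landmarks, history, test):
    """Monte Carlo estimate of P(test | history) from one unbroken stream.

    Counts how often the test's actions were executed right after `history`
    and, among those occasions, how often its observations also occurred.
    Returns None if the test's actions were never tried after `history`.
    """
    labels = label_histories(stream, landmarks)
    k = len(test)
    test_actions = [a for a, _ in test]
    tried = succeeded = 0
    for t in range(len(stream) - k):
        if labels[t] != history:
            continue
        window = stream[t + 1 : t + 1 + k]
        if [a for a, _ in window] == test_actions:
            tried += 1
            if list(window) == list(test):
                succeeded += 1
    return succeeded / tried if tried else None


# Toy stream: one action "a"; observation "L" is assumed to be a landmark.
stream = [("a", "L"), ("a", "x"), ("a", "L"), ("a", "y"),
          ("a", "L"), ("a", "x"), ("a", "L"), ("a", "x")]
p = estimate_test_probability(stream, {"L"}, ("L", ()), [("a", "x")])
print(p)
```

In this toy stream the history "just saw landmark L" occurs four times with a following step, and the test (a, x) succeeds on three of them, so the estimate is 0.75. With real data the estimate is only as good as the number of visits to each history, which is why restarting histories at landmarks matters in a system that cannot be reset.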

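Authors: Liu Yunlong, Li Renhou

Affiliation: Institute of Systems Engineering, Xi'an Jiaotong University

Source: Acta Electronica Sinica (《电子学报》), 2009, No. 1, pp. 126-131 (6 pages). Indexed in EI, CAS, CSCD, and the Peking University Core Journals list.

Funding: Supported by the national "211 Project" and the Xi'an Jiaotong University "Action Plan".

Keywords: predictive state representations; dynamical systems without reset; landmark; suffix-history algorithm

Classification: TP181 [Automation and Computer Technology: Control Theory and Control Engineering]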


References (11)

1. L Kaelbling, M Littman, A Cassandra. Planning and acting in partially observable stochastic domains[J]. Artificial Intelligence, 1998, 101(1-2): 99-134.
2. S Singh, M James, M Rudary. Predictive state representations: a new theory for modeling dynamical systems[A]. In Uncertainty in Artificial Intelligence: Proceedings of the Twentieth Conference[C]. Banff, Alberta, Canada: AUAI Press, 2004. 512-519.
3. M Littman, R Sutton, S Singh. Predictive representations of state[A]. In Advances in Neural Information Processing Systems 14[C]. Vancouver, British Columbia, Canada: MIT Press, 2002. 1555-1561.
4. M Rosencrantz, G Gordon, S Thrun. Learning low dimensional predictive representations[A]. In Proceedings of the Twenty-First International Conference on Machine Learning[C]. Banff, Alberta, Canada: ACM, 2004. 695-702.
5. B Wolfe, M James, S Singh. Learning predictive state representations in dynamical systems without reset[A]. In Proceedings of the Twenty-Second International Conference on Machine Learning[C]. Bonn, Germany: ACM, 2005. 980-987.
6. D Wingate, S Singh. On discovery and learning of models with predictive representations of state for agents with continuous actions and observations[A]. In Proceedings of the 2007 International Conference on Autonomous Agents and Multiagent Systems[C]. Honolulu, USA: ACM, 2007. 187-194.
7. M James, S Singh. Learning and discovery of predictive state representations in dynamical systems with reset[A]. In Proceedings of the Twenty-First International Conference on Machine Learning[C]. Banff, Alberta, Canada: ACM, 2004. 417-424.
8. M James, B Wolfe, S Singh. Combining memory and landmarks with predictive state representations[A]. In Proceedings of the International Joint Conference on Artificial Intelligence[C]. Edinburgh, Scotland: Professional Book Center, 2005. 734-739.
9. A Cassandra. Tony's POMDP file repository page[OL]. http://www.cs.brown.edu/research/ai/pomdp/examples/index.html, 2008-06-02.
10. M Bowling, P McCracken, M James, et al. Learning predictive state representations using non-blind policies[A]. In Proceedings of the Twenty-Third International Conference on Machine Learning[C]. Pittsburgh, Pennsylvania, USA: ACM, 2006. 129-136.


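Citing articles (2)

1. Liu Yunlong, Ji Guoli. A reset algorithm for predictive state representation models[J]. Chinese Journal of Computers (计算机学报), 2012, 35(5): 1046-1051.
2. Li Jianhua. A survey of cyberspace threat intelligence perception, sharing, and analysis technologies[J]. Chinese Journal of Network and Information Security (网络与信息安全学报), 2016, 2(2): 16-29. Cited by: 51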

