Abstract
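By combining Markov decision processes with intelligent reinforcement learning algorithms, this paper presents a technical framework for a model that evaluates users' service preferences in heterogeneous wireless network environments. It provides a basic mathematical description for studying how user requirements are perceived, quantified, and adapted in dynamic environments, and it offers a new line of research for evaluating user experience and for matching services to the service environment. Simulation results show that the constructed MDP model can learn user preferences under multi-state conditions and intelligently select services according to user requirements.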
A technical architecture for a user preference model is presented, and the problem is formulated as a Markov Decision Process (MDP) combined with an adaptive reinforcement learning algorithm. We provide a candidate solution for modeling users dynamically so as to satisfy the user's expected preferences based on minimal or missing information. It is also an exploration of how to evaluate user experience when selecting service providers. Simulations of the user models show that the ...
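The abstract does not give the paper's state space, reward definition, or learning rule, so the following is only a minimal sketch of the general idea, assuming tabular Q-learning with an epsilon-greedy policy. Every name and parameter here (`learn_preferences`, `preferred`, the 0/1 utility signal, uniformly random state transitions) is a hypothetical choice made for illustration and is not taken from the paper.

```python
import random


def learn_preferences(preferred, n_services, steps=20000,
                      alpha=0.1, gamma=0.5, epsilon=0.2, seed=0):
    """Tabular Q-learning over (network state, service) pairs.

    `preferred[s]` is the service a simulated user prefers in state s.
    It is hidden from the learner and only used to generate feedback.
    """
    rng = random.Random(seed)
    n_states = len(preferred)
    q = [[0.0] * n_services for _ in range(n_states)]
    state = rng.randrange(n_states)

    for _ in range(steps):
        # Epsilon-greedy: explore occasionally, otherwise pick the best-known service.
        if rng.random() < epsilon:
            action = rng.randrange(n_services)
        else:
            action = max(range(n_services), key=lambda a: q[state][a])

        # Assumed feedback: utility 1 if the user's preferred service was chosen, else 0.
        reward = 1.0 if action == preferred[state] else 0.0

        # Assumed dynamics: the network state changes uniformly at random.
        next_state = rng.randrange(n_states)

        # Standard Q-learning update toward the one-step bootstrapped target.
        target = reward + gamma * max(q[next_state])
        q[state][action] += alpha * (target - q[state][action])
        state = next_state

    # Greedy policy: the service with the highest learned value in each state.
    return [max(range(n_services), key=lambda a: q[s][a])
            for s in range(n_states)]


hidden_preference = [2, 0, 1]  # hypothetical user: one preferred service per state
policy = learn_preferences(hidden_preference, n_services=3)
print(policy)
```

In this toy setting the learner never sees `hidden_preference` directly; it only observes the utility signal after each selection, which loosely mirrors the abstract's claim of learning preferences from minimal information across multiple states.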
Source
《国防科技大学学报》
EI
CAS
CSCD
PKU Core (Peking University core journal list)
2006, Issue 6, pp. 81-85 (5 pages)
Journal of National University of Defense Technology
Funding
National 863 High-Tech Program of China (2003AA12331004)
Keywords
utility theory
user preference
Markov decision process
reinforcement learning