Abstract
Sequential recommendation can be formalized as a Markov decision process and thus cast as a deep reinforcement learning problem. The key step is to mine critical information from user sequences, such as preference drift and dependencies within the sequence, yet most current deep reinforcement learning based recommender systems take a fixed sequence length as the model input. Inspired by knowledge graphs, a knowledge-guided adaptive sequence reinforcement learning model is proposed. Firstly, using the entity relations of the knowledge graph, a partial sequence is extracted from the complete user feedback sequence as a drift sequence. The item set in the drift sequence represents the user's current preference, and the sequence length reflects how quickly the user's preference changes. Then, a gated recurrent unit extracts the user's preference changes and the dependencies between items in the drift sequence, while a self-attention mechanism selectively focuses on key item information. Finally, a compound reward function, combining a discounted sequence reward with a knowledge graph reward, is designed to alleviate the problem of sparse rewards. Experiments on four real-world datasets demonstrate that the proposed model achieves superior recommendation accuracy.
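For illustration only, the following minimal PyTorch sketch shows how the components described above could fit together: a GRU encodes a variable-length drift sequence of item embeddings, self-attention re-weights the hidden states before a Q-value head, and a compound reward adds a knowledge-graph term to the discounted feedback rewards. All class names, hyperparameters, and the similarity-based knowledge-graph term are assumptions made for this sketch, not the authors' implementation.

```python
# Illustrative sketch (not the paper's code): GRU + self-attention encoder over a
# variable-length drift sequence, plus a compound reward with a knowledge-graph term.
import torch
import torch.nn as nn


class DriftSequenceEncoder(nn.Module):
    def __init__(self, num_items: int, embed_dim: int = 64, num_heads: int = 4):
        super().__init__()
        self.item_emb = nn.Embedding(num_items, embed_dim, padding_idx=0)
        self.gru = nn.GRU(embed_dim, embed_dim, batch_first=True)
        self.attn = nn.MultiheadAttention(embed_dim, num_heads, batch_first=True)
        self.q_head = nn.Linear(embed_dim, num_items)   # Q-value per candidate item

    def forward(self, drift_seq: torch.Tensor) -> torch.Tensor:
        # drift_seq: (batch, seq_len) item ids; seq_len varies with preference drift speed
        x = self.item_emb(drift_seq)                    # (batch, seq_len, embed_dim)
        h, _ = self.gru(x)                              # sequential dependencies between items
        a, _ = self.attn(h, h, h)                       # selective focus on key items
        state = a.mean(dim=1)                           # pooled state representation
        return self.q_head(state)                       # (batch, num_items)


def compound_reward(click_rewards, kg_similarity, gamma: float = 0.9, beta: float = 0.5):
    """Hypothetical compound reward: a discounted sum over the feedback sequence plus a
    knowledge-graph term (e.g. entity similarity between the recommended item and the
    user's recent items) to densify sparse click rewards."""
    discounted = sum(gamma ** t * r for t, r in enumerate(click_rewards))
    return discounted + beta * kg_similarity


if __name__ == "__main__":
    model = DriftSequenceEncoder(num_items=1000)
    q_values = model(torch.randint(1, 1000, (2, 7)))    # two users, drift length 7
    print(q_values.shape)                                # torch.Size([2, 1000])
    print(compound_reward([0.0, 1.0, 0.0], kg_similarity=0.3))
```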
Authors
LI Yinggang (李迎港); TONG Xiangrong (童向荣)
School of Computer and Control Engineering, Yantai University, Yantai 264005
Source
Pattern Recognition and Artificial Intelligence (《模式识别与人工智能》)
Indexed in: EI, CSCD, Peking University Core Journals (北大核心)
2023, No. 2, pp. 108-119 (12 pages)
Funding
Supported by the National Natural Science Foundation of China (No. 62072392, 61972360) and the Major Scientific and Technological Innovation Project of Shandong Province (No. 2019522Y020131).
Keywords
Adaptive Sequence
Deep Reinforcement Learning
Knowledge Graph
Self-Attention Mechanism
Recurrent Neural Network