Knowledge Reasoning Method of Reinforcement Learning Integrating Action Withdrawal and Soft Reward

Abstract: To address overfitting and sparse rewards in deep reinforcement learning reasoning methods, this paper proposes a knowledge reasoning method of reinforcement learning integrating action withdrawal and soft reward (AS-KRL). AS-KRL uses a gated recurrent unit (GRU) to encode historical path information, giving the agent global information about the current node when it selects an action. It introduces an action withdrawal (action dropout) strategy that randomly hides some neurons before the policy network is constructed, which improves the success rate of path search and helps avoid overfitting. The policy network guides the agent's action selection; a scoring function computes the similarity score of the triple selected by the agent, and this score is used as the agent's reward, which effectively addresses the sparse reward problem. To verify the method's effectiveness, experiments are carried out on the FB15K-237 and NELL-995 datasets, and the results are compared with those of 9 mainstream methods, including TransE, MINERVA, and HRL. On the link prediction task, the proposed method improves Hits@k by 0.027 on average and MRR by 0.056 on average.
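The two mechanisms named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the scoring function or how neurons are hidden, so the TransE-style scorer, the `exp` squashing, the mask over candidate actions, and every function name below are assumptions made for illustration only.

```python
import math
import random


def transe_score(h, r, t):
    """Assumed scorer: negative L2 distance between h + r and t (TransE-style)."""
    return -math.sqrt(sum((hi + ri - ti) ** 2 for hi, ri, ti in zip(h, r, t)))


def soft_reward(h, r, t, reached_target):
    """Return 1.0 on a hit; otherwise a graded score in [0, 1] instead of 0."""
    if reached_target:
        return 1.0
    # exp of a non-positive score lies in [0, 1]; this squashing is an assumption.
    return math.exp(transe_score(h, r, t))


def dropout_mask(n, keep_prob, rng):
    """Randomly keep each of n entries with probability keep_prob (at least one kept)."""
    mask = [1 if rng.random() < keep_prob else 0 for _ in range(n)]
    if not any(mask):
        mask[rng.randrange(n)] = 1
    return mask


def masked_policy(logits, mask):
    """Softmax over the entries the mask keeps; hidden entries get probability 0."""
    top = max(l for l, m in zip(logits, mask) if m)
    exps = [math.exp(l - top) if m else 0.0 for l, m in zip(logits, mask)]
    total = sum(exps)
    return [e / total for e in exps]


if __name__ == "__main__":
    rng = random.Random(0)
    mask = dropout_mask(3, keep_prob=0.7, rng=rng)
    print(masked_policy([1.0, 2.0, 3.0], mask))
    print(soft_reward([0.0, 0.0], [0.0, 0.0], [3.0, 4.0], reached_target=False))
```

The point of the soft reward is that a path ending on a wrong entity still receives a graded signal rather than zero, which is how a score-based reward can ease reward sparsity; the mask shows one simple way random hiding can keep the policy from always following the same high-probability choices.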

Authors: SUN Chong; WANG Hairong; JING Boxiang; MA He (School of Computer Science and Engineering, North Minzu University, Yinchuan 750021, China; The Key Laboratory of Images & Graphics Intelligent Processing of State Ethnic Affairs Commission, North Minzu University, Yinchuan 750021, China)

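Affiliations: School of Computer Science and Engineering, North Minzu University; The Key Laboratory of Images & Graphics Intelligent Processing of State Ethnic Affairs Commission, North Minzu University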

Source: Computer Engineering and Applications (《计算机工程与应用》), indexed in CSCD and the Peking University Core Journals list, 2024, Issue 24, pp. 158-165 (8 pages)

Funding: Natural Science Foundation of Ningxia (2023AAC03316).

Keywords: knowledge reasoning; reinforcement learning; action dropout; gated recurrent unit (GRU); soft reward mechanism

Classification number: TP391 [Automation and Computer Technology: Computer Application Technology]