Abstract
To address the low prediction accuracy that sparse historical data causes in recommendation algorithms, a recommendation algorithm based on multiple similarity analysis and CatBoost is proposed. A modified cosine similarity function computes one similarity matrix from item metadata and another from rating data, and the two matrices are fused. Large-scale information network embedding (LINE) then performs multi-order similarity analysis on the fused matrix to obtain a more accurate nearest-neighbor set. This neighbor set is the input to CatBoost, which predicts item ratings, and items are recommended by Top-N ranking. The method was evaluated on the MovieLens dataset and compared with other methods. The results show that it achieves higher recommendation accuracy and stronger stability. It can also address the low recommendation quality caused by sparse historical data.
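The first two steps of the pipeline above (similarity computation and fusion, followed by neighbor selection) can be illustrated with a minimal sketch. It rests on several assumptions that the abstract does not confirm.

- "Modified cosine similarity" is taken to be the usual adjusted cosine: each user's ratings are centered by that user's mean over all of their ratings, and only co-rated users contribute.
- Metadata similarity is swapped for plain cosine over hypothetical genre vectors.
- The fusion is a hypothetical linear weighting `alpha`.
- Simple top-k selection stands in for the LINE multi-order analysis.

Neither LINE nor CatBoost is implemented here, and every function name is illustrative.

```python
import math


def adjusted_cosine(ratings, i, j):
    """Adjusted (modified) cosine similarity between items i and j.

    ratings: dict mapping user -> {item: rating}. Each rating is centred by
    that user's mean over all of their ratings (an assumption), and only
    users who rated both items contribute.
    """
    num = den_i = den_j = 0.0
    for user_ratings in ratings.values():
        if i in user_ratings and j in user_ratings:
            mean = sum(user_ratings.values()) / len(user_ratings)
            di = user_ratings[i] - mean
            dj = user_ratings[j] - mean
            num += di * dj
            den_i += di * di
            den_j += dj * dj
    if den_i == 0 or den_j == 0:
        return 0.0
    return num / (math.sqrt(den_i) * math.sqrt(den_j))


def cosine(u, v):
    """Plain cosine similarity between two equal-length feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def fused_similarity(ratings, metadata, alpha=0.5):
    """Hypothetical linear fusion of the two similarity matrices:
    alpha * rating similarity + (1 - alpha) * metadata similarity.
    The paper's actual fusion rule may differ.
    """
    items = sorted(metadata)
    sim = {}
    for a in items:
        for b in items:
            if a != b:
                sim[(a, b)] = (alpha * adjusted_cosine(ratings, a, b)
                               + (1 - alpha) * cosine(metadata[a], metadata[b]))
    return sim


def nearest_neighbours(sim, item, k):
    """Return the k items most similar to `item` (ties broken by item id).
    This stands in for the LINE-based neighbour search, which is not shown."""
    cands = [(s, b) for (a, b), s in sim.items() if a == item]
    cands.sort(key=lambda t: (-t[0], t[1]))
    return [b for _, b in cands[:k]]


# Toy data: three users, three items, hypothetical one-hot genre vectors.
ratings = {
    "u1": {"A": 5, "B": 4, "C": 1},
    "u2": {"A": 4, "B": 5, "C": 2},
    "u3": {"A": 1, "C": 5},
}
metadata = {"A": [1, 0], "B": [1, 0], "C": [0, 1]}

sim = fused_similarity(ratings, metadata)
print(nearest_neighbours(sim, "A", 1))  # → ['B']
```

Centering by the user mean removes differences in how generously individual users rate. This is why A and C come out strongly dissimilar here even though all three users rated both. In the described method, the resulting neighbor set would be passed on as input features for CatBoost rating prediction.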
Authors
YANG Huai-zhen, ZHANG Jing, LI Lei
(School of Business, Guilin University of Electronic Technology, Guilin 541004, China; School of Business, Guilin University of Technology, Guilin 541004, China)
Source
Computer Engineering and Design (Peking University core journal)
2023, No. 9, pp. 2687-2693 (7 pages)
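Funding
National Natural Science Foundation of China, General Program (72074058)
National Natural Science Foundation of China (71562008, 61866009, 61906050)
Guangxi Key Research and Development Fund (2017GXNSFDA198025)
Guangxi Graduate Education Innovation Program (YCBZ2022112)
Guangxi Innovation-Driven Development Major Special Project (AA17202024)
Guangxi Bagui Scholars Special Fund (Ting Fa [2019] No. 79)
Guangxi Higher Education Cultivation Program for a Thousand Young and Middle-Aged Backbone Teachers (Gui Ke [2018] No. 18)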
Keywords
personalized recommendation
ensemble learning
metadata
data fusion
similarity
modified cosine similarity function
large-scale information network embedding