Abstract
Traditional distributed recommendation systems built on the Hadoop platform with collaborative filtering face two pressing problems. First, with massive data volumes and complex recommendation models, data processing slows markedly, so low latency cannot be achieved and users cannot be served real-time recommendations. Second, traditional collaborative filtering algorithms cannot perceive drift in user interests in real time, which leads to unsatisfactory recommendations. To address both problems, Flink, a new-generation stream computing engine, is introduced, and a movie recommendation system is built with big data components such as Spark, Flume, and Kafka. The recommendation algorithms are divided into an offline module and an online (real-time) module. The offline algorithm introduces heap sort to address the Cartesian product that the ALS algorithm in MLlib computes during model prediction, which consumes a large amount of memory and takes a long time to execute. The real-time algorithm introduces the Ebbinghaus forgetting curve and fuses time weights with reward-punishment factors to dynamically perceive user interest drift. Together, these improvements yield better personalized Top-N recommendations and improve the end-user experience. The experimental results show that: (1) the offline ALS algorithm improved with heap sort executes significantly faster while RMSE remains essentially unchanged; (2) the real-time algorithm that introduces the Ebbinghaus forgetting curve and fuses time weights with reward-punishment factors clearly improves precision and recall, and its recommendations better match user interests; (3) as the data volume grows, the Flink engine executes the algorithm faster than the Spark engine.
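The two algorithmic ideas named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the dictionary of item factor vectors, the exponential retention form R = exp(-t / S), and the `strength` parameter are all assumptions made for illustration. The reward-punishment factor is left out because the abstract does not give its form.

```python
import heapq
import math


def top_n_by_heap(user_vec, item_factors, n):
    """Return the n highest-scoring (item_id, score) pairs for one user.

    A min-heap of size n keeps only the current best candidates, so memory
    stays O(n) instead of materializing and sorting every user-item score.
    """
    heap = []  # entries are (score, item_id); heap[0] is the weakest kept item
    for item_id, item_vec in item_factors.items():
        # Predicted rating = dot product of the latent factor vectors.
        score = sum(u * v for u, v in zip(user_vec, item_vec))
        if len(heap) < n:
            heapq.heappush(heap, (score, item_id))
        elif score > heap[0][0]:
            heapq.heapreplace(heap, (score, item_id))
    return [(item_id, score) for score, item_id in sorted(heap, reverse=True)]


def forgetting_weight(elapsed_days, strength=7.0):
    """Assumed retention form R = exp(-t / S); `strength` (S) is hypothetical."""
    return math.exp(-elapsed_days / strength)


def time_weighted_score(ratings, now_day, strength=7.0):
    """Weighted mean of (rating, day) pairs that discounts older ratings."""
    weights = [forgetting_weight(now_day - day, strength) for _, day in ratings]
    total = sum(weights)
    return sum(r * w for (r, _), w in zip(ratings, weights)) / total
```

With a time-weighted mean of this kind, a recent high rating outweighs an old low rating, which is one simple way to make a score follow interest drift.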
Authors
李光明
杨攀攀
古婵
LI Guangming; YANG Panpan; GU Chan (College of Electronic Information and Artificial Intelligence, Shaanxi University of Science and Technology, Xi'an Shaanxi 710021, China; School of Electrical and Control Engineering, Shaanxi University of Science and Technology, Xi'an Shaanxi 710021, China)
Source
《电子器件》
CAS
2024, No. 5, pp. 1425-1433 (9 pages)
Chinese Journal of Electron Devices
Funding
National Natural Science Foundation of China (Grant No. 62003201).