Abstract
Traditional movie recommendation systems built on Hadoop, or on Mahout running on a single machine, degrade as data volumes grow and as recommendation models require many iterations: recommendation quality drops, running speed falls noticeably, and personalized recommendations can no longer be delivered to users in real time. To address these problems, this work uses a movie rating dataset as its setting and builds the system architecture with big data technologies including Hadoop, Spark, Kafka, and Hive. The MLlib collaborative filtering model is improved with a collaborative filtering algorithm based on an improved cosine similarity and with an item-based collaborative filtering algorithm based on users' favorite items. Both offline and real-time data are processed to produce Top-N recommendations, yielding a movie recommendation system on the Spark platform. Experimental results show that, on the Spark platform, the system achieves markedly higher data processing speed and recommendation accuracy than traditional methods, and is also more stable.
Authors
李光明
房靖力
Li Guangming; Fang Jingli (School of Electronic Information and Artificial Intelligence, Shaanxi University of Science and Technology, Xi’an 710021, Shaanxi, China)
Source
《计算机应用与软件》
Peking University Core Journal (北大核心)
2020, No. 11, pp. 28-34 (7 pages)
Computer Applications and Software