Abstract
The traditional Slope One algorithm does not account for the effect of user similarity and item similarity on rating prediction, which leads to low recommendation accuracy; in the current big-data setting it also runs inefficiently. To address these problems, this paper proposes an improved weighted Slope One algorithm based on Spark. The algorithm integrates similarity computation, active-user filtering, and user clustering, and is parallelized on the Spark platform. Experiments on the MovieLens dataset, together with a comparison of the parallelized algorithm's running efficiency on Spark and Hadoop, confirm that the algorithm effectively reduces the mean absolute error (MAE) and that it runs more efficiently on Spark, making it better suited to big-data processing.
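For orientation, the baseline that the paper builds on can be sketched as follows. This is a minimal, single-machine sketch of standard weighted Slope One plus the MAE metric, not the authors' improved algorithm: it includes no similarity weighting, active-user filtering, user clustering, or Spark parallelization. All function names and the toy ratings are hypothetical and chosen only for illustration.

```python
from collections import defaultdict

def fit_deviations(ratings):
    """Compute average rating deviations between item pairs.

    ratings: dict mapping user -> dict mapping item -> rating.
    Returns (dev, count), both keyed by (j, i): dev is the mean of
    r_uj - r_ui over users who rated both items, count is how many did.
    """
    diff_sum = defaultdict(float)
    count = defaultdict(int)
    for user_ratings in ratings.values():
        for j, r_j in user_ratings.items():
            for i, r_i in user_ratings.items():
                if i != j:
                    diff_sum[(j, i)] += r_j - r_i
                    count[(j, i)] += 1
    dev = {pair: diff_sum[pair] / count[pair] for pair in diff_sum}
    return dev, dict(count)

def predict(user_ratings, target, dev, count):
    """Weighted Slope One prediction of `target` for one user.

    Each rated item i contributes (dev[target, i] + r_i), weighted by the
    number of users who co-rated target and i. Returns None when no rated
    item has been co-rated with the target.
    """
    num = 0.0
    den = 0
    for i, r_i in user_ratings.items():
        c = count.get((target, i), 0)
        if i == target or c == 0:
            continue
        num += (dev[(target, i)] + r_i) * c
        den += c
    return num / den if den else None

def mean_absolute_error(pairs):
    """MAE over an iterable of (predicted, actual) rating pairs."""
    pairs = list(pairs)
    return sum(abs(p - a) for p, a in pairs) / len(pairs)

# Hypothetical toy data: three users, three items.
toy_ratings = {
    "john": {"A": 5.0, "B": 3.0, "C": 2.0},
    "mark": {"A": 3.0, "B": 4.0},
    "lucy": {"B": 2.0, "C": 5.0},
}

dev, count = fit_deviations(toy_ratings)
lucy_on_a = predict(toy_ratings["lucy"], "A", dev, count)
```

In this toy case item A is co-rated with B by two users and with C by one, so the B-based estimate counts twice as much as the C-based one when predicting lucy's rating of A.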
Authors
梁化强
唐坚刚
LIANG Hua-qiang; TANG Jian-gang (School of Optical-Electrical and Computer Engineering, University of Shanghai for Science and Technology, Shanghai 200093, China)
Source
《软件导刊》
2018, No. 6, pp. 92-94, 99 (4 pages)
Software Guide