Abstract
Information mining in big data environments has become an active area of recommender system research. After comparing existing big data processing frameworks, this paper adopts the Spark computing engine and combines it with the implicit-feedback ALS (alternating least squares) collaborative filtering algorithm to propose a parallelized ALS solution on the Spark framework, and designs a distributed stream computing system, Spark Distributed-ALS (SD-ALS). Experimental results show that the prediction accuracy of the ALS algorithm on a Spark cluster is essentially the same as in a standalone environment, that RMSE gradually stabilizes as the number of iterations increases, and that computational efficiency improves significantly, meeting the performance requirements of real-time recommendation.
Authors
舒贵阳
辜丽川
冯娟娟
陈卫
赵子豪
王超
SHU Guiyang; GU Lichuan; FENG Juanjuan; CHEN Wei; ZHAO Zihao; WANG Chao (Anhui Agricultural University, Hefei 230036, China)
Source
《洛阳理工学院学报(自然科学版)》
2018, No. 2, pp. 71-77 (7 pages)
Journal of Luoyang Institute of Science and Technology: Natural Science Edition
Funding
National Natural Science Foundation of China (Grant No. 31371533)