Abstract
In the data-driven era of big data, the volume of information resources is growing exponentially, and the resulting "information overload" places higher demands on data analysis and processing. Extracting useful information from massive data sets, then systematically analyzing and mining it to meet users' individual needs, can greatly strengthen an enterprise's competitiveness. Combining the strengths of Hadoop and Spark, this paper designs and builds a big data analysis and mining platform comprising HDFS, MongoDB, MLlib, Tableau, and other clusters, and applies it to personalized product recommendation on an Amazon e-commerce transaction data set. The ALS matrix-factorization collaborative filtering algorithm in Spark MLlib is used to train a model on users' purchase behavior and to generate recommendations. The experimental results show that the platform achieves good results for personalized product recommendation.
Authors
Li Xiaoying (李晓颖); Zhao Anna (赵安娜); Zhou Xiaojing (周晓静); Yang Chengwei (杨成伟)
School of Management Science and Engineering, Shandong University of Finance and Economics, Jinan, Shandong 250014
Source
Electronic Test (《电子测试》)
2019, No. 12, pp. 65-66, 81 (3 pages in total)
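Funding
China Postdoctoral Science Foundation, 58th batch General Program: "Association Rule Mining and Parallel Processing System for Media Big Data Analysis Tasks" (5M582104)
Shandong Provincial Natural Science Foundation: "Research on Large-Scale Associated Data Mining and Parallel Optimization Methods in Cloud Computing Environments" (BS2015DX013)
Shandong Provincial Natural Science Foundation (General Program): "Research on Sentiment Analysis and Recommendation Methods Based on Implicit Feedback Data" (ZR2019MG037)
Shandong Province Higher Education Science and Technology Program: "Research on Dynamic Resource Management Strategies and Delay Scheduling Methods in Distributed Heterogeneous Environments" (J14LN19)
Shandong University of Finance and Economics University-Level Featured Course (A2017008)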