Research and Analysis of RDD Based on the Spark Platform (cited by: 2)
Abstract: In the era of big data, demand for computing over massive datasets keeps growing. Spark is a parallel computing framework designed for large-scale data processing, but it is still rarely used in practical applications. This paper first introduces the basic concepts of the RDD (Resilient Distributed Dataset) on the Spark platform. It then describes the relationship between Spark and RDDs: the core of Spark is built on the RDD abstraction, so Spark loads data into RDDs and obtains the final result through RDD transformations and actions. Finally, two experiments are implemented on the Spark platform, a single-hop page conversion rate statistic for e-commerce users and a Top 10 active session statistic for popular e-commerce categories, to demonstrate a real-world application and to process large-scale data faster.
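The single-hop page conversion rate mentioned in the abstract can be illustrated with a small sketch. The abstract does not define the metric, so the definition used here is an assumption: the rate for a hop A→B is the number of times page B directly follows page A within a session, divided by the total number of visits to page A. The function name `single_hop_conversion` and the sample data are hypothetical, and the code is plain Python rather than the authors' Spark implementation. In Spark, the same counting would typically be expressed with RDD transformations such as `map` and `reduceByKey`, followed by an action such as `collect`.

```python
from collections import Counter


def single_hop_conversion(sessions):
    """Compute single-hop conversion rates (assumed definition, see lead-in).

    sessions: dict mapping a session id to the list of page ids that the
    session visited, already sorted by visit time.
    Returns a dict mapping (from_page, to_page) to a rate in [0, 1].
    """
    page_views = Counter()  # denominator: total visits per page
    hops = Counter()        # numerator: consecutive (from, to) page pairs

    for pages in sessions.values():
        page_views.update(pages)
        # zip pairs each page with its immediate successor in the session
        hops.update(zip(pages, pages[1:]))

    return {
        (src, dst): count / page_views[src]
        for (src, dst), count in hops.items()
    }


# Hypothetical sample data: four sessions over pages 1, 2 and 3.
sessions = {
    "s1": [1, 2, 3],
    "s2": [1, 2],
    "s3": [1, 3],
    "s4": [2, 3],
}
rates = single_hop_conversion(sessions)
for (src, dst), rate in sorted(rates.items()):
    print(f"{src} -> {dst}: {rate:.2f}")
```

With this sample, page 1 is visited three times and is followed directly by page 2 in two sessions, so the 1→2 rate is 2/3.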
Authors: MA Zhaohui; ZHAO Ruizhe; WEN Xiumei (Hebei University of Architecture, Zhangjiakou, Hebei 075000; Big Data Technology Innovation Center of Zhangjiakou, Zhangjiakou, Hebei 075000)
Source: Journal of Hebei Institute of Architecture and Civil Engineering (CAS), 2023, No. 2, pp. 214-221 (8 pages)
Keywords: Big Data; Spark; RDD; E-commerce