
Spark-based Remote Sensing Data Analysis
Cited by: 1
Abstract: With the rapid development of remote sensing technology, the volume of acquired data is growing explosively, which poses a major challenge for remote sensing data computation. This paper adopts Spark, an in-memory distributed computing framework, to address the problem, with Hadoop YARN as the resource scheduler and HDFS as the distributed storage system. Spark is an open-source distributed computing framework built on the resilient distributed dataset (RDD) abstraction, and it uses a directed acyclic graph (DAG) execution engine to support cyclic data flow: data loaded into memory once can be reused across many iterations. This makes Spark particularly well suited to multi-iteration big data analysis methods and gives it a clear advantage over MapReduce, which must load the data into memory in every iteration. The framework is applied to the analysis of massive remote sensing data to verify the effectiveness of the singular value decomposition (SVD) algorithm, which requires many iterations. The experiments show that, as the number of iterations increases, Spark-based SVD is markedly more efficient than the MapReduce version, usually by about one order of magnitude.
Source: Microcomputer Applications (《微型电脑应用》), 2015, No. 8, pp. 65-67, 6 (3 pages in total)
Funding: National Natural Science Foundation of China (71331005)
Keywords: Big Data Computing; Remote Sensing Data; Hadoop; Spark; MapReduce
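
The abstract's point about multi-iteration algorithms can be illustrated with a small sketch. The record does not say which SVD algorithm the paper uses, so the power iteration below is an assumption chosen only because it is a simple iterative way to estimate the largest singular value. The function names (`matvec`, `rmatvec`, `top_singular_value`) are hypothetical, and the code is plain Python rather than Spark code. What it shows is that every iteration makes a full pass over the same data matrix, which is the access pattern that benefits from keeping the data in memory.

```python
import math


def matvec(rows, v):
    """Compute A @ v, where A is given as a list of rows."""
    return [sum(a * x for a, x in zip(row, v)) for row in rows]


def rmatvec(rows, u):
    """Compute A^T @ u by accumulating each row scaled by its entry of u."""
    out = [0.0] * len(rows[0])
    for row, ui in zip(rows, u):
        for j, a in enumerate(row):
            out[j] += a * ui
    return out


def top_singular_value(rows, iterations=100):
    """Estimate the largest singular value of A by power iteration on A^T A.

    Assumption: this is an illustrative algorithm, not the paper's method.
    """
    n = len(rows[0])
    v = [1.0 / math.sqrt(n)] * n  # arbitrary unit-length starting vector
    for _ in range(iterations):
        # One iteration = one full pass over the data (A v, then A^T (A v)).
        w = rmatvec(rows, matvec(rows, v))
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:  # zero matrix, or start vector in the null space
            return 0.0, v
        v = [x / norm for x in w]
    sigma = math.sqrt(sum(x * x for x in matvec(rows, v)))
    return sigma, v


if __name__ == "__main__":
    sigma, v = top_singular_value([[3.0, 0.0], [0.0, 4.0]])
    print(sigma, v)
```

In this sketch `rows` is an in-memory list that the loop reads again on every pass. On a cluster the rows would presumably live in a distributed dataset; per the abstract, Spark can keep that dataset in memory after the first load (for example by persisting the RDD), whereas a MapReduce job would have to load it again for each iteration.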

