
Noise Power Spectrum Calculation Method of Seismic Data Based on Apache Spark

Cited by: 2
Abstract: To address the low efficiency of computing and analyzing massive seismic observation data on a single machine, we propose a method for storing, computing, and analyzing seismic observation data on a distributed architecture, and implement it for the computationally complex case of noise power spectrum calculation. Building on Hadoop's performance advantages in massive data processing, the seismic observation data are stored and scheduled on the Hadoop Distributed File System (HDFS). We study how to implement the noise-power-spectrum quality evaluation method for seismic data on the Spark distributed computing architecture: Spark resilient distributed datasets (RDDs) automatically distribute the computing tasks to the compute nodes, which parse the seismic waveform data stored in HDFS. The results are written to the distributed database HBase, indexed by RowKey, which enables storage and retrieval of the power spectra of long-period seismic noise. The results show that the Spark-based method can process TB-scale data with high efficiency and can be applied to the analysis and computation of massive seismic observation data.
Authors: GUO Kai; LI Jian-Hui; WEN Liang-Ming; HAN Zhen-Hua (Computer Network Information Center, Chinese Academy of Sciences, Beijing 100190, China; University of Chinese Academy of Sciences, Beijing 100049, China; China Seismic Network Center, Beijing 100045, China; Taiyuan University of Technology, Taiyuan 030024, China)
Source: Computer Systems & Applications (《计算机系统应用》), 2021, No. 8, pp. 126-132 (7 pages)
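Funding: National Key Research and Development Program of China (2018YFC1504500); China Earthquake Administration project integrating monitoring, forecasting, and research (3JH-20200207).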
Keywords: seismic data; noise power spectrum; Spark; Hadoop; distributed
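The per-record work described in the abstract (parse a waveform segment, estimate its noise power spectrum, store the result under a RowKey) can be illustrated with a minimal, dependency-free Python sketch. This is not the authors' implementation: the Hann-windowed periodogram, the `row_key` layout, the station codes, and the synthetic 5 Hz test signal are all assumptions made for illustration, and the built-in `map` only stands in for a Spark `RDD.map` over records read from HDFS.

```python
import cmath
import math


def hann(n):
    """Hann window of length n, used to reduce spectral leakage."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]


def periodogram(samples, fs):
    """One-sided power spectral density estimate of one waveform segment.

    Returns (freqs_hz, psd), with psd in amplitude**2 / Hz.
    A direct O(n^2) DFT keeps the sketch dependency-free; a production
    job would use an FFT library instead.
    """
    n = len(samples)
    mean = sum(samples) / n
    window = hann(n)
    x = [(s - mean) * w for s, w in zip(samples, window)]
    norm = fs * sum(w * w for w in window)  # window power normalisation
    freqs, psd = [], []
    for k in range(n // 2 + 1):
        xk = sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        p = abs(xk) ** 2 / norm
        if 0 < k < n / 2:  # double every bin except DC and Nyquist
            p *= 2
        freqs.append(k * fs / n)
        psd.append(p)
    return freqs, psd


def row_key(network, station, channel, day):
    """Hypothetical RowKey layout: station identifiers first, then the date."""
    return f"{network}.{station}.{channel}.{day}"


def process(record):
    """Per-record task; in Spark this would be the function given to RDD.map."""
    meta, samples, fs = record
    return row_key(*meta), periodogram(samples, fs)


# Synthetic stand-in for a parsed waveform: a 5 Hz sine sampled at 64 Hz.
fs = 64.0
segment = [math.sin(2 * math.pi * 5.0 * t / fs) for t in range(128)]
records = [(("XX", "STA01", "BHZ", "20200101"), segment, fs)]

# Plain map() stands in for distributing the records across compute nodes.
results = dict(map(process, records))
```

Placing the station identifiers before the date in the key would keep all spectra for one channel adjacent in a sorted key space, which suits range scans over a time span; whether the paper uses this ordering is not stated in the abstract.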
