Abstract
To address the problems currently facing meteorological data storage, namely massive growth in volume, highly concurrent reads and writes, the coexistence of structured and unstructured data, and inefficient retrieval of long time series and large data sets, this paper proposes a distributed storage scheme for meteorological data based on the open-source Hadoop framework. An analysis of the attributes and characteristics of meteorological data shows that, once fully optimized, such data adapts well to a distributed storage framework and has strong potential for large-scale application. The scheme also introduces innovations in HBase Row Key design and in the small-file merging strategy. Finally, for the two main data types that are widespread in meteorological data, structured and unstructured, the paper takes automatic weather station data and radar product data respectively as concrete examples and gives detailed design ideas and implementation methods.
Authors
周笑天
冯勇
陈益玲
陈澍
ZHOU Xiao-tian; FENG Yong; CHEN Yi-ling; CHEN Shu (Shandong Provincial Meteorological Information Center, Jinan 250031, China)
Source
《信息技术》
2022, Issue 1, pp. 68-74 (7 pages in total)
Information Technology
Funding
General Project of the Shandong Provincial Meteorological Bureau (2018sdqxm06).