Abstract
As meteorological services have modernized, meteorological departments have accumulated massive volumes of data, and extracting useful knowledge from these data is key to improving meteorological service capability. Because traditional clustering algorithms cannot process massive data effectively, a parallel Canopy-FCM (Canopy-fuzzy C-means) clustering algorithm based on the Spark framework is proposed. The algorithm combines the Canopy algorithm with the FCM algorithm, which avoids FCM's sensitivity to the initial cluster centers, and it exploits the in-memory computing of the Spark distributed framework to greatly reduce the processing time for massive meteorological data. Monthly precipitation observations from April to October at 208 regional automatic weather stations in Tianjin are used to evaluate precipitation in different areas of the city. The experimental results show that the proposed method not only extracts useful information from meteorological data quickly and effectively, but also achieves a higher running speed and speedup ratio than the algorithm based on the Hadoop framework. It also offers a new approach for the relevant departments to make effective decisions on flood and drought monitoring, early warning, and risk prevention.
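The Canopy-then-FCM idea in the abstract can be illustrated with a minimal single-machine sketch. It is not the authors' Spark implementation: the function names, the threshold `t2`, the fuzzifier `m = 2`, the fixed iteration count, and the toy 2-D points are all assumptions made for illustration. Only the tight Canopy threshold is used, since the sketch needs initial centers rather than full canopy membership.

```python
import math

def dist(a, b):
    """Euclidean distance between two points given as sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def canopy_centers(points, t2):
    """Simplified Canopy pass: take the first remaining point as a center,
    drop every point within t2 of it, and repeat until none remain."""
    remaining = list(points)
    centers = []
    while remaining:
        center = remaining[0]
        centers.append(center)
        remaining = [p for p in remaining if dist(p, center) > t2]
    return centers

def memberships(points, centers, m, eps=1e-9):
    """FCM membership update: u[j][i] = 1 / sum_k (d_ji / d_jk)^(2/(m-1))."""
    power = 2.0 / (m - 1.0)
    u = []
    for p in points:
        d = [max(dist(p, c), eps) for c in centers]  # clamp to avoid dividing by zero
        u.append([1.0 / sum((di / dk) ** power for dk in d) for di in d])
    return u

def fcm(points, init_centers, m=2.0, iters=50):
    """Fuzzy C-means started from the given centers (fixed number of iterations)."""
    centers = [list(c) for c in init_centers]
    dims = len(points[0])
    for _ in range(iters):
        u = memberships(points, centers, m)
        for i in range(len(centers)):
            w = [row[i] ** m for row in u]  # weights u^m for cluster i
            total = sum(w)
            centers[i] = [
                sum(wj * p[k] for wj, p in zip(w, points)) / total
                for k in range(dims)
            ]
    return centers, memberships(points, centers, m)

# Toy points (made up for illustration, not the Tianjin station data).
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
init = canopy_centers(points, t2=3.0)   # Canopy fixes the number of clusters and their seeds
centers, u = fcm(points, init)          # FCM refines the centers and fuzzy memberships
print(len(init), centers)
```

In a Spark setting, the per-point membership computation and the weighted sums for each center are the parts that would naturally be distributed across partitions, with the data cached in memory between iterations; how the paper actually partitions the work is not stated in the abstract.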
Authors
勾志竟
宫志宏
徐梅
刘布春
GOU Zhi-jing; GONG Zhi-hong; XU Mei; LIU Bu-chun (Tianjin Meteorological Information Center, Tianjin 300074, China; Tianjin Climate Center, Tianjin 300074, China; Institute of Environment and Sustainable Development in Agriculture, Chinese Academy of Agricultural Sciences, Beijing 100081, China)
Source
Computer Technology and Development (《计算机技术与发展》)
2020, No. 8, pp. 169-173 (5 pages)
Funding
National Key Research and Development Project (2017YFC1502800).