Abstract
The dual water cycle of nature and human society generates massive volumes of water observation data. Existing water data management systems suffer from heavy storage load, difficult database scaling, and slow queries, and cannot meet the demands of data storage and analysis. To address these problems, the basic architecture of a distributed big data storage platform is first designed by combining virtualization technology with the Hadoop infrastructure. Second, the platform itself is designed and implemented based on the existing water big data and the actual business database tables. Finally, code for migrating data from the centralized platform to the distributed platform is implemented, and data migration experiments are carried out. The experimental results verify the feasibility and effectiveness of the proposed platform design, which offers a suitable distributed solution for storing and processing large-scale industry data.
Authors
YAN Jianzhuo
GAO Kaili
XU Hongxia
YU Yongchuan
(Engineering Research Center of Digital Community, Department of Information, Beijing University of Technology, Beijing 100124, China)
Source
Water Resources Informatization (水利信息化)
2019, Issue 3, pp. 17-24 (8 pages)
Funding
CERNET Next Generation Internet Technology Innovation Project (NGII20170207)