Abstract
As scientific research advances, research data are growing massively, and PB-scale research data require efficient and stable storage systems. Traditional data storage schemes suffer from poor resource utilization, limited cluster scalability, and unfriendly user interfaces, which seriously limit the effective use of data in scientific research. Supported by the Earth science big data special program of the Chinese Academy of Sciences, we design and implement i-Harbor, an efficient storage system. Its core architecture is an object storage system that uses the open-source Ceph distributed storage system to store object data and the MongoDB database to store metadata. General-purpose data interfaces are provided through an HTTP API and FTP. Multi-replica and erasure coding techniques eliminate single points of failure, and the Zabbix cluster monitoring system tracks platform parameters and locates faults in real time, improving the platform's disaster tolerance and security. In addition, because the underlying architecture is distributed, storage nodes can be added to the cluster at will, which improves the platform's scalability.
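The abstract relies on multi-replica storage and erasure coding for fault tolerance but does not say which parameters i-Harbor uses. The Python sketch below uses purely illustrative values: 3-way replication and hypothetical (k, m) = (4, 2) and (8, 3) erasure codes. It shows the usual trade-off. An n-way replica set stores n raw bytes per user byte and survives n-1 lost copies. A (k, m) erasure code stores (k+m)/k raw bytes per user byte and survives m lost chunks. Both figures assume every copy or chunk sits in a separate failure domain. The names `Scheme`, `replication`, `erasure_code`, and `raw_capacity_tb` are hypothetical helpers, not part of Ceph or i-Harbor.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Scheme:
    """A redundancy scheme described by data and redundancy chunk counts."""
    name: str
    data_chunks: int    # k: chunks carrying user data (1 for plain replication)
    parity_chunks: int  # m: extra full copies or coding chunks

    @property
    def overhead(self) -> float:
        """Raw bytes stored per byte of user data."""
        return (self.data_chunks + self.parity_chunks) / self.data_chunks

    @property
    def tolerated_failures(self) -> int:
        """Lost copies/chunks survivable, assuming each sits in its own failure domain."""
        return self.parity_chunks


def replication(copies: int) -> Scheme:
    # n-way replication: one full copy of the data plus (copies - 1) more full copies.
    return Scheme(f"{copies}x replication", 1, copies - 1)


def erasure_code(k: int, m: int) -> Scheme:
    # (k, m) erasure code: any k of the k + m chunks are enough to rebuild the object.
    return Scheme(f"EC k={k}, m={m}", k, m)


def raw_capacity_tb(user_tb: float, scheme: Scheme) -> float:
    """Raw cluster capacity (TB) needed to hold user_tb of user data."""
    return user_tb * scheme.overhead


if __name__ == "__main__":
    # 1 PB of user data, using decimal units (1 PB = 1000 TB).
    for s in (replication(3), erasure_code(4, 2), erasure_code(8, 3)):
        print(f"{s.name:18} overhead={s.overhead:.3f}x, "
              f"tolerates {s.tolerated_failures} failures, "
              f"1 PB user data -> {raw_capacity_tb(1000, s):.0f} TB raw")
```

Ceph exposes the same two knobs: a replica count for replicated pools, and `k`/`m` in an erasure-code profile. In this example the (4, 2) code needs half the raw capacity of 3-way replication while tolerating the same two failures. The cost is extra CPU for encoding and more network traffic during recovery.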
Authors
王锦涛
张海明
WANG Jin-Tao; ZHANG Hai-Ming (Computer Network Information Center, Chinese Academy of Sciences, Beijing 100190, China; University of Chinese Academy of Sciences, Beijing 100049, China)
Source
Computer Systems & Applications, 2020, No. 7, pp. 82-88 (7 pages)
Funding
Strategic Priority Research Program (Class A) of the Chinese Academy of Sciences (XDA19000000).
Keywords
object storage
distributed storage
Ceph