Funding: This paper is supported by the National Natural Science Foundation of China under Grants No. 60125208, No. 60273074, No. 60303032, and No. 69973017, and the National Grand Fundamental Research 973 Program of China under Grants No. 2004CB318201 and No. 2003CB317003.
Abstract: To accommodate the explosively increasing amount of data in many areas such as scientific computing and e-Business, physical storage devices and control components have been separated from traditional computing systems to become a scalable, intelligent storage subsystem that, when appropriately designed, should provide a transparent storage interface, effective data allocation, flexible and efficient storage management, and other impressive features. The design goals and desirable features of such a storage subsystem include high performance, high scalability, high availability, high reliability, and high security. Extensive research has been conducted in this field by researchers all over the world, yet many issues remain open and challenging. This paper studies five different online massive storage systems and one offline storage system that we have developed with research grant support from China. The storage pool with multiple network-attached RAIDs avoids expensive store-and-forward data copying between the server and the storage system, improving the data transfer rate by a factor of 2 to 3 over a traditional disk array. Two types of high-performance distributed storage systems for local-area-network storage are introduced in the paper. One of them is the Virtual Interface Storage Architecture (VISA), in which VI replaces TCP/IP as the communication protocol. VISA is shown to outperform IP SAN by designing and implementing the vSCSI (VI-attached SCSI) protocol to support SCSI commands over the VI network. The other is a fault-tolerant parallel virtual file system that is designed and implemented to provide high I/O performance and high reliability. A global distributed storage system for wide-area-network storage is discussed in detail, where a Storage Service Provider is added to provide storage services and plays the role of user agent for the storage system. Object-based storage systems not only store data but also adopt the attributes and methods of the objects that encapsulate the data. The adaptive policy triggering mechanism (APTM), which borrows proven machine learning techniques to improve the scalability of object storage systems, embodies the idea of a smart storage device and facilitates the self-management of massive storage systems. A typical offline massive storage system is used to back up data or store documents, for which tape virtualization technology is discussed. Finally, a domain-based storage management framework for different types of storage systems is presented in the paper.
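To make the object-encapsulation and adaptive-policy ideas above concrete, the following minimal Java sketch shows a storage object that carries attributes and an access method alongside its data, plus a toy online-learning trigger that decides when to promote a frequently read object. The class names, the single-weight model, and the thresholds are illustrative assumptions only, not the APTM implementation described in the paper.

import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch only: names, the single-weight model, and the thresholds
// are illustrative assumptions, not the paper's APTM implementation.
class StorageObject {
    final byte[] data;                                       // encapsulated payload
    final Map<String, String> attributes = new HashMap<>();  // object attributes
    long readCount = 0;                                      // per-object access statistic

    StorageObject(byte[] data) { this.data = data; }

    byte[] read() {          // access method kept with the data it encapsulates
        readCount++;
        return data;
    }
}

class AdaptivePolicyTrigger {
    private double weight = 0.1;                 // assumed initial model parameter
    private static final double THRESHOLD = 1.0; // assumed promotion threshold

    boolean shouldPromote(StorageObject obj) {
        return weight * obj.readCount > THRESHOLD;
    }

    // Perceptron-style online update, standing in for the "proven machine
    // learning techniques" the abstract mentions.
    void feedback(StorageObject obj, boolean promotionHelped) {
        weight += 0.01 * (promotionHelped ? 1.0 : -1.0) * obj.readCount;
    }
}

public class AptmSketch {
    public static void main(String[] args) {
        StorageObject tile = new StorageObject("spatial tile payload".getBytes());
        tile.attributes.put("owner", "demo");
        AdaptivePolicyTrigger trigger = new AdaptivePolicyTrigger();
        for (int i = 0; i < 20; i++) tile.read();            // simulate a hot object
        System.out.println("promote to faster tier? " + trigger.shouldPromote(tile));
    }
}

In this toy version the trigger fires once the learned weight times the observed read count crosses the threshold; a real APTM would replace the single weight with a trained model over richer access features.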
Funding: Supported by the National Science and Technology Support Project (No. 2012BAH01F02) from the Ministry of Science and Technology of China and the Director Fund (No. IS201116002) from the Institute of Seismology, CEA.
Abstract: This paper designs and develops a framework on a distributed computing platform for massive multi-source spatial data using a column-oriented database (HBase). The platform consists of four tiers, namely an ETL (extraction, transformation, loading) tier, a data processing tier, a data storage tier, and a data display tier, achieving long-term storage, real-time analysis, and querying of massive data. Finally, a cluster with a real dataset is simulated, made up of 39 nodes comprising 2 master nodes and 37 data nodes; function tests of the data importing module and the real-time query module are performed, together with performance tests of HDFS I/O, the MapReduce cluster, and the batch loading and real-time querying of massive data. The test results indicate that the platform achieves high performance in terms of response time and linear scalability.
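As a rough illustration of how the storage and real-time query tiers might interact with HBase, the short Java sketch below writes one spatial record through the standard HBase client API and reads it back by row key. The table name, column family, and row-key layout are assumptions made for illustration; the paper's actual schema and batch-loading path are not specified here.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class SpatialHBaseSketch {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create(); // reads hbase-site.xml from the classpath
        try (Connection conn = ConnectionFactory.createConnection(conf);
             Table table = conn.getTable(TableName.valueOf("spatial_data"))) { // assumed table name

            // Store one multi-source spatial record; "geo" is an assumed column family.
            Put put = new Put(Bytes.toBytes("sensor42_20130601T1200"));        // assumed row-key layout
            put.addColumn(Bytes.toBytes("geo"), Bytes.toBytes("lon"), Bytes.toBytes("114.30"));
            put.addColumn(Bytes.toBytes("geo"), Bytes.toBytes("lat"), Bytes.toBytes("30.59"));
            table.put(put);

            // Real-time point query by row key.
            Result result = table.get(new Get(Bytes.toBytes("sensor42_20130601T1200")));
            String lon = Bytes.toString(result.getValue(Bytes.toBytes("geo"), Bytes.toBytes("lon")));
            String lat = Bytes.toString(result.getValue(Bytes.toBytes("geo"), Bytes.toBytes("lat")));
            System.out.println("queried point: lon=" + lon + ", lat=" + lat);
        }
    }
}

A batch-loading path would more likely pass a list of Put objects to Table.put or use HBase's bulk-load tooling rather than single writes; the 39-node deployment described in the abstract does not change this client-side code.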