Abstract: Big data is an emerging term in the storage industry; it refers to data analytics on big storage, i.e., Cloud-scale storage. In Cloud-scale (or EB-scale) file systems, balancing request workloads across a metadata server cluster is critical for avoiding performance bottlenecks and improving quality of service. Many good approaches have been proposed for load balancing in distributed file systems. Some of them focus on global namespace balancing, making metadata distribution across metadata servers as uniform as possible. However, they do not work well under skewed request distributions, which impair load balancing but simultaneously increase the effectiveness of caching and replication. In this paper, we propose Cloud Cache (C2), an adaptive and scalable load balancing scheme for metadata server clusters in EB-scale file systems. It combines adaptive cache diffusion with an adaptive replication scheme to cope with the request load balancing problem, and it can be integrated into existing distributed metadata management approaches to efficiently improve their load balancing performance. C2 runs in two steps: 1) adaptive cache diffusion runs first: if a node is overloaded, load-shedding is used; otherwise, load-stealing is used; and 2) the adaptive replication scheme runs second: if a very popular metadata item (or at least two items) causes a node to be overloaded, the adaptive replication scheme is used, because the very popular item is not split across several nodes by adaptive cache diffusion owing to its knapsack property. Trace-driven simulations demonstrate the efficiency and scalability of C2.
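The two-step decision described in this abstract can be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' implementation: the function name `plan_node_actions`, the overload test (total load above `capacity`), and the `hot_fraction` threshold for a "very popular" item are all hypothetical.

```python
def plan_node_actions(item_loads, capacity, hot_fraction=0.5):
    """Return the actions one node would take in a balancing round.

    item_loads: dict mapping metadata item -> request load (hypothetical units).
    capacity: load above which the node counts as overloaded (assumption).
    hot_fraction: share of capacity that makes an item "very popular" (assumption).
    """
    total = sum(item_loads.values())
    overloaded = total > capacity

    # Step 1: adaptive cache diffusion.
    actions = ["load-shedding" if overloaded else "load-stealing"]

    # Step 2: adaptive replication for very popular items on an overloaded node.
    # Such an item is indivisible, so moving it would only move the hot spot.
    hot = sorted(k for k, v in item_loads.items() if v >= hot_fraction * capacity)
    if overloaded and hot:
        actions.append(("replicate", hot))
    return actions
```

For example, a node holding one item with load 80 and another with load 30 against a capacity of 100 would shed load and replicate the first item, while a lightly loaded node would only try to steal load.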
Abstract: Fault tolerance is very important in cluster computing and has been implemented in many well-known cluster-computing systems using checkpoint/restart mechanisms. However, existing checkpointing algorithms cannot restore the state of a file system when rolling back a running program, so existing fault-tolerance systems place many restrictions on file accesses. This paper presents the SCR algorithm, which is based on atomic operations and consistent scheduling and can restore the state of file systems. In the SCR algorithm, system calls on file systems are classified into idempotent and non-idempotent operations: a non-idempotent operation modifies the file system's state, while an idempotent operation does not. The SCR algorithm tracks changes to the file system state. It logs to disk each non-idempotent operation issued by user programs, together with the information needed to restore it. When a program is rolled back to a checkpoint, the SCR algorithm reverts the file system state to that of the last checkpoint. With the SCR algorithm, users may use any file operation in their programs.
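As a rough illustration of the logging idea in this abstract, the Python sketch below records an undo entry before each state-changing file operation and replays the entries in reverse on rollback. It is not the SCR algorithm itself: the class name `UndoLog`, its methods, keeping the log in memory rather than on disk, and saving whole-file contents are all assumptions made for brevity.

```python
import os
import tempfile


class UndoLog:
    """Hypothetical undo log; names and design are illustrative only."""

    def __init__(self):
        self._entries = []  # undo records since the last checkpoint

    def checkpoint(self):
        # After a new checkpoint, earlier undo records are no longer needed.
        self._entries.clear()

    def write_file(self, path, data):
        # State-changing operation: save what is needed to undo it first.
        if os.path.exists(path):
            with open(path, "rb") as f:
                self._entries.append(("restore", path, f.read()))
        else:
            self._entries.append(("delete", path, None))
        with open(path, "wb") as f:
            f.write(data)

    def remove_file(self, path):
        # State-changing operation: keep the old contents so it can be undone.
        with open(path, "rb") as f:
            self._entries.append(("restore", path, f.read()))
        os.remove(path)

    def read_file(self, path):
        # Read-only operation: nothing to log.
        with open(path, "rb") as f:
            return f.read()

    def rollback(self):
        # Undo in reverse order to return to the checkpointed state.
        while self._entries:
            action, path, old = self._entries.pop()
            if action == "restore":
                with open(path, "wb") as f:
                    f.write(old)
            elif os.path.exists(path):
                os.remove(path)


def demo():
    """Modify two files after a checkpoint, then roll back."""
    with tempfile.TemporaryDirectory() as d:
        a, b = os.path.join(d, "a.txt"), os.path.join(d, "b.txt")
        with open(a, "wb") as f:
            f.write(b"v1")
        log = UndoLog()
        log.checkpoint()
        log.write_file(a, b"v2")
        log.write_file(b, b"new")
        log.remove_file(a)
        log.rollback()
        return log.read_file(a), os.path.exists(b)
```

After `rollback()`, the first file holds its checkpoint contents again and the file created after the checkpoint is gone. A real system would also have to persist the log and cover operations such as rename, truncate, and append.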
Funding: Sponsored by the National Natural Science Foundation of China (Grant Nos. 60672124 and 60832009) and the Hi-Tech Research and Development Program (National 863 Program) (Grant No. 2007AA01Z221).
Abstract: To improve the performance of wireless distributed peer-to-peer (P2P) file-sharing systems, this paper proposes a general system architecture and a novel peer selection model based on fuzzy cognitive maps (FCM). The new model provides an effective way to choose the optimal peer from several resource discovery results for the best file transfer. Compared with the traditional min-hops scheme, which uses hop count as the only selection criterion, the proposed model uses FCM to capture the complex relationships among the relevant factors in wireless environments and gives each candidate an overall evaluation score. It is also highly scalable because it is independent of any specific P2P resource discovery protocol. Furthermore, a complete implementation is described in terms of concrete modules. Simulation results show that the proposed model is effective and feasible compared with the min-hops scheme, with the successful transfer rate increased by at least 20% and transfer time improved by as much as 34%.
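To make the scoring idea concrete, here is a minimal fuzzy cognitive map sketch in Python. The abstract does not list the factors or weights used, so everything here is an assumption: the concepts (`hops`, `signal`, `link_quality`, `score`), the causal weights, the normalisation of inputs to [0, 1], and the choice to clamp measured factors while iterating.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


# Hypothetical causal weights: (source concept, target concept) -> weight.
WEIGHTS = {
    ("hops", "link_quality"): -0.6,
    ("signal", "link_quality"): 0.8,
    ("link_quality", "score"): 0.9,
    ("hops", "score"): -0.4,
}

# Hypothetical candidates with factors normalised to [0, 1].
CANDIDATES = {
    "peer_a": {"hops": 0.3, "signal": 0.9},
    "peer_b": {"hops": 0.2, "signal": 0.2},
}


def fcm_score(inputs, weights=WEIGHTS, steps=5):
    """Iterate the map with input concepts clamped; return the 'score' activation."""
    state = {"link_quality": 0.0, "score": 0.0, **inputs}
    for _ in range(steps):
        nxt = dict(state)
        for concept in state:
            if concept in inputs:
                continue  # measured factors stay fixed
            total = sum(w * state[src]
                        for (src, dst), w in weights.items() if dst == concept)
            nxt[concept] = sigmoid(total)
        state = nxt  # synchronous update of all derived concepts
    return state["score"]


def pick_peer(candidates):
    """Return the name of the candidate with the highest overall score."""
    return max(candidates, key=lambda name: fcm_score(candidates[name]))
```

With these made-up weights, a min-hops rule would choose `peer_b`, while the map prefers `peer_a` because its stronger signal outweighs the extra hop distance.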
Abstract: Deep space communication is characterized by long propagation delays, heavy noise, asymmetric link rates, and intermittent connectivity, so the TCP/IP protocol cannot perform as well as it does in terrestrial communications. Accordingly, the Consultative Committee for Space Data Systems (CCSDS) developed the CCSDS File Delivery Protocol (CFDP), which standardizes an efficient file delivery service capable of transferring files to and from mass memory located in the space segment. CFDP supports four optional acknowledgement modes to make communication more reliable. In this paper, we give a general introduction to the typical communication process in CFDP and analyze its four Negative Acknowledgement (NAK) modes with respect to file delivery delay and number of retransmissions. We found that, despite having the shortest file delivery delay, the immediate NAK mode suffers from frequent retransmissions that may lead to network congestion. We therefore propose a new mode, the error counter-based NAK mode. By simulating the link between a deep space probe on Mars and a terrestrial station on Earth, we conclude that the error counter-based NAK mode reduces the number of retransmissions at the cost of a negligible increase in file delivery delay.
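The abstract does not say how the error counter-based mode decides when to send a NAK. One plausible reading, sketched below purely as an assumption, is that the receiver counts missing PDUs and sends a single NAK once the count reaches a threshold, instead of one NAK per detected gap as in immediate mode. The function names, the `threshold` parameter, and the end-of-file flush are hypothetical.

```python
def count_naks_immediate(received_seq):
    """Immediate mode: one NAK each time a gap in sequence numbers is detected."""
    naks, expected = 0, 0
    for seq in received_seq:
        if seq > expected:
            naks += 1  # gap detected: NAK sent right away
        expected = max(expected, seq + 1)
    return naks


def count_naks_error_counter(received_seq, threshold):
    """Assumed error-counter mode: NAK only once `threshold` PDUs are missing."""
    naks, expected, missing = 0, 0, 0
    for seq in received_seq:
        if seq > expected:
            missing += seq - expected  # accumulate newly detected losses
        expected = max(expected, seq + 1)
        if missing >= threshold:
            naks += 1  # one NAK reports all accumulated gaps
            missing = 0
    if missing:
        naks += 1  # report remaining gaps at end of file
    return naks


# PDUs 1, 4, 6 and 9 were lost in this hypothetical transfer.
RECEIVED = [0, 2, 3, 5, 7, 8, 10]
```

On this trace, immediate mode sends four NAKs, while the assumed counter mode with a threshold of 3 sends two. Batching in this way reduces NAK traffic, but lost PDUs wait longer before being requested again, which matches the delay trade-off the abstract reports.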
Funding: Supported by the Beforehand Research for National Defense of China (94J3.4.2.JW0515).
Abstract: Integration between file systems and multidatabase systems is a necessary approach to supporting data sharing across distributed and heterogeneous data sources. We first analyze the problems of data integration between file systems and multidatabase systems. We then present a common, XML-oriented data model named XIDM (XML-based Integrating Data Model). XIDM is based on a series of XML standards, especially XML Schema, and can describe semistructured data well. XIDM is therefore highly practical and multipurpose.
Funding: Supported by the National Key R&D Program of China (2018YFB0203901), the National Natural Science Foundation of China (Grant No. 61772053), the Science Challenge Project (No. TZ2016002), and the fund of the State Key Laboratory of Software Development Environment (SKLSDE-2017ZX-10).
Abstract: With the advent of new computing paradigms, parallel file systems serve not only traditional scientific computing applications but also non-scientific computing applications, such as financial computing, business, and public administration. Because parallel file systems provide storage services for multiple applications, they must meet a variety of requirements. However, parallel file systems usually provide a unified storage solution, which cannot meet specific application needs. In this paper, an extended file handle scheme is proposed to deal with this problem. The original file handle is extended to record I/O optimization information, which allows file systems to specify optimizations for a file or directory based on workload characteristics. Fine-grained management of I/O optimizations can therefore be achieved. On the basis of the extended file handle scheme, data prefetching and small file optimization mechanisms are proposed for parallel file systems. The experimental results show that the proposed approach improves the aggregate throughput of the overall system by up to 189.75%.
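The idea of carrying optimization hints in the file handle can be sketched as a small data structure. This is an illustration only: the type names `IoOpt` and `ExtendedHandle`, the size and sequentiality thresholds, and the directory-inheritance rule are assumptions, not details taken from the paper.

```python
from dataclasses import dataclass
from enum import Flag, auto


class IoOpt(Flag):
    """Hypothetical optimization hints recorded in the handle."""
    NONE = 0
    PREFETCH = auto()    # data prefetching
    SMALL_FILE = auto()  # small file optimization


@dataclass(frozen=True)
class ExtendedHandle:
    object_id: int            # stands in for the original file handle
    opts: IoOpt = IoOpt.NONE  # extension: per-file or per-directory hints


def choose_opts(avg_size_bytes, sequential_ratio,
                small_limit=64 * 1024, seq_limit=0.7):
    """Pick hints from workload characteristics (thresholds are made up)."""
    opts = IoOpt.NONE
    if avg_size_bytes <= small_limit:
        opts |= IoOpt.SMALL_FILE
    if sequential_ratio >= seq_limit:
        opts |= IoOpt.PREFETCH
    return opts


def effective_opts(file_handle, dir_handle):
    """A file without its own hints inherits those of its directory."""
    return file_handle.opts if file_handle.opts else dir_handle.opts
```

Keeping the hints in the handle means each file or directory can select its own optimizations, rather than the whole file system applying one policy.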