Funding: Supported by RP-Grant 2010 of Ewha Womans University.
Abstract: Application launch performance is of great importance to system platform developers and vendors because it strongly affects user satisfaction. The single most effective way to improve application launch performance is to replace a hard disk drive (HDD) with a solid state drive (SSD), which has recently become affordable and popular. A natural question is then whether to replace the traditional HDD-aware application launchers with a new SSD-aware optimizer. We address this question by analyzing the inefficiency of HDD-aware application launchers on SSDs and then proposing a new SSD-aware application prefetching scheme, called the Fast Application STarter (FAST). The key idea of FAST is to overlap the computation (CPU) time with the SSD access (I/O) time during an application launch. FAST is composed of a set of user-level components and system debugging tools provided by the Linux operating system (OS). Hence, FAST can be easily deployed on any recent Linux version without kernel recompilation. We implement FAST on a desktop PC with an SSD running Linux 2.6.32 and evaluate it by launching a set of widely used applications, demonstrating an average 28% reduction in application launch time compared to a PC without a prefetcher.
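The overlap idea can be illustrated with a small timing model. This is only a sketch under stated assumptions, not FAST's implementation: the launch is modeled as a fixed sequence of steps, each needing one block read followed by some computation, and the hypothetical prefetcher issues the same reads back to back from time zero. The function names and the per-step cost lists are invented for illustration.

```python
def launch_time_without_prefetch(cpu, io):
    """Each step blocks on its read, then computes: CPU and I/O never overlap."""
    return sum(cpu) + sum(io)


def launch_time_with_prefetch(cpu, io):
    """A prefetcher issues the reads back to back, starting at time 0.

    The application can start step k only once block k has arrived,
    so its clock is max(own progress, block arrival time) plus the CPU cost.
    """
    block_ready = 0.0  # arrival time of the most recently prefetched block
    app_clock = 0.0    # time at which the application finishes the current step
    for cpu_cost, io_cost in zip(cpu, io):
        block_ready += io_cost
        app_clock = max(app_clock, block_ready) + cpu_cost
    return app_clock


if __name__ == "__main__":
    cpu_costs = [2, 2, 2]  # assumed CPU time per step (arbitrary units)
    io_costs = [1, 1, 1]   # assumed SSD read time per step (arbitrary units)
    print(launch_time_without_prefetch(cpu_costs, io_costs))
    print(launch_time_with_prefetch(cpu_costs, io_costs))
```

In this model, the only I/O time the application still waits for is the part that cannot be hidden behind computation, such as the first block.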
Funding: Supported by the National Basic Research Program (973) of China (No. 2004CB318203), the National High-Tech R&D Program (863) of China (No. 2009AA01A402), the Natural Science Foundation of Hubei Province, China (No. 2013CFB035), and the Key Science Research Project of Hubei Education Office in China (No. D20141301).
Abstract: Emerging non-volatile memory technologies, especially flash-based solid state drives (SSDs), have increasingly been adopted in the storage stack. They provide numerous advantages over traditional mechanically rotating hard disk drives (HDDs) and tend to replace them. Because HDDs have long been the primary building blocks of storage systems, however, much of the system software has been designed specifically for HDDs and may not be optimal for non-volatile memory media. Therefore, to fully leverage the superior raw performance of these media, the existing upper-layer software has to be re-evaluated or re-designed. To this end, we propose PASS, an optimized I/O scheduler at the Linux block layer that accommodates the shift of underlying storage devices toward flash-based SSDs. PASS takes the rich internal parallelism of SSDs into account when dispatching requests to the device driver in order to achieve high performance. Specifically, it partitions the logical storage space into fixed-size regions (preferably the component package sizes) as scheduling units. These scheduling units are serviced in a round-robin manner, and on each turn the chosen dispatching unit issues only a batch of either read or write requests to suppress excessive mutual interference. Additionally, the requests are sorted by their target addresses while waiting in the dispatching queues to exploit the high sequential performance of SSDs. Experimental results with a variety of workloads show that PASS outperforms the four off-the-shelf Linux I/O schedulers by 3% to 41%, while at the same time significantly improving device lifetime by reducing internal write amplification.
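The dispatching policy described in the abstract can be sketched as a toy user-space model. This is not the PASS kernel code. The class name, the batch size, and the rule of serving reads before writes within a region are assumptions made for illustration, since the abstract does not state how a turn chooses between the two kinds of request.

```python
from collections import namedtuple

Request = namedtuple("Request", ["lba", "op"])  # op is "R" (read) or "W" (write)


class RegionScheduler:
    """Toy model: fixed-size regions, round-robin turns, single-type sorted batches."""

    def __init__(self, region_size, num_regions, batch_size=4):
        self.region_size = region_size
        self.batch_size = batch_size  # assumed parameter, not from the abstract
        self.queues = [{"R": [], "W": []} for _ in range(num_regions)]
        self.next_region = 0

    def submit(self, lba, op):
        # Map the logical address to its region (the scheduling unit).
        region = lba // self.region_size
        self.queues[region][op].append(Request(lba, op))

    def dispatch(self):
        """Return the next batch, or an empty list if nothing is pending."""
        count = len(self.queues)
        for step in range(count):
            index = (self.next_region + step) % count
            queue = self.queues[index]
            # Assumption: prefer reads; a batch never mixes reads and writes.
            op = "R" if queue["R"] else ("W" if queue["W"] else None)
            if op is None:
                continue
            queue[op].sort(key=lambda request: request.lba)  # address order
            batch = queue[op][:self.batch_size]
            queue[op] = queue[op][self.batch_size:]
            self.next_region = (index + 1) % count  # advance the round-robin pointer
            return batch
        return []
```

A caller would `submit` incoming requests and repeatedly call `dispatch` to obtain batches to hand to the device.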
Funding: Supported by ZTE Industry-Academia-Research Cooperation Funds.
Abstract: Data layout in a file system is the organization of data stored in external storage. The data layout has a huge impact on the performance of storage systems. We survey three main kinds of data layout in traditional file systems: in-place update file systems, log-structured file systems, and copy-on-write file systems. Each has its own strengths and weaknesses under different circumstances. We also include a recent usage of persistent layout in a file system that combines both flash memory and byte-addressable non-volatile memory. With this survey, we conclude that persistent data layout in file systems may evolve dramatically in the era of emerging non-volatile memory.
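As a rough illustration of two of the surveyed layouts, the sketch below contrasts an in-place update store with a log-structured one. Both classes are hypothetical in-memory toys and are not taken from the survey. Copy-on-write is omitted to keep the example short.

```python
class InPlaceStore:
    """In-place update: a logical block always lives at the same location."""

    def __init__(self, num_blocks):
        self.blocks = [None] * num_blocks

    def write(self, block_no, data):
        self.blocks[block_no] = data  # overwrite the old version where it sits

    def read(self, block_no):
        return self.blocks[block_no]


class LogStructuredStore:
    """Log-structured: every write is appended; an index tracks the newest copy."""

    def __init__(self):
        self.log = []    # append-only sequence of written data
        self.index = {}  # logical block number -> position in the log

    def write(self, block_no, data):
        self.index[block_no] = len(self.log)
        self.log.append(data)  # older copies stay in the log as garbage

    def read(self, block_no):
        return self.log[self.index[block_no]]
```

Updating the same block twice leaves one stored copy in the first store and two log entries in the second, which is why log-structured designs need some form of cleaning.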
Funding: Supported by the National High Technology Research and Development 863 Program of China under Grant No. 2013AA013201 and the National Natural Science Foundation of China under Grant Nos. 61025009, 61232003, 61120106005, and 61170288.
Abstract: Flash memory has a limited number of program/erase cycles. Hence, to maintain their advertised capacity over time, flash-based solid state drives (SSDs) must prolong their life span through a wear-leveling mechanism. As a very important part of the flash translation layer (FTL), wear leveling is usually implemented in SSD controllers, which is called internal wear leveling. However, there is no wear leveling among the SSDs in SSD-based redundant array of independent disks (RAID) systems, so some SSDs wear out faster than others. Once an SSD fails, reconstruction must be triggered immediately, but the cost of this process is so high that both system reliability and availability are seriously affected. We therefore propose cross-SSD wear leveling (CSWL) to enhance the endurance of entire SSD-based RAID systems. Under workloads with a random access pattern, parity stripes receive many more updates because an update to a data stripe causes the modification of all related parity stripes. Based on this principle, we introduce an age-driven parity distribution scheme to guarantee wear leveling among flash SSDs and thereby prolong the endurance of RAID systems. Furthermore, age-driven parity distribution benefits performance by maintaining better load balance. With insignificant overhead, CSWL can significantly improve both the life span and performance of SSD-based RAID.
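One plausible reading of age-driven parity distribution is that the parity of each stripe is placed on whichever SSD is currently least worn. The sketch below follows that reading and is not the CSWL algorithm itself. It rests on a deliberately simplified wear model, assumed here for illustration: in an array of n drives, one round of random small updates costs the parity holder of a stripe n - 1 writes and every other drive one write.

```python
def distribute_parity(initial_wear, num_stripes):
    """Greedy parity placement under a simplified wear model.

    Returns the chosen parity drive for each stripe and the final wear counts.
    """
    wear = list(initial_wear)
    num_disks = len(wear)
    placement = []
    for _ in range(num_stripes):
        # Pick the least-worn drive; ties go to the lowest index.
        parity_disk = min(range(num_disks), key=lambda disk: wear[disk])
        placement.append(parity_disk)
        for disk in range(num_disks):
            # Assumed cost: the parity holder absorbs one write per data update.
            wear[disk] += (num_disks - 1) if disk == parity_disk else 1
    return placement, wear


if __name__ == "__main__":
    # Drive 0 is older (more worn) than drives 1 and 2.
    print(distribute_parity([10, 0, 0], 3))
```

With these inputs the older drive never receives parity, so the younger drives absorb the heavier parity traffic and the wear gap narrows.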
Funding: This work is supported by the Natural Science Foundation of Beijing under Grant No. 4172031, the Fundamental Research Funds for the Central Universities of China, and the Research Funds of Renmin University of China under Grant No. 16XNLQ02. Xiao Qin's work is supported by the U.S. National Science Foundation under Grant Nos. IIS-1618669, CCF-0845257 (CAREER), CNS-0917137, CNS-0757778, CCF-0742187, CNS-0831502, CNS-0855251, and OCI-0753305. Xiao Qin's study is also supported by the Programme of Introducing Talents of Discipline to Universities (111 Project) in China under Grant No. B07038.
Abstract: Deduplication has been commonly used in both enterprise storage systems and cloud storage. To overcome the performance challenge of selective restore operations in deduplication systems, a solid-state-drive-based (SSD-based) read cache can be deployed to speed up restores by caching popular restore contents dynamically. Unfortunately, the frequent data updates induced by classical cache schemes (e.g., LRU and LFU) significantly shorten SSDs' lifetime while slowing down I/O processing in SSDs. To address this problem, we propose a new solution, LOP-Cache, which greatly improves the write durability of SSDs as well as I/O performance by enlarging the proportion of long-term popular (LOP) data among the data written into the SSD-based cache. LOP-Cache keeps LOP data in the SSD cache for a long time to decrease the number of cache replacements. Furthermore, it prevents unpopular or unnecessary data in deduplication containers from being written into the SSD cache. We implemented LOP-Cache in a prototype deduplication system to evaluate its performance. Our experimental results indicate that LOP-Cache shortens the latency of selective restore by an average of 37.3% at the cost of a small SSD-based cache with only 5.56% of the capacity of the deduplicated data. Importantly, LOP-Cache improves SSDs' lifetime by a factor of 9.77. The evidence shows that LOP-Cache offers a cost-efficient SSD-based read cache solution to boost the performance of selective restore for deduplication systems.
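The effect of admitting only popular data can be sketched with a simple admission filter placed in front of an LRU cache. This is an illustrative stand-in, not LOP-Cache's actual popularity model: the class names, the `min_hits` threshold, and the use of a plain access counter as a proxy for long-term popularity are all assumptions.

```python
from collections import Counter, OrderedDict


class LruCache:
    """Classical LRU: every miss is written into the SSD cache."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.data = OrderedDict()
        self.ssd_writes = 0

    def _admit(self, key):
        self.data[key] = True
        self.ssd_writes += 1
        if len(self.data) > self.capacity:
            self.data.popitem(last=False)  # evict the least recently used entry

    def access(self, key):
        if key in self.data:
            self.data.move_to_end(key)
            return True
        self._admit(key)
        return False


class PopularityFilteredCache(LruCache):
    """Admit a key only after it has missed at least min_hits times."""

    def __init__(self, capacity, min_hits):
        super().__init__(capacity)
        self.min_hits = min_hits  # assumed popularity threshold
        self.seen = Counter()     # miss counts kept in RAM, not on the SSD

    def access(self, key):
        if key in self.data:
            self.data.move_to_end(key)
            return True
        self.seen[key] += 1
        if self.seen[key] >= self.min_hits:
            self._admit(key)
        return False
```

On a trace where one key recurs among one-off keys, the filtered cache performs far fewer SSD writes than plain LRU, because the one-off keys are never written.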
Funding: Supported by the National High Technology Research and Development 863 Program of China under Grant No. 2013AA013201 and the National Natural Science Foundation of China under Grant Nos. 61025009, 61232003, 61120106005, and 61170288.
Abstract: Limited lifespan is the Achilles' heel of solid state drives (SSDs) based on NAND flash. NAND flash has two drawbacks that degrade SSDs' lifespan. One is the out-of-place update. The other is the sequential write constraint within a block. SSDs usually employ a write buffer to extend their lifetime. However, existing write buffer schemes only pay attention to the first drawback while neglecting the second. We propose a hetero-buffer architecture covering both aspects simultaneously. The hetero-buffer consists of two components: dynamic random access memory (DRAM) and the reorder area. The DRAM endeavors to reduce write traffic as much as possible by pursuing a higher hit ratio (overcoming the first drawback). The reorder area focuses on reordering the write sequence (overcoming the second drawback). Our hetero-buffer outperforms traditional write buffers for two reasons. First, the DRAM can adopt an existing superior cache replacement policy and thus achieves a higher hit ratio. Second, the hetero-buffer reorders the write sequence, which has not been exploited by traditional write buffers. Besides the optimizations mentioned above, our hetero-buffer considers the work environment of the write buffer, which is also neglected by traditional write buffers. In this way, the hetero-buffer is further improved. The performance is evaluated via trace-driven simulations. Experimental results show that SSDs employing the hetero-buffer achieve a longer lifespan on most workloads.
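The division of labor between the two components can be sketched as follows. This is a minimal model under stated assumptions, not the paper's design: the DRAM part is modeled as a plain LRU buffer of logical page numbers, evicted pages move to a reorder area grouped by flash block, and a flush emits pages in ascending order within each block. The class name, its parameters, and the flush trigger are hypothetical.

```python
from collections import OrderedDict, defaultdict


class HeteroBufferSketch:
    """Toy two-stage write buffer: LRU DRAM stage plus a per-block reorder area."""

    def __init__(self, dram_pages, pages_per_block):
        self.dram_pages = dram_pages
        self.pages_per_block = pages_per_block
        self.dram = OrderedDict()        # logical page number -> dirty marker
        self.reorder = defaultdict(set)  # block number -> pages awaiting flush
        self.hits = 0                    # overwrites absorbed in DRAM

    def write(self, lpn):
        if lpn in self.dram:
            # Repeated write to a buffered page: no extra flash traffic.
            self.dram.move_to_end(lpn)
            self.hits += 1
            return
        self.dram[lpn] = True
        if len(self.dram) > self.dram_pages:
            victim, _ = self.dram.popitem(last=False)
            self.reorder[victim // self.pages_per_block].add(victim)

    def flush(self):
        """Emit buffered pages block by block, sequentially within each block."""
        ordered = []
        for block in sorted(self.reorder):
            ordered.extend(sorted(self.reorder[block]))
        self.reorder.clear()
        return ordered
```

For example, with a two-page DRAM stage and four pages per block, the write sequence 5, 1, 5, 6, 0, 4 absorbs the second write to page 5 and later flushes pages 1, 5, 6 in block order.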