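SSDs (solid state drives) have limited write endurance, so besides the hit ratio, the volume of data written to an SSD cache device has become another key metric for evaluating cache replacement algorithms. Making an algorithm more efficient at turning written data into cache hits, and thereby extending SSD lifetime, is an important research question. Existing cache replacement algorithms are generally designed around temporal locality, i.e., the assumption that recently accessed data are likely to be accessed again soon, so they need frequent data updates and a high write volume to maintain a high hit ratio. Others filter out the relatively worst data at non-trivial cost to cut writes somewhat. What is still missing is an algorithm that captures the long-term popularity of data at low cost and thereby improves the quality of cached data. We propose an access-sequence-folding cache replacement algorithm that identifies, at relatively low overhead, data with long-term stable popularity and writes only those into the cache. This markedly improves the quality of data in the SSD cache and reduces SSD writes while maintaining the hit ratio. Experiments show that, compared with LRU (least recently used), access sequence folding reduces write volume by 90% with a hit-ratio loss below 10%; compared with write-optimized caching algorithms such as SieveStore and L2ARC (level 2 adaptive replacement cache), it reduces write volume by more than 50% at a comparable hit ratio. It thus achieves the goal of caching high-quality data, reducing SSD writes, and extending SSD lifetime.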
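The abstract does not spell out how the access sequence is folded. The Python sketch below is one hypothetical reading, not the authors' algorithm: it cuts an access trace into equal segments ("folds"), treats a block as having long-term stable popularity if it appears in at least `min_folds` of those segments, and compares an LRU SSD cache that admits every miss with one that admits only those blocks. Each admission is counted as one SSD write. The names `folded_hot_set`, `simulate_cache`, `num_folds`, and `min_folds` are illustrative assumptions.

```python
from collections import OrderedDict
from typing import Hashable, Optional, Sequence, Set, Tuple


def folded_hot_set(trace: Sequence[Hashable], num_folds: int,
                   min_folds: int) -> Set[Hashable]:
    """Hypothetical folding step: split the trace into num_folds segments
    and keep blocks that appear in at least min_folds of them."""
    seg_len = max(1, -(-len(trace) // num_folds))  # ceiling division
    presence = {}
    for start in range(0, len(trace), seg_len):
        for block in set(trace[start:start + seg_len]):
            presence[block] = presence.get(block, 0) + 1
    return {block for block, n in presence.items() if n >= min_folds}


def simulate_cache(trace: Sequence[Hashable], capacity: int,
                   admit: Optional[Set[Hashable]] = None) -> Tuple[int, int]:
    """LRU SSD cache; returns (hits, ssd_writes). A missed block is written
    into the cache only if admit is None or the block is in admit."""
    cache = OrderedDict()
    hits = writes = 0
    for block in trace:
        if block in cache:
            hits += 1
            cache.move_to_end(block)          # refresh recency
        elif admit is None or block in admit:
            writes += 1                       # each admission = one SSD write
            cache[block] = True
            if len(cache) > capacity:
                cache.popitem(last=False)     # evict least recently used
    return hits, writes


if __name__ == "__main__":
    # Toy trace: blocks 1-3 recur in every segment; 10-18 are one-off.
    trace = [1, 2, 3, 10, 11, 12, 1, 2, 3, 13, 14, 15, 1, 2, 3, 16, 17, 18]
    hot = folded_hot_set(trace, num_folds=3, min_folds=2)
    print(sorted(hot))                          # [1, 2, 3]
    print(simulate_cache(trace, capacity=3))    # (0, 18)
    print(simulate_cache(trace, 3, admit=hot))  # (6, 3)
```

Here the hot set is computed from the same trace it filters, which only works offline; an online variant would derive it from past history and apply it to future accesses.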
Funding: Partly supported by the National Natural Science Foundation of China under Grant No. 62072076 and the Research Fund of the National Key Laboratory of Computer Architecture under Grant No. CARCH201811.
Abstract: Current storage mechanisms give little consideration to the characteristics of how data are kept. This can produce fragments of various sizes among data sets, and in some cases these fragments become severe enough to harm system performance. In this paper, we modify the current storage mechanism. We introduce an extra storage unit, called a data bucket, into the classical data management architecture, and then modify the data management mechanism to support this design. By placing data according to their access information, both the number of fragments and the fragment size are greatly reduced. Taking different data features and storage device conditions into account, we also extend solid state drive (SSD) lifetime by placing data in different spaces. Experiments show that our designs improve SSD storage density and actual service time.
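To illustrate why placing data by access information can reduce fragmentation, the following Python sketch models storage as a row of slots. It assumes, hypothetically, that "access information" means an access count, that a data bucket is a contiguous region holding items of one access class, and that cold items are later deleted together. These assumptions and all names are illustrative, not the paper's design.

```python
from typing import List, Optional, Sequence, Set, Tuple

Item = Tuple[str, int]  # hypothetical (name, access count)


def place_in_arrival_order(items: Sequence[Item]) -> List[Optional[str]]:
    """Baseline: items are laid out in the order they arrive."""
    return [name for name, _ in items]


def place_in_buckets(items: Sequence[Item], hot_threshold: int) -> List[Optional[str]]:
    """Hypothetical bucket layout: hot items share one contiguous bucket,
    cold items another."""
    hot = [name for name, visits in items if visits >= hot_threshold]
    cold = [name for name, visits in items if visits < hot_threshold]
    return hot + cold


def delete(space: List[Optional[str]], victims: Set[str]) -> List[Optional[str]]:
    """Free the slots of deleted items (None marks a free slot)."""
    return [None if slot in victims else slot for slot in space]


def count_free_fragments(space: Sequence[Optional[str]]) -> int:
    """Count maximal runs of consecutive free slots."""
    fragments, prev_free = 0, False
    for slot in space:
        is_free = slot is None
        if is_free and not prev_free:
            fragments += 1
        prev_free = is_free
    return fragments


if __name__ == "__main__":
    items = [("a", 9), ("b", 1), ("c", 8), ("d", 0), ("e", 7), ("f", 1)]
    cold = {name for name, visits in items if visits < 5}
    print(count_free_fragments(delete(place_in_arrival_order(items), cold)))  # 3
    print(count_free_fragments(delete(place_in_buckets(items, 5), cold)))     # 1
```

Deleting the same three cold items leaves three scattered holes in arrival order but one contiguous free run when items are grouped by access class.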
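Abstract: Solid state drives (SSDs) have been widely deployed in current storage systems because of their superior performance. However, because of their limited lifetime, the reliability of SSDs has been widely questioned. RAID (redundant arrays of inexpensive disks) is a traditional means of improving reliability, but it is not well suited to SSDs. This work proposes a hybrid storage system built from SSDs and hard disks. The main idea is that the SSDs serve all I/O requests, providing high performance, while the disks back up all data, guaranteeing reliability. However, disk I/O performance is significantly lower than that of SSDs, so the key question in building such a system is whether the disks can back up the data on the SSDs in time. To address this, we propose optimizations in two respects. For latency, non-volatile main memory is used to bridge the latency gap between disks and SSDs. For bandwidth, two measures are adopted: 1) I/O requests are reorganized within each disk so that the disk reads and writes as sequentially as possible; 2) multiple disks back up multiple SSDs, and the write requests of one SSD are spread across several disks, which effectively absorbs write bursts on a single SSD. A prototype implementation shows that the hybrid system is feasible: the disks can provide real-time data backup for the SSDs, and compared with other systems, the hybrid system achieves a higher cost-performance ratio.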
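The two bandwidth measures can be sketched in Python as follows. Requests are modeled as hypothetical `(start_lba, length)` pairs in sectors. `reorganize` sorts and merges pending backup writes so one disk sees fewer, more sequential requests, and `spread_writes` splits one SSD's writes across several disks by fixed-size LBA stripes. Striping is an assumed distribution policy; the abstract does not say how writes are spread.

```python
from typing import Dict, List, Tuple

Request = Tuple[int, int]  # hypothetical (start LBA, length in sectors)


def reorganize(requests: List[Request]) -> List[Request]:
    """Sort pending backup writes by LBA and merge overlapping or adjacent
    ones, so a single disk sees fewer, more sequential requests."""
    merged: List[Request] = []
    for start, length in sorted(requests):
        if merged and start <= merged[-1][0] + merged[-1][1]:
            prev_start, prev_len = merged[-1]
            end = max(prev_start + prev_len, start + length)
            merged[-1] = (prev_start, end - prev_start)
        else:
            merged.append((start, length))
    return merged


def spread_writes(requests: List[Request], num_disks: int,
                  stripe_sectors: int) -> Dict[int, List[Request]]:
    """Split one SSD's writes at stripe boundaries and assign stripes to
    backup disks round-robin (an assumed policy)."""
    queues: Dict[int, List[Request]] = {d: [] for d in range(num_disks)}
    for start, length in requests:
        pos, end = start, start + length
        while pos < end:
            stripe = pos // stripe_sectors
            chunk_end = min(end, (stripe + 1) * stripe_sectors)
            queues[stripe % num_disks].append((pos, chunk_end - pos))
            pos = chunk_end
    return queues


if __name__ == "__main__":
    pending = [(100, 8), (0, 8), (8, 8), (50, 4), (104, 8)]
    print(reorganize(pending))  # [(0, 16), (50, 4), (100, 12)]
    # Spread across two backup disks first, then reorganize within each disk.
    per_disk = {d: reorganize(q) for d, q in spread_writes(pending, 2, 8).items()}
    print(per_disk)  # {0: [(0, 8), (50, 4), (100, 4)], 1: [(8, 8), (104, 8)]}
```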
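Abstract: New storage media, represented by SSDs (solid state drives), are widely used in virtualized environments, typically as read/write caches for virtual machines to improve disk I/O performance. Existing research tends to focus on capacity planning for SSD caches and evaluates SSD cache allocation by the cache read/write hit ratio. It does not fully consider the upper limit of an SSD's service capacity and is hard to apply to typical distributed application scenarios, leaving open the possibility that virtual machines contend for SSD cache resources and that applications inside them violate their performance requirements. We implement an adaptive SSD cache system for multi-objective optimization in virtualized environments that takes the SSD's service-capacity limit into account. An adaptive closed loop provides dynamic awareness of virtual machine and application states. The system dynamically detects local SSD cache contention, generates optimized virtual machine placement plans using clustering, and determines the order and timing of virtual machine migrations according to the global SSD cache supply capacity. Experimental results show that, in typical distributed application scenarios, the method effectively alleviates contention for SSD cache resources while meeting applications' virtual machine placement requirements, improving application performance while also preserving application reliability. In the Hadoop scenario, it reduces task execution time by 25% on average and improves the throughput of I/O-intensive applications by 39% on average. In the ZooKeeper scenario, at the cost of less than 5% performance loss, it handles virtual machine outages caused by the single-point failure of a virtualization host.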
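Abstract: With the arrival of the big data era, solid state drives have gradually been adopted in large data centers. As the most widely used RAID level, RAID5 has also begun to be applied to SSD arrays to ensure data reliability. However, RAID5 parity must be updated frequently, especially under random access, and these frequent parity updates significantly hurt the performance and lifetime of SSD arrays. To address this problem, we propose the PA-SSD (Parity-Aware Solid State Disk) controller design. It obtains the logical addresses of parity from the RAID5 controller, sets up a cache, Pcache, in the SSD controller to temporarily hold updated parity, and lays out data and parity separately inside the SSD, with a dedicated region for storing parity. Simulation experiments show that the proposed method effectively reduces parity writes to the SSD, reduces the number of SSD erasures, and improves the performance and lifetime of the SSD array.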
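A minimal Python sketch of the Pcache idea, under stated assumptions: parity pages are identified by the logical address supplied by the RAID5 controller, repeated updates to the same parity page are absorbed in the controller cache, and only evicted or flushed parity is written to flash (into the dedicated parity region). The class name, LRU eviction, and write counting are illustrative choices, not the PA-SSD implementation. The helper shows the standard RAID5 read-modify-write parity update, new parity = old parity XOR old data XOR new data.

```python
from collections import OrderedDict


def rmw_parity(old_parity: int, old_data: int, new_data: int) -> int:
    """Standard RAID5 small-write (read-modify-write) parity update."""
    return old_parity ^ old_data ^ new_data


class ParityCache:
    """Hypothetical Pcache-like write-back buffer for parity pages."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.entries = OrderedDict()  # parity LBA -> latest parity value
        self.flash_writes = 0         # writes that reach the parity region

    def update(self, lba: int, parity: int) -> None:
        if lba in self.entries:
            self.entries.move_to_end(lba)     # overwrite in cache, no flash write
        self.entries[lba] = parity
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)  # evict the LRU parity page...
            self.flash_writes += 1            # ...and write it to flash once

    def flush(self) -> None:
        self.flash_writes += len(self.entries)
        self.entries.clear()


if __name__ == "__main__":
    print(rmw_parity(0b1010, 0b0011, 0b0101))  # 12 (0b1100)
    cache = ParityCache(capacity=2)
    updates = [7, 7, 7, 9, 7, 9]  # parity LBAs touched by random small writes
    for i, lba in enumerate(updates):
        cache.update(lba, parity=i)
    cache.flush()
    print(cache.flash_writes, "flash writes instead of", len(updates))  # 2 flash writes instead of 6
```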
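Abstract: NAND flash, the storage component of SSDs (solid state drives), must erase its cells before they can be written, and is therefore referred to as write-once memory. SSD lifetime is limited by the number of erase operations the cells can endure, so reducing erasures is essential for SSD reliability. We propose a method that writes an already-written SSD page a second time by encoding compressed difference (delta) information, thereby reducing SSD erasures and extending lifetime. First, the difference between the data in a physical page before and after an update is computed; next, the difference data are compressed; finally, the compressed data are encoded and stored in the still-writable bits of the already-written physical page, achieving a second write to that page. Experimental results show that, for update-dominated applications, the method makes full use of the writable bits in written physical pages and greatly reduces the number of SSD erasures.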
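The three steps (difference, compression, encoding) can be sketched in Python. All of the following are labeled assumptions rather than the paper's scheme: NAND programming can only change bits from 1 to 0; the part of the page after the original data is still erased (all 0xFF) and serves as the writable bits; a 2-byte length prefix stands in for the paper's encoding, which the abstract does not specify; and `zlib` is a substitute compressor. Function names are hypothetical.

```python
import zlib

ERASED = 0xFF  # erased NAND byte: all bits 1


def can_program(old: bytes, new: bytes) -> bool:
    """NAND programming only flips bits 1 -> 0, so new can be programmed
    over old iff (old & new) == new for every byte."""
    return len(old) == len(new) and all((o & n) == n for o, n in zip(old, new))


def second_write(page: bytearray, data_len: int, new_data: bytes) -> bool:
    """Try to record an update to page[:data_len] without erasing the page."""
    if len(new_data) != data_len:
        raise ValueError("this sketch assumes same-size updates")
    old_data = bytes(page[:data_len])
    delta = bytes(o ^ n for o, n in zip(old_data, new_data))  # step 1: difference
    payload = zlib.compress(delta)                            # step 2: compress
    record = len(payload).to_bytes(2, "big") + payload        # step 3: stand-in encoding
    start, end = data_len, data_len + len(record)
    if end > len(page) or not can_program(bytes(page[start:end]), record):
        return False  # no room: an FTL would fall back to an out-of-place write
    page[start:end] = record  # region is erased, so this is a valid 1 -> 0 program
    return True


def read_latest(page: bytes, data_len: int) -> bytes:
    """Reconstruct the newest data from the original data plus the stored delta."""
    old_data = bytes(page[:data_len])
    header = bytes(page[data_len:data_len + 2])
    if header == bytes([ERASED, ERASED]):
        return old_data  # page has not been rewritten
    size = int.from_bytes(header, "big")
    delta = zlib.decompress(bytes(page[data_len + 2:data_len + 2 + size]))
    return bytes(o ^ d for o, d in zip(old_data, delta))


if __name__ == "__main__":
    original = b"hello world, hello SSD!"
    page = bytearray(original + bytes([ERASED]) * (128 - len(original)))
    print(second_write(page, len(original), b"hello world, hello NVM!"))  # True
    print(read_latest(page, len(original)))  # b'hello world, hello NVM!'
```

The original bytes are never modified; the update lives only in previously erased bits, which is why a page can absorb an update without an erase as long as the encoded delta fits.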
Funding: Supported by the National Basic Research Program (973) of China (No. 2011CB302303), the National Natural Science Foundation of China (No. 60933002), the National High-Tech R&D Program (863) of China (No. 2013AA013203), and the U.S. National Science Foundation under Grants CCF-0845257 (CAREER), CNS-0917137 (CSR), CNS-0757778 (CSR), CCF-0742187 (CPA), CNS-0831502 (CyberTrust), CNS-0855251 (CRI), OCI-0753305 (CI-TEAM), DUE-0837341 (CCLI), and DUE-0830831 (SFS).
Abstract: The performance and energy consumption of a solid state disk (SSD) depend heavily on the file system and I/O scheduler used by the operating system. To find an optimal combination of file system and I/O scheduler for SSDs, we use a metric called the aggregative indicator (AI), defined as the ratio of an SSD's performance value (e.g., data transfer rate in MB/s or throughput in IOPS) to its energy consumption. This metric evaluates SSD performance per unit of energy consumed and helps identify the file system and I/O scheduler combination under which an SSD delivers high performance at low energy consumption. We also propose a metric called Cemp to study how the energy consumption and mean performance of an Intel SSD (SSD-I) change when it provides the largest AI, the lowest power, and the highest performance, respectively. Using Cemp, we attempt to find the combination of file system and I/O scheduler that gives SSD-I a smooth change in energy consumption. We use Filebench as a workload generator to simulate a wide range of workloads (varmail, fileserver, and webserver), and explore the optimal combinations of file systems and I/O schedulers (i.e., those with optimal AI values) for the tested SSDs under different workloads. Experimental results show that, compared with any individual metric, the proposed aggregative indicator provides a more comprehensive basis for exploring the optimal combination of file system and I/O scheduler for SSDs.
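The AI metric is a simple ratio, and choosing a combination amounts to taking the argmax over measured pairs. The Python sketch below shows this. The abstract does not fix the unit of the energy term, so the example assumes average power in watts (giving MB/s per W); the measurement numbers are invented for illustration and are not results from the paper. Function names are hypothetical.

```python
from typing import Dict, Tuple

Combo = Tuple[str, str]  # (file system, I/O scheduler)


def aggregative_indicator(performance: float, energy: float) -> float:
    """AI = performance value (e.g. MB/s or IOPS) / energy consumption."""
    if energy <= 0:
        raise ValueError("energy consumption must be positive")
    return performance / energy


def best_combination(measurements: Dict[Combo, Tuple[float, float]]) -> Combo:
    """Return the (file system, scheduler) pair with the largest AI."""
    return max(measurements,
               key=lambda combo: aggregative_indicator(*measurements[combo]))


if __name__ == "__main__":
    # Invented (performance in MB/s, power in W) values, for illustration only.
    measured = {
        ("ext4", "noop"): (200.0, 2.0),      # AI = 100.0
        ("ext4", "cfq"): (220.0, 2.5),       # AI = 88.0: fastest, but not best AI
        ("xfs", "deadline"): (210.0, 1.75),  # AI = 120.0
    }
    print(best_combination(measured))  # ('xfs', 'deadline')
```

The toy data show why a combined metric can disagree with a single one: the highest-throughput pair is not the one with the best performance per unit of energy.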
Funding: Project supported by the Second Brain Korea 21 Project and Samsung Electronics.
Abstract: In this paper, we propose a fast and simple system emulator, called the system performance emulator (SPE), for evaluating long read operations. The SPE estimates how much system-wide performance would improve with a faster solid state disk (SSD). By suspending the CPU for a certain time during direct memory access (DMA) transfers and subtracting this suspended time from the total DMA time, the SPE estimates the system performance improvement expected from an enhanced SSD before that SSD is manufactured. We also use the SPE to examine the relationship between storage performance and system performance.
Funding: Supported by ZTE Industry-Academia-Research Cooperation Funds.
Abstract: The data layout of a file system is the organization of data stored on external storage, and it has a large impact on storage system performance. We survey the three main kinds of data layout in traditional file systems: in-place update file systems, log-structured file systems, and copy-on-write file systems. Each has its own strengths and weaknesses under different circumstances. We also cover a recent use of persistent data layout in a file system that combines flash memory with byte-addressable non-volatile memory. From this survey, we conclude that persistent data layout in file systems may evolve dramatically in the era of emerging non-volatile memory.
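The three layouts differ mainly in where an update lands. The toy in-memory Python models below contrast them; they are deliberately simplified illustrations of the general techniques, not models of any particular file system covered by the survey, and all class and method names are hypothetical.

```python
from typing import Dict, List, Optional


class InPlaceFS:
    """In-place update: each logical block has a fixed home and is overwritten there."""

    def __init__(self, nblocks: int) -> None:
        self.disk: List[Optional[str]] = [None] * nblocks

    def write(self, lblock: int, data: str) -> None:
        self.disk[lblock] = data  # the old contents are lost immediately

    def read(self, lblock: int) -> Optional[str]:
        return self.disk[lblock]


class LogStructuredFS:
    """Log-structured: every write is appended at the log head and the map is updated."""

    def __init__(self) -> None:
        self.log: List[str] = []
        self.map: Dict[int, int] = {}  # logical block -> log position

    def write(self, lblock: int, data: str) -> None:
        self.log.append(data)
        self.map[lblock] = len(self.log) - 1  # the previous copy becomes garbage

    def read(self, lblock: int) -> str:
        return self.log[self.map[lblock]]

    def garbage_blocks(self) -> int:
        return len(self.log) - len(self.map)  # space a cleaner must reclaim


class CopyOnWriteFS:
    """Copy-on-write: new data go to a free block and a new root map is published;
    the previous root and its blocks stay valid, e.g. as a snapshot."""

    def __init__(self) -> None:
        self.blocks: List[str] = []
        self.root: Dict[int, int] = {}

    def write(self, lblock: int, data: str) -> Dict[int, int]:
        self.blocks.append(data)              # never overwrite a live block
        new_root = dict(self.root)            # copy the (tiny) metadata map
        new_root[lblock] = len(self.blocks) - 1
        old_root, self.root = self.root, new_root  # single root switch
        return old_root

    def read(self, lblock: int, root: Optional[Dict[int, int]] = None) -> str:
        mapping = self.root if root is None else root
        return self.blocks[mapping[lblock]]


if __name__ == "__main__":
    ip, lfs, cow = InPlaceFS(4), LogStructuredFS(), CopyOnWriteFS()
    for fs in (ip, lfs):
        fs.write(0, "v1")
        fs.write(0, "v2")
    cow.write(0, "v1")
    snapshot = cow.write(0, "v2")
    print(ip.read(0), lfs.read(0), lfs.garbage_blocks())  # v2 v2 1
    print(cow.read(0), cow.read(0, snapshot))             # v2 v1
```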