Funding: Supported by RP-Grant 2010 of Ewha Womans University.
Abstract: Application launch performance is of great importance to system platform developers and vendors because it strongly affects user satisfaction. The single most effective way to improve application launch performance is to replace a hard disk drive (HDD) with a solid state drive (SSD), which has recently become affordable and popular. A natural question, then, is whether the traditional HDD-aware application launchers should be replaced with a new SSD-aware optimizer. We address this question by analyzing the inefficiency of HDD-aware application launchers on SSDs and then proposing a new SSD-aware application prefetching scheme, called the Fast Application STarter (FAST). The key idea of FAST is to overlap the computation (CPU) time with the SSD access (I/O) time during an application launch. FAST is composed of a set of user-level components and system debugging tools provided by the Linux operating system (OS). Hence, FAST can be easily deployed on any recent Linux version without kernel recompilation. We implement FAST on a desktop PC with an SSD running Linux 2.6.32 and evaluate it by launching a set of widely used applications. The results show an average 28% reduction in application launch time compared with a PC without a prefetcher.
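The toy timing model below makes "overlapping CPU time with SSD access time" concrete. It is not FAST's implementation, and the per-step times are invented for illustration. It compares two cases. In the first, every block read stalls the launching application. In the second, a prefetcher streams the same reads back to back, ahead of use, and each computation step only waits for its own data.

```python
from typing import Sequence


def serial_launch_time(io: Sequence[float], cpu: Sequence[float]) -> float:
    """Launch time when every I/O is a blocking miss, so I/O and CPU never overlap."""
    return sum(io) + sum(cpu)


def overlapped_launch_time(io: Sequence[float], cpu: Sequence[float]) -> float:
    """Launch time when a prefetcher issues all I/Os back to back, ahead of use.

    Step i can compute only after its data has arrived AND step i-1 has
    finished computing (a two-stage I/O -> CPU pipeline).
    """
    if len(io) != len(cpu):
        raise ValueError("io and cpu must describe the same launch steps")
    io_done = 0.0   # time at which the prefetcher has fetched step i's data
    cpu_done = 0.0  # time at which the application has finished computing step i
    for io_t, cpu_t in zip(io, cpu):
        io_done += io_t
        cpu_done = max(cpu_done, io_done) + cpu_t
    return cpu_done


if __name__ == "__main__":
    io_ms = [4.0, 2.0, 3.0, 1.0]   # illustrative per-step SSD read times
    cpu_ms = [3.0, 3.0, 2.0, 2.0]  # illustrative per-step computation times
    print(serial_launch_time(io_ms, cpu_ms))      # 20.0
    print(overlapped_launch_time(io_ms, cpu_ms))  # 14.0
```

In this model the overlapped time can never fall below `max(sum(io), sum(cpu))`. That bound limits how much any prefetcher can gain on a given launch.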
Funding: Supported by the National Basic Research Program (973) of China (No. 2004CB318203), the National High-Tech R&D Program (863) of China (No. 2009AA01A402), the Natural Science Foundation of Hubei Province, China (No. 2013CFB035), and the Key Science Research Project of Hubei Education Office in China (No. D20141301).
Abstract: Emerging non-volatile memory technologies, especially flash-based solid state drives (SSDs), have increasingly been adopted in the storage stack. They provide numerous advantages over traditional mechanically rotating hard disk drives (HDDs) and are increasingly replacing them. However, because HDDs have long been the primary building blocks of storage systems, much of the system software was designed specifically for HDDs and may not be optimal for non-volatile memory media. To fully exploit the superior raw performance of SSDs, the existing upper-layer software therefore has to be re-evaluated or redesigned. To this end, this paper proposes PASS, an optimized I/O scheduler at the Linux block layer that accommodates the shift of underlying storage devices toward flash-based SSDs. To achieve high performance, PASS takes the rich internal parallelism of SSDs into account when dispatching requests to the device driver. Specifically, it partitions the logical storage space into fixed-size regions, preferably the size of the component flash packages, and uses them as scheduling units. These units are serviced in a round-robin manner. Each time a unit is chosen, it dispatches a batch containing only read requests or only write requests, which suppresses excessive interference between reads and writes. Additionally, requests are sorted by their target addresses while they wait in the dispatching queues, to exploit the high sequential performance of SSDs. Experimental results with a variety of workloads show that PASS outperforms the four off-the-shelf Linux I/O schedulers by 3% to 41%. At the same time, it significantly improves SSD lifetime by reducing internal write amplification.
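The dispatching policy described above can be approximated with the toy scheduler below. It is a user-space sketch of the idea, not PASS's in-kernel code. The region size, the batch size, and the rule for choosing between reads and writes within a region (serve the longer queue, ties go to reads) are all assumptions made for illustration.

```python
from collections import deque
from dataclasses import dataclass
from typing import Deque, Dict, List


@dataclass(frozen=True)
class Request:
    lba: int        # logical block address targeted by the request
    is_write: bool  # True for a write, False for a read


class RegionScheduler:
    """Toy region-based, read/write-batching scheduler in the spirit of PASS."""

    def __init__(self, region_size: int, batch_size: int) -> None:
        self.region_size = region_size  # fixed region size (ideally one flash package)
        self.batch_size = batch_size    # max requests dispatched per turn
        # region id -> {False: pending reads, True: pending writes}
        self.queues: Dict[int, Dict[bool, List[Request]]] = {}
        self.ring: Deque[int] = deque()  # round-robin order of regions with pending work

    def add(self, req: Request) -> None:
        region = req.lba // self.region_size
        if region not in self.queues:
            self.queues[region] = {False: [], True: []}
            self.ring.append(region)
        self.queues[region][req.is_write].append(req)

    def dispatch(self) -> List[Request]:
        """Return one batch: a single region, a single direction, ascending LBA."""
        if not self.ring:
            return []
        region = self.ring.popleft()
        reads, writes = self.queues[region][False], self.queues[region][True]
        # Assumption: serve the longer queue this turn (ties go to reads).
        chosen = writes if len(writes) > len(reads) else reads
        chosen.sort(key=lambda r: r.lba)  # address order before dispatch
        batch = chosen[: self.batch_size]
        del chosen[: self.batch_size]
        if reads or writes:
            self.ring.append(region)  # still has work: rejoin the back of the ring
        else:
            del self.queues[region]
        return batch


if __name__ == "__main__":
    sched = RegionScheduler(region_size=100, batch_size=2)
    for lba, is_write in [(250, False), (10, True), (205, False), (30, True), (5, False), (220, False)]:
        sched.add(Request(lba, is_write))
    while batch := sched.dispatch():
        print([(r.lba, "W" if r.is_write else "R") for r in batch])
```

Run as-is, it prints four batches in this order: reads 205 and 220 from region 2, writes 10 and 30 from region 0, read 250, then read 5. No batch mixes regions or directions.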
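Abstract: Metadata performance is a key bottleneck limiting the overall performance of distributed file systems. Solid state drives (SSDs) provide high-speed data access. However, metadata is small-grained and frequently updated, so SSDs still perform poorly on it, and their lifetime is consumed faster. Based on the write characteristics of SSD storage media, this work proposes a data management mechanism and an update method for distributed file system metadata. These include reorganizing and managing in-memory metadata pages, iteratively updating data that changes multiple times, and further optimizing the way metadata is written. The proposed method reduces both the frequency and the actual volume of metadata-update writes, reduces random write operations, and improves metadata write performance.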
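The abstract does not detail the page reorganization scheme. The sketch below therefore illustrates only the general effect it claims: fewer and larger metadata writes, achieved by absorbing repeated updates in memory. It is a hypothetical write-back buffer, not the paper's design. Keys stand in for inode numbers, values for metadata records, and `page_slots` is an assumed page capacity.

```python
from typing import Dict, List, Tuple


class MetadataWriteBuffer:
    """Hypothetical write-back buffer that coalesces metadata updates before flushing."""

    def __init__(self, page_slots: int) -> None:
        self.page_slots = page_slots     # records per flushed page (assumed)
        self.dirty: Dict[int, str] = {}  # inode -> latest record; older versions are overwritten
        self.updates_received = 0        # small updates that would otherwise each hit the SSD

    def update(self, inode: int, record: str) -> None:
        self.updates_received += 1
        self.dirty[inode] = record  # repeated updates to one inode collapse into one entry

    def flush(self) -> List[List[Tuple[int, str]]]:
        """Pack dirty records, sorted by inode, into full pages written in order."""
        items = sorted(self.dirty.items())
        pages = [items[i:i + self.page_slots] for i in range(0, len(items), self.page_slots)]
        self.dirty.clear()
        return pages


if __name__ == "__main__":
    buf = MetadataWriteBuffer(page_slots=4)
    for i in range(5):
        buf.update(7, f"size={i}")  # one hot inode updated five times
    for ino in (1, 2, 3):
        buf.update(ino, "mtime")
    buf.update(9, "a")
    buf.update(9, "b")
    pages = buf.flush()
    print(buf.updates_received, "updates ->", len(pages), "page writes")
```

In this example, ten small updates become two page writes. Only the newest version of each record reaches the device.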
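Abstract: Solid state drives (SSDs) have been widely deployed in current storage systems because of their superior performance. However, because of their limited lifetime, the reliability of SSDs has been widely questioned. Redundant arrays of inexpensive disks (RAID) is a traditional means of improving reliability, but it is not well suited to SSDs. This work proposes a hybrid storage system built from SSDs and hard disks. The main idea is that the SSDs serve all I/O requests, which yields high performance, while the disks back up all data, which ensures reliability. However, disk I/O performance is significantly lower than that of SSDs. The key problem in building such a system is therefore whether the disks can back up the data on the SSDs in time. To solve this problem, optimizations are proposed in two respects. For latency, non-volatile main memory is used to bridge the latency gap between the disks and the SSDs. For bandwidth, two measures are taken: 1) I/O requests are reorganized within each disk so that the disk reads and writes as sequentially as possible; 2) multiple disks back up multiple SSDs, and the write requests of any one SSD are spread across several disks, which absorbs write bursts on a single SSD. A prototype implementation shows that the hybrid system is feasible: the disks can provide real-time backup of the SSD data, and the system achieves a higher cost-performance ratio than the other systems compared.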
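The two bandwidth measures can be illustrated together with a small model. Writes from any SSD are spread round-robin over several backup disks, and each disk is treated as an append-only log, so it only ever sees sequential writes. This is a hypothetical sketch. The abstract does not describe the actual on-disk layout, and the non-volatile memory staging used for latency is not modelled.

```python
from typing import Dict, List, Tuple


class BackupLogs:
    """Hypothetical model: one append-only log per backup disk, plus a location index."""

    def __init__(self, num_disks: int) -> None:
        # Each log entry is (ssd_id, lba, data); list position = offset on that disk.
        self.logs: List[List[Tuple[int, int, bytes]]] = [[] for _ in range(num_disks)]
        # (ssd_id, lba) -> (disk, offset) of the newest backup copy
        self.index: Dict[Tuple[int, int], Tuple[int, int]] = {}
        self._next = 0  # round-robin cursor, so a burst from one SSD hits all disks

    def backup(self, ssd_id: int, lba: int, data: bytes) -> Tuple[int, int]:
        disk = self._next
        self._next = (self._next + 1) % len(self.logs)
        offset = len(self.logs[disk])  # always append at the tail: sequential on that disk
        self.logs[disk].append((ssd_id, lba, data))
        self.index[(ssd_id, lba)] = (disk, offset)
        return disk, offset

    def restore(self, ssd_id: int, lba: int) -> bytes:
        """Return the newest backed-up copy of one SSD block."""
        disk, offset = self.index[(ssd_id, lba)]
        return self.logs[disk][offset][2]


if __name__ == "__main__":
    logs = BackupLogs(num_disks=3)
    # A burst of random-address writes from SSD 0 lands sequentially on three disks.
    print([logs.backup(0, lba, bytes([lba])) for lba in (40, 7, 93, 12)])
```

In the example, random-address writes from one SSD become sequential appends spread over all three disks. Overwritten blocks leave stale log entries behind, so a real design would also need log cleaning.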
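Abstract: New storage media, represented by SSDs (solid state drives), are widely used in virtualized environments, typically as read/write caches for virtual machines (VMs) that improve disk I/O performance. Existing studies tend to focus on capacity planning for SSD caches and evaluate SSD cache allocation by the cache read/write hit ratio. They do not fully consider the upper limit of the SSD's service capacity, which makes them hard to apply to typical distributed application scenarios. In such scenarios, VMs may contend for SSD cache resources, causing violations of application performance targets inside the VMs. This work implements an adaptive SSD cache system for multi-objective optimization in virtualized environments that takes the upper limit of SSD service capacity into account. The system uses an adaptive closed loop to dynamically track the states of VMs and applications. It dynamically detects local contention for the SSD cache and generates an optimized VM placement plan based on clustering. It then determines the order and timing of VM migrations according to the global SSD cache supply capacity. Experimental results show that in typical distributed application scenarios, the method effectively alleviates contention for SSD cache resources and satisfies the applications' VM placement requirements. It improves application performance while also maintaining application reliability. In the Hadoop scenario, it reduces task execution time by 25% on average and increases the throughput of I/O-intensive applications by 39% on average. In the ZooKeeper scenario, at the cost of less than 5% performance loss, it copes with VM outages caused by single-point failures of virtualization hosts.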
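The abstract does not specify its clustering algorithm, so the sketch below substitutes a simple greedy heuristic. It is not the paper's method. It shows two of the ingredients the abstract names: detecting hosts whose total SSD cache demand exceeds the SSD's service capacity, and ordering migrations by the spare capacity available elsewhere. Capacities and demands are in abstract units, which is an assumption.

```python
from typing import Dict, List, Tuple


def plan_migrations(
    hosts: Dict[str, float],            # host -> SSD cache service capacity (abstract units)
    vms: Dict[str, Tuple[str, float]],  # vm -> (current host, SSD cache demand)
) -> List[Tuple[str, str, str]]:
    """Greedy stand-in: move VMs off hosts whose SSD cache demand exceeds capacity.

    Returns (vm, source, destination) tuples in the order they should be applied.
    """
    load = {h: 0.0 for h in hosts}
    for host, demand in vms.values():
        load[host] += demand
    moves: List[Tuple[str, str, str]] = []
    # Visit the most overloaded host (least spare capacity) first.
    for src in sorted(hosts, key=lambda h: hosts[h] - load[h]):
        # Consider the heaviest SSD users on this host first (name breaks ties).
        resident = sorted((vm for vm, (h, _) in vms.items() if h == src),
                          key=lambda vm: (-vms[vm][1], vm))
        for vm in resident:
            if load[src] <= hosts[src]:
                break  # no (more) contention on this host
            demand = vms[vm][1]
            # Destination: the other host with the most spare SSD cache capacity.
            dst = max((h for h in hosts if h != src),
                      key=lambda h: hosts[h] - load[h], default=None)
            if dst is None or hosts[dst] - load[dst] < demand:
                continue  # nobody can absorb this VM; try a smaller one
            load[src] -= demand
            load[dst] += demand
            moves.append((vm, src, dst))
    return moves


if __name__ == "__main__":
    hosts = {"h1": 100.0, "h2": 100.0, "h3": 100.0}
    vms = {"a": ("h1", 60.0), "b": ("h1", 50.0), "c": ("h1", 20.0),
           "d": ("h2", 70.0), "e": ("h3", 30.0)}
    print(plan_migrations(hosts, vms))  # [('a', 'h1', 'h3')]
```

Moving the heaviest VM first minimizes the number of migrations but moves the most cache state. A planner that cares about migration cost might prefer the smallest VM that resolves the overload.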
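Abstract: Data centers built on hybrid storage that combines solid-state drives (SSDs) and hard disk drives (HDDs) have become high-performance platforms for big data computing. Their workloads should be able to persist data with different characteristics to SSD or HDD as needed, to improve overall system performance. Spark is an efficient big data computing framework that is widely used in industry. It is especially suited to applications with many iterative computations, because it can persist intermediate data in memory or on disk, and persisting to disk lifts the limit that memory capacity places on dataset size. However, the current Spark implementation does not provide an explicit SSD-oriented persistence interface. Data can be distributed across different storage media in proportions set by configuration. Users, however, cannot choose the persistence medium of an individual RDD according to its data characteristics, so the approach lacks specificity and flexibility. This not only limits further improvement of Spark's performance but also severely hampers the performance of hybrid storage systems. In view of this, this work proposes, for the first time, an SSD-oriented data persistence strategy. It examines the principles of Spark data persistence and optimizes Spark's persistence architecture for hybrid storage systems. It then provides dedicated persistence APIs that let users explicitly and flexibly specify the persistence medium of each RDD. Experimental results based on SparkBench show that the optimized Spark improves performance by 14.02% on average compared with the native version.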
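The difference between configuration-only placement and an explicit per-RDD medium can be shown with a small placement function. This is a hypothetical model, not Spark's API or the paper's. The baseline (`medium=None`) hashes a block name over every configured directory. That is loosely how Spark spreads block files over the directories listed in `spark.local.dir`, so the SSD/HDD split depends only on how many directories each medium contributes. Passing a `Medium` models the explicit choice the abstract describes. `Medium` and `place_block` are illustrative names.

```python
import zlib
from enum import Enum
from typing import Dict, List, Optional


class Medium(Enum):
    SSD = "ssd"
    HDD = "hdd"


def block_name(rdd_id: int, partition: int) -> str:
    """Name for one persisted RDD partition (mirrors Spark's rdd_<id>_<partition> style)."""
    return f"rdd_{rdd_id}_{partition}"


def place_block(
    dirs: Dict[Medium, List[str]], rdd_id: int, partition: int, medium: Optional[Medium] = None
) -> str:
    """Pick the local directory for a persisted partition (hypothetical API).

    medium=None: hash over all directories of all media (configuration-only split).
    medium=...:  hash only over the directories of the requested medium.
    """
    if medium is None:
        candidates = [d for ds in dirs.values() for d in ds]
    else:
        candidates = dirs.get(medium, [])
    if not candidates:
        raise ValueError(f"no directories configured for {medium}")
    h = zlib.crc32(block_name(rdd_id, partition).encode())  # stable, non-negative hash
    return candidates[h % len(candidates)]


if __name__ == "__main__":
    dirs = {Medium.SSD: ["/mnt/ssd0"], Medium.HDD: ["/mnt/hdd0", "/mnt/hdd1"]}
    hot = [place_block(dirs, 3, p, Medium.SSD) for p in range(4)]  # pinned to SSD
    mixed = [place_block(dirs, 5, p) for p in range(4)]            # configuration-only split
    print(hot, mixed)
```

The paths are placeholders. Under the baseline, a frequently reused RDD can land on HDD directories. With an explicit medium, every one of its partitions stays on SSD.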