Abstract: Owing to the widening speed gap between storage access and processor computing, end-to-end data processing has become a bottleneck for improving the overall performance of computer systems on the Internet. Based on an analysis of data-processing behavior, an adaptive cache organization scheme with fast address calculation is proposed. The scheme exploits the characteristics of stack-space data accesses, adopting a fast address calculation strategy to reduce the hit time of stack accesses. The stack cache can also be adaptively turned off altogether when a stack overflow occurs, avoiding the effect of stack switching on processor performance. In addition, a prefetching policy is developed from the miss behavior of the instruction and data caches, combined with data captured from the state of the miss queue. Finally, the proposed method preserves the order of instruction and data accesses, which facilitates extracting prefetches during end-to-end data processing.
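As a concrete illustration of the fast-address-calculation idea, here is a minimal C++ sketch of a direct-mapped stack cache whose set index is computed speculatively from the low bits of the base register and the offset, before the full effective address is available. The sizes, the carry check, and the overflow-disable flag are assumptions for illustration, not the paper's exact design.

```cpp
// Sketch only: models a stack cache whose set index is computed speculatively
// from base and offset WITHOUT the carry from the line-offset bits, so lookup
// can begin before the full effective address (base + offset) is known.
#include <cstdint>
#include <cstdio>

constexpr int kLineBits = 6;                      // 64-byte lines (assumed)
constexpr int kSetBits  = 5;                      // 32 sets (assumed)
constexpr uint32_t kSetMask = (1u << kSetBits) - 1;

struct Line { uint32_t tag = 0; bool valid = false; };

struct StackCache {
    Line sets[1u << kSetBits];
    bool enabled = true;                          // cleared on stack overflow

    // Speculative index: add only the index-field bits, ignoring any carry
    // that would propagate in from the line-offset bits.
    static uint32_t fast_index(uint32_t base, int32_t offset) {
        return ((base >> kLineBits) + uint32_t(offset >> kLineBits)) & kSetMask;
    }

    bool access(uint32_t base, int32_t offset) {
        if (!enabled) return false;               // fall back to the normal D-cache
        uint32_t addr = base + uint32_t(offset);  // full address, computed in parallel
        uint32_t spec = fast_index(base, offset);
        uint32_t real = (addr >> kLineBits) & kSetMask;
        if (spec != real) return false;           // carry mispredicted: slow path
        Line& l = sets[real];
        uint32_t tag = addr >> (kLineBits + kSetBits);
        if (l.valid && l.tag == tag) return true; // fast hit
        l = {tag, true};                          // fill on miss
        return false;
    }
};

int main() {
    StackCache sc;
    uint32_t sp = 0x7fff0000;                     // hypothetical stack pointer
    sc.access(sp, -8);                            // cold miss fills the line
    std::printf("second access hit: %d\n", sc.access(sp, -8));
}
```

The speculative index matches the real one exactly when the low-bit addition generates no carry, which is the common case for small stack-frame offsets; a mismatch simply falls back to the conventional lookup.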
Funding: The Ocean Public Welfare Scientific Research Project of the State Oceanic Administration of China under contract No. 20110533.
Abstract: Data pre-deployment in HDFS (the Hadoop Distributed File System) is more complicated than in traditional file systems. Many key issues need to be addressed, such as determining the target location of prefetched data, the amount of data to prefetch, and the balance between data prefetching services and normal data accesses. To solve these problems, we exploit the characteristics of digital-ocean information service flows and propose a deployment scheme that combines input-data prefetching with output-data-oriented storage strategies. The method overlaps data preparation with data processing, substantially reducing the I/O time cost of digital-ocean cloud computing platforms when processing multi-source information synergistic tasks. Experimental results show that the scheme achieves a higher degree of parallelism than the standard Hadoop mechanisms, shortens the waiting time of a running service node, and significantly reduces data access conflicts.
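The core of the scheme, overlapping data preparation with data processing, can be sketched as a simple prefetch pipeline. In the hypothetical C++ sketch below, fetch_block stands in for a remote HDFS read and the file paths are invented; the point is only that input i+1 is fetched while input i is being processed.

```cpp
// Sketch only: prefetch the next task's input while the current one runs.
#include <cstdio>
#include <future>
#include <string>
#include <vector>

using Block = std::string;

Block fetch_block(const std::string& path) {      // stand-in for an HDFS read
    return "data:" + path;
}
void process(const Block& b) { std::printf("processing %s\n", b.c_str()); }

int main() {
    std::vector<std::string> inputs = {            // hypothetical multi-source inputs
        "/ocean/sst.nc", "/ocean/wind.nc", "/ocean/salinity.nc"};
    // Kick off the first fetch, then always prefetch input i+1 while processing input i.
    std::future<Block> next = std::async(std::launch::async, fetch_block, inputs[0]);
    for (size_t i = 0; i < inputs.size(); ++i) {
        Block cur = next.get();                    // waits only if the prefetch lags
        if (i + 1 < inputs.size())
            next = std::async(std::launch::async, fetch_block, inputs[i + 1]);
        process(cur);                              // overlaps with the next fetch
    }
}
```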
Funding: Supported in part by the National Science Foundation of the USA under Grant Nos. EIA-0224377, CNS-0406328, CNS-0509118, and CCF-0621435.
Abstract: Data prefetching is an effective latency-hiding technique that masks the CPU stalls caused by cache misses and bridges the performance gap between processor and memory. With hardware and/or software support, data prefetching brings data closer to the processor before it is actually needed. Many prefetching techniques have been developed for single-core processors. Recent developments in processor technology have brought multicore processors into the mainstream. While some single-core prefetching techniques apply directly to multicore processors, numerous novel strategies have been proposed in the past few years to take advantage of multiple cores. This paper provides a comprehensive review of state-of-the-art prefetching techniques and proposes a taxonomy that classifies the various design concerns in developing a prefetching strategy, especially for multicore processors. We also compare existing methods analytically.
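For readers unfamiliar with the baseline, a classic single-core technique covered by such surveys is the per-PC stride prefetcher. The C++ sketch below is illustrative: the table layout and the two-step confidence policy are common textbook choices, not any specific paper's design.

```cpp
// Sketch only: a reference-prediction-table stride prefetcher, indexed by load PC.
#include <cstdint>
#include <cstdio>
#include <unordered_map>

struct Entry { uint64_t last_addr = 0; int64_t stride = 0; int confidence = 0; };

class StridePrefetcher {
    std::unordered_map<uint64_t, Entry> table;     // one entry per load PC
public:
    // Returns the predicted next address, or 0 if there is no confident prediction.
    uint64_t on_access(uint64_t pc, uint64_t addr) {
        Entry& e = table[pc];
        int64_t s = int64_t(addr) - int64_t(e.last_addr);
        if (e.last_addr != 0 && s == e.stride) {
            if (e.confidence < 3) ++e.confidence;  // same stride seen again
        } else {
            e.stride = s;                          // train on the new stride
            e.confidence = 0;
        }
        e.last_addr = addr;
        return e.confidence >= 2 ? addr + e.stride : 0;
    }
};

int main() {
    StridePrefetcher p;
    for (uint64_t a = 0x1000; a < 0x1000 + 6 * 64; a += 64)   // streaming loads
        if (uint64_t pf = p.on_access(/*pc=*/0x400123, a))
            std::printf("prefetch 0x%llx\n", (unsigned long long)pf);
}
```

After two repeated strides the predictor issues prefetches one stride ahead; multicore variants discussed in the survey extend this kind of per-thread training with core-aware coordination.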
Funding: This work is funded by the National Science Foundation of the USA under Grant Nos. OCI-1835764 and CSR-1814872.
Abstract: Modern High-Performance Computing (HPC) systems are adding extra layers to the memory and storage hierarchy, named the deep memory and storage hierarchy (DMSH), to increase I/O performance. New hardware technologies, such as NVMe and SSD, have been introduced in burst-buffer installations to reduce the pressure on external storage and boost the burstiness of modern I/O systems. The DMSH has demonstrated its strength and potential in practice. However, each layer of the DMSH is an independent heterogeneous system, and data movement among more layers is significantly more complex even without considering heterogeneity. How to efficiently utilize the DMSH is an open research question facing the HPC community. Further, accessing data with high throughput and low latency is more imperative than ever. Data prefetching is a well-known technique for hiding read latency by requesting data before it is needed, moving it from a high-latency medium (e.g., disk) to a low-latency one (e.g., main memory). However, existing solutions do not consider the new deep memory and storage hierarchy, and they suffer from under-utilization of prefetching resources and unnecessary evictions. Additionally, existing approaches implement a client-pull model in which understanding the application's I/O behavior drives prefetching decisions. Moving towards exascale, where machines run multiple applications concurrently that access files in a workflow, a more data-centric approach resolves challenges such as cache pollution and redundancy. In this paper, we present the design and implementation of Hermes: a new, heterogeneous-aware, multi-tiered, dynamic, and distributed I/O buffering system. Hermes enables, manages, supervises, and, in some sense, extends I/O buffering to integrate fully into the DMSH. We introduce three novel data placement policies to efficiently utilize all layers, and we present three novel techniques for memory, metadata, and communication management in hierarchical buffering systems. Additionally, we demonstrate the benefits of a truly hierarchical data prefetcher that adopts a server-push approach to prefetching. Our evaluation shows that, in addition to automatic data movement through the hierarchy, Hermes significantly accelerates I/O and outperforms state-of-the-art buffering platforms by more than 2x. Lastly, results show 10%-35% performance gains over existing prefetchers and over 50% compared to systems with no prefetching.
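To make the tiering idea concrete, the following C++ sketch places a buffer in the fastest DMSH tier with free capacity and spills downward otherwise. The tier names, capacities, and first-fit policy are illustrative assumptions and do not reproduce Hermes's actual data placement policies.

```cpp
// Sketch only: first-fit placement across latency-ordered DMSH tiers.
#include <cstdio>
#include <string>
#include <vector>

struct Tier {
    std::string name;
    double latency_us;                 // tiers are listed fastest-first
    size_t capacity;
    size_t used = 0;
};

// Place a buffer in the fastest tier that can hold it; -1 means external storage.
int place(std::vector<Tier>& tiers, size_t bytes) {
    for (size_t i = 0; i < tiers.size(); ++i) {
        if (tiers[i].used + bytes <= tiers[i].capacity) {
            tiers[i].used += bytes;
            return int(i);
        }
    }
    return -1;
}

int main() {
    std::vector<Tier> dmsh = {         // hypothetical capacities
        {"DRAM", 0.1, 8u << 20},
        {"NVMe", 20.0, 64u << 20},
        {"BurstBuffer", 100.0, 256u << 20},
    };
    size_t requests[] = {6u << 20, 4u << 20, 70u << 20};
    for (size_t r : requests) {
        int t = place(dmsh, r);
        std::printf("%zu MiB -> %s\n", r >> 20,
                    t >= 0 ? dmsh[t].name.c_str() : "external");
    }
}
```

A server-push prefetcher in this setting would call a routine like place() on behalf of the next consumer in the workflow, promoting data up the hierarchy before the client asks for it, rather than reacting to client-issued reads.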
Abstract: A multiple-goals-oriented data prefetching method (MGODP) is proposed to satisfy the data prefetching needs of different users. MGODP not only prefetches an appropriate amount of data for each user based on that user's preferences, but also, from the server's perspective, introduces a globally cooperative approach to data access in the Client/Server model, substantially improving quality of service. In addition, MGODP provides a cooperative mechanism for balancing the workload between mobile clients and the server, allocating system resources sensibly and safeguarding system performance. A series of experiments shows that MGODP satisfies the needs of different users well and, through its global cooperation and load-balancing mechanisms, minimizes battery and network bandwidth consumption while still meeting users' performance requirements.
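The preference-driven sizing described above can be illustrated with a small utility model: choose the prefetch depth that maximizes a weighted difference between performance benefit and energy/bandwidth cost. The C++ sketch below uses invented benefit and cost curves and weights; it mirrors the trade-off described in the abstract rather than MGODP's actual algorithm.

```cpp
// Sketch only: pick the prefetch depth d maximizing
//   utility(d) = w_perf * benefit(d) - w_energy * cost(d),
// where the weights encode a user's performance-vs-battery preference.
#include <cstdio>

int choose_depth(double w_perf, double w_energy, int max_depth) {
    int best = 0;
    double best_u = 0.0;
    for (int d = 1; d <= max_depth; ++d) {
        double benefit = 1.0 - 1.0 / (1.0 + d);    // diminishing returns on hits
        double cost = 0.1 * d;                     // energy/bandwidth grows linearly
        double u = w_perf * benefit - w_energy * cost;
        if (u > best_u) { best_u = u; best = d; }
    }
    return best;                                   // 0 means "do not prefetch"
}

int main() {
    std::printf("performance-first user: depth %d\n", choose_depth(1.0, 0.2, 10));
    std::printf("battery-saver user:     depth %d\n", choose_depth(1.0, 0.8, 10));
}
```

Raising the energy weight shrinks the chosen depth, which is the behavior the abstract attributes to MGODP: meeting each user's performance requirement while spending as little battery and bandwidth as possible.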