片上缓存资源是片上路由器的重要组成部分,其结构好坏直接关系到片上互联网络的实现复杂度、整体性能及功耗开销。鉴于异步电路的握手工作方式,异步路由器一般采用基于移位寄存器的异步FIFO(First In First Out)实现片上缓冲,这种结构...片上缓存资源是片上路由器的重要组成部分,其结构好坏直接关系到片上互联网络的实现复杂度、整体性能及功耗开销。鉴于异步电路的握手工作方式,异步路由器一般采用基于移位寄存器的异步FIFO(First In First Out)实现片上缓冲,这种结构导致了报文传输延迟及数据翻转次数增加。提出一种基于层次位线缓冲的异步FIFO结构,设计实现了一种新的异步路由器结构。相对于传统异步路由器,新的异步路由器能够有效降低路由器设计的硬件复杂度,减少数据的冗余翻转,降低功耗。实验结果表明在相同配置的情况下,新异步路由器面积降低了39.3%;当异步FIFO深度为8的时候,新异步路由器能够获得41.1%的功耗降低。展开更多
Modern High-Performance Computing(HPC)systems are adding extra layers to the memory and storage hierarchy,named deep memory and storage hierarchy(DMSH),to increase I/O performance.New hardware technologies,such as NVM...Modern High-Performance Computing(HPC)systems are adding extra layers to the memory and storage hierarchy,named deep memory and storage hierarchy(DMSH),to increase I/O performance.New hardware technologies,such as NVMe and SSD,have been introduced in burst buffer installations to reduce the pressure for external storage and boost the burstiness of modern I/O systems.The DMSH has demonstrated its strength and potential in practice.However,each layer of DMSH is an independent heterogeneous system and data movement among more layers is significantly more complex even without considering heterogeneity.How to efficiently utilize the DMSH is a subject of research facing the HPC community.Further,accessing data with a high-throughput and low-latency is more imperative than ever.Data prefetching is a well-known technique for hiding read latency by requesting data before it is needed to move it from a high-latency medium(e.g.,disk)to a low-latency one(e.g.,main memory).However,existing solutions do not consider the new deep memory and storage hierarchy and also suffer from under-utilization of prefetching resources and unnecessary evictions.Additionally,existing approaches implement a client-pull model where understanding the application's I/O behavior drives prefetching decisions.Moving towards exascale,where machines run multiple applications concurrently by accessing files in a workflow,a more data-centric approach resolves challenges such as cache pollution and redundancy.In this paper,we present the design and implementation of Hermes:a new,heterogeneous-aware,multi-tiered,dynamic,and distributed I/O buffering system.Hermes enables,manages,supervises,and,in some sense,extends I/O buffering to fully integrate into the DMSH.We introduce three novel data placement policies to efficiently utilize all layers and we present three novel techniques to perform memory,metadata,and communication management in hierarchical buffering systems.Additionally,we demonstrate the benefits of a truly hierarchical data prefetcher that adopts a server-push approach to data prefetching.Our evaluation shows that,in addition to automatic data movement through the hierarchy,Hermes can significantly accelerate I/O and outperforms by more than 2x state-of-the-art buffering platforms.Lastly,results show 10%-35%performance gains over existing prefetchers and over 50%when compared to systems with no prefetching.展开更多
文摘片上缓存资源是片上路由器的重要组成部分,其结构好坏直接关系到片上互联网络的实现复杂度、整体性能及功耗开销。鉴于异步电路的握手工作方式,异步路由器一般采用基于移位寄存器的异步FIFO(First In First Out)实现片上缓冲,这种结构导致了报文传输延迟及数据翻转次数增加。提出一种基于层次位线缓冲的异步FIFO结构,设计实现了一种新的异步路由器结构。相对于传统异步路由器,新的异步路由器能够有效降低路由器设计的硬件复杂度,减少数据的冗余翻转,降低功耗。实验结果表明在相同配置的情况下,新异步路由器面积降低了39.3%;当异步FIFO深度为8的时候,新异步路由器能够获得41.1%的功耗降低。
基金This work is funded by the National Science Foundation of USA under Grants Nos.OCI-1835764 and CSR-1814872。
文摘Modern High-Performance Computing(HPC)systems are adding extra layers to the memory and storage hierarchy,named deep memory and storage hierarchy(DMSH),to increase I/O performance.New hardware technologies,such as NVMe and SSD,have been introduced in burst buffer installations to reduce the pressure for external storage and boost the burstiness of modern I/O systems.The DMSH has demonstrated its strength and potential in practice.However,each layer of DMSH is an independent heterogeneous system and data movement among more layers is significantly more complex even without considering heterogeneity.How to efficiently utilize the DMSH is a subject of research facing the HPC community.Further,accessing data with a high-throughput and low-latency is more imperative than ever.Data prefetching is a well-known technique for hiding read latency by requesting data before it is needed to move it from a high-latency medium(e.g.,disk)to a low-latency one(e.g.,main memory).However,existing solutions do not consider the new deep memory and storage hierarchy and also suffer from under-utilization of prefetching resources and unnecessary evictions.Additionally,existing approaches implement a client-pull model where understanding the application's I/O behavior drives prefetching decisions.Moving towards exascale,where machines run multiple applications concurrently by accessing files in a workflow,a more data-centric approach resolves challenges such as cache pollution and redundancy.In this paper,we present the design and implementation of Hermes:a new,heterogeneous-aware,multi-tiered,dynamic,and distributed I/O buffering system.Hermes enables,manages,supervises,and,in some sense,extends I/O buffering to fully integrate into the DMSH.We introduce three novel data placement policies to efficiently utilize all layers and we present three novel techniques to perform memory,metadata,and communication management in hierarchical buffering systems.Additionally,we demonstrate the benefits of a truly hierarchical data prefetcher that adopts a server-push approach to data prefetching.Our evaluation shows that,in addition to automatic data movement through the hierarchy,Hermes can significantly accelerate I/O and outperforms by more than 2x state-of-the-art buffering platforms.Lastly,results show 10%-35%performance gains over existing prefetchers and over 50%when compared to systems with no prefetching.