In distributed storage systems,file access efficiency has an important impact on the real-time nature of information forensics.As a popular approach to improve file accessing efficiency,prefetching model can fetches d...In distributed storage systems,file access efficiency has an important impact on the real-time nature of information forensics.As a popular approach to improve file accessing efficiency,prefetching model can fetches data before it is needed according to the file access pattern,which can reduce the I/O waiting time and increase the system concurrency.However,prefetching model needs to mine the degree of association between files to ensure the accuracy of prefetching.In the massive small file situation,the sheer volume of files poses a challenge to the efficiency and accuracy of relevance mining.In this paper,we propose a massive files prefetching model based on LSTM neural network with cache transaction strategy to improve file access efficiency.Firstly,we propose a file clustering algorithm based on temporal locality and spatial locality to reduce the computational complexity.Secondly,we propose a definition of cache transaction according to files occurrence in cache instead of time-offset distance based methods to extract file block feature accurately.Lastly,we innovatively propose a file access prediction algorithm based on LSTM neural network which predict the file that have high possibility to be accessed.Experiments show that compared with the traditional LRU and the plain grouping methods,the proposed model notably increase the cache hit rate and effectively reduces the I/O wait time.展开更多
With the number of connected devices increasing rapidly,the access latency issue increases drastically in the edge cloud environment.Massive low time-constrained and data-intensive mobile applications require efficien...With the number of connected devices increasing rapidly,the access latency issue increases drastically in the edge cloud environment.Massive low time-constrained and data-intensive mobile applications require efficient replication strategies to decrease retrieval time.However,the determination of replicas is not reasonable in many previous works,which incurs high response delay.To this end,a correlation-aware replica prefetching(CRP)strategy based on the file correlation principle is proposed,which can prefetch the files with high access probability.The key is to determine and obtain the implicit high-value files effectively,which has a significant impact on the performance of CRP.To achieve the goal of accelerating the acquisition of implicit highvalue files,an access rule management method based on consistent hashing is proposed,and then the storage and query mechanisms for access rules based on adjacency list storage structure are further presented.The theoretical analysis and simulation results corroborate that CRP shortens average response time over 4.8%,improves average hit ratio over 4.2%,reduces transmitting data amount over 8.3%,and maintains replication frequency at a reasonable level when compared to other schemes.展开更多
A novel approach that integrates occlusion culling within the view-dependent rendering framework is proposed. The algorithm uses the prioritized-layered projection(PLP) algorithm to occlude those obscured objects, a...A novel approach that integrates occlusion culling within the view-dependent rendering framework is proposed. The algorithm uses the prioritized-layered projection(PLP) algorithm to occlude those obscured objects, and uses an approximate visibility technique to accurately and efficiently determine which objects will be visible in the coming future and prefetch those objects from disk before they are rendered, view-dependent rendering technique provides the ability to change level of detail over the surface seamlessly and smoothly in real-time according to cell solidity value.展开更多
In this paper, a prefetching technique is proposed to solve the performance problem caused by remote data access delay. In the technique, the map tasks which will cause the delay are predicted first and then the input...In this paper, a prefetching technique is proposed to solve the performance problem caused by remote data access delay. In the technique, the map tasks which will cause the delay are predicted first and then the input data of these tasks will be preloaded before the tasks are scheduled. During the execution, the input data can be read from local nodes. Therefore, the delay can be hidden. The technique has been implemented in Hadoop-0. 20.1. The experiment results have shown that the technique reduces map tasks causing delay, and improves the performance of Hadoop MapRe- duce by 20%.展开更多
In this paper, we present a comparative study between informed and predictive prefetching mechanisms that were presented to leverage the performance gap between I/O storage systems and CPU. In particular, we will focu...In this paper, we present a comparative study between informed and predictive prefetching mechanisms that were presented to leverage the performance gap between I/O storage systems and CPU. In particular, we will focus on transparent informed prefetching (TIP) and predictive prefetching using probability graph approach (PG). Our main objective is to show the main features, motivations, and implementation overview of each mechanism. We also conducted a performance evaluation discussion that shows a comparison between both mechanisms performance when using different cache size values.展开更多
In this paper, we present a predictive prefetching mechanism that is based on probability graph approach to perform prefetching between different levels in a parallel hybrid storage system. The fundamental concept of ...In this paper, we present a predictive prefetching mechanism that is based on probability graph approach to perform prefetching between different levels in a parallel hybrid storage system. The fundamental concept of our approach is to invoke parallel hybrid storage system’s parallelism and prefetch data among multiple storage levels (e.g. solid state disks, and hard disk drives) in parallel with the application’s on-demand I/O reading requests. In this study, we show that a predictive prefetching across multiple storage levels is an efficient technique for placing near future needed data blocks in the uppermost levels near the application. Our PPHSS approach extends previous ideas of predictive prefetching in two ways: (1) our approach reduces applications’ execution elapsed time by keeping data blocks that are predicted to be accessed in the near future cached in the uppermost level;(2) we propose a parallel data fetching scheme in which multiple fetching mechanisms (i.e. predictive prefetching and application’s on-demand data requests) can work in parallel;where the first one fetches data blocks among the different levels of the hybrid storage systems (i.e. low-level (slow) to high-level (fast) storage devices) and the other one fetches the data from the storage system to the application. Our PPHSS strategy integrated with the predictive prefetching mechanism significantly reduces overall I/O access time in a hybrid storage system. Finally, we developed a simulator to evaluate the performance of the proposed predictive prefetching scheme in the context of hybrid storage systems. Our results show that our PPHSS can improve system performance by 4% across real-world I/O traces without the need of using large size caches.展开更多
A scheme for high-speed data transfer via the Internet for Web service in an extremely large delay environment is proposed. With the wide-spread use of Internet services in recent years, WLAN Internet service in high-...A scheme for high-speed data transfer via the Internet for Web service in an extremely large delay environment is proposed. With the wide-spread use of Internet services in recent years, WLAN Internet service in high-speed trains has commenced. The system for this is composed of a satellite communication system between the train and the ground station, which is characterized by extremely large latency of several hundred milliseconds due to long propagation latency. High-speed web access is not available to users in a train in such an extremely large latency network system. Thus, a prefetch scheme for performance acceleration of Web services in this environment is proposed. A test-bed system that implements the proposed scheme is implemented and is its performance in this test-bed is evaluated. The proposed scheme is verified to enable high-speed Web access in the extremely large delay environment compared to conventional schemes.展开更多
With the speed gap between storage system access and processor computing, end-to-end data processing has become a bottleneck to improve the total performance of computer systems over the Internet. Based on the analysi...With the speed gap between storage system access and processor computing, end-to-end data processing has become a bottleneck to improve the total performance of computer systems over the Internet. Based on the analysis of data processing behavior, an adaptive cache organization scheme is proposed with fast address calculation. This scheme can make full use of the characteristics of stack space data access, adopt fast address calculation strategy, and reduce the hit time of stack access. Adaptively, the stack cache can be turned off from beginning to end, when a stack overflow occurs to avoid the effect of stack switching on processor performance. Also, through the instruction cache and the failure behavior for the data cache, a prefetching policy is developed, which is combined with the data capture of the failover queue state. Finally, the proposed method can maintain the order of instruction and data access, which facilitates the extraction of prefetching in the end-to-end data processing.展开更多
Data prefetching is an effective data access latency hiding technique to mask the CPU stall caused by cache misses and to bridge the performance gap between processor and memory. With hardware and/or software support,...Data prefetching is an effective data access latency hiding technique to mask the CPU stall caused by cache misses and to bridge the performance gap between processor and memory. With hardware and/or software support, data prefetching brings data closer to a processor before it is actually needed. Many prefetching techniques have been developed for single-core processors. Recent developments in processor technology have brought multicore processors into mainstream. While some of the single-core prefetching techniques are directly applicable to multicore processors, numerous novel strategies have been proposed in the past few years to take advantage of multiple cores. This paper aims to provide a comprehensive review of the state-of-the-art prefetching techniques, and proposes a taxonomy that classifies various design concerns in developing a prefetching strategy, especially for multicore processors. We compare various existing methods through analysis as well.展开更多
The World Wide Web has become the primary means for information dissemination. Due to the limited resources of the network bandwidth, users always suffer from long time waiting. Web prefetching and web caching are the...The World Wide Web has become the primary means for information dissemination. Due to the limited resources of the network bandwidth, users always suffer from long time waiting. Web prefetching and web caching are the primary approaches to reducing the user perceived access latency and improving the quality of services. In this paper, a Stochastic Petri Nets (SPN) based integrated web prefetching and caching model (IWPCM) is presented and the performance evaluation of IWPCM is made. The performance metrics, access latency, throughput, HR (hit ratio) and BHR (byte hit ratio) are analyzed and discussed. Simulations show that compared with caching only model (CM), IWPCM can further improve the throughput, HR and BHR efficiently and reduce the access latency. The performance evaluation based on the SPN model can provide a basis for implementation of web prefetching and caching and the combination of web prefetching and caching holds the promise of improving the QoS of web systems.展开更多
As the speed gap between main memory and modern processors continues to widen, the cache behavior becomes more important for main memory database systems (MMDBs). Indexing technique is a key component of MMDBs. Unfo...As the speed gap between main memory and modern processors continues to widen, the cache behavior becomes more important for main memory database systems (MMDBs). Indexing technique is a key component of MMDBs. Unfortunately, the predominant indexes -B^+-trees and T-trees -- have been shown to utilize cache poorly, which triggers the development of many cache-conscious indexes, such as CSB^+-trees and pB^+-trees. Most of these cache-conscious indexes are variants of conventional B^+-trees, and have better cache performance than B^+-trees. In this paper, we develop a novel J^+-tree index, inspired by the Judy structure which is an associative array data structure, and propose a more cacheoptimized index -- Prefetching J^+-tree (pJ^+-tree), which applies prefetching to J^+-tree to accelerate range scan operations. The J^+-tree stores all the keys in its leaf nodes and keeps the reference values of leaf nodes in a Judy structure, which makes J^+-tree not only hold the advantages of Judy (such as fast single value search) but also outperform it in other aspects. For example, J^+-trees can achieve better performance on range queries than Judy. The pJ^+-tree index exploits prefetching techniques to further improve the cache behavior of J^+-trees and yields a speedup of 2.0 on range scans. Compared with B^+-trees, CSB^+-trees, pB^+-trees and T-trees, our extensive experimental Study shows that pJ^+-trees can provide better performance on both time (search, scan, update) and space aspects.展开更多
Modern High-Performance Computing(HPC)systems are adding extra layers to the memory and storage hierarchy,named deep memory and storage hierarchy(DMSH),to increase I/O performance.New hardware technologies,such as NVM...Modern High-Performance Computing(HPC)systems are adding extra layers to the memory and storage hierarchy,named deep memory and storage hierarchy(DMSH),to increase I/O performance.New hardware technologies,such as NVMe and SSD,have been introduced in burst buffer installations to reduce the pressure for external storage and boost the burstiness of modern I/O systems.The DMSH has demonstrated its strength and potential in practice.However,each layer of DMSH is an independent heterogeneous system and data movement among more layers is significantly more complex even without considering heterogeneity.How to efficiently utilize the DMSH is a subject of research facing the HPC community.Further,accessing data with a high-throughput and low-latency is more imperative than ever.Data prefetching is a well-known technique for hiding read latency by requesting data before it is needed to move it from a high-latency medium(e.g.,disk)to a low-latency one(e.g.,main memory).However,existing solutions do not consider the new deep memory and storage hierarchy and also suffer from under-utilization of prefetching resources and unnecessary evictions.Additionally,existing approaches implement a client-pull model where understanding the application's I/O behavior drives prefetching decisions.Moving towards exascale,where machines run multiple applications concurrently by accessing files in a workflow,a more data-centric approach resolves challenges such as cache pollution and redundancy.In this paper,we present the design and implementation of Hermes:a new,heterogeneous-aware,multi-tiered,dynamic,and distributed I/O buffering system.Hermes enables,manages,supervises,and,in some sense,extends I/O buffering to fully integrate into the DMSH.We introduce three novel data placement policies to efficiently utilize all layers and we present three novel techniques to perform memory,metadata,and communication management in hierarchical buffering systems.Additionally,we demonstrate the benefits of a truly hierarchical data prefetcher that adopts a server-push approach to data prefetching.Our evaluation shows that,in addition to automatic data movement through the hierarchy,Hermes can significantly accelerate I/O and outperforms by more than 2x state-of-the-art buffering platforms.Lastly,results show 10%-35%performance gains over existing prefetchers and over 50%when compared to systems with no prefetching.展开更多
A major overhead in software DSM (Distributed Shared Memory) is the cost of remote memory accesses necessitated by the protocol as well as induced by false sharing. This paper introduces a dynamic prefetching method i...A major overhead in software DSM (Distributed Shared Memory) is the cost of remote memory accesses necessitated by the protocol as well as induced by false sharing. This paper introduces a dynamic prefetching method implemented in the JIAJIA software DSM to reduce system overhead caused by remote accesses. The prefetching method records the interleaving string of INV (invalidation) and GETP (getting a remote page) operations for each cached page and analyzes the periodicity of the string when a page is invalidated on a lock or barrier. A prefetching request is issued after the lock or barrier if the periodicity analysis indicates that GETP will be the next operation in the string. Multiple prefetching requests are merged into the same message if they are to the same host. Performance evaluation with eight well-accepted benchmarks in a cluster of sixteen PowerPC workstations shows that the prefetching scheme can significantly reduce the page fault overhead and as a result achieves a performance increase of 15%-20% in three benchmarks and around 8%-10% in another three. The average extra traffic caused by useless prefetches is only 7%-13% in the evaluation.展开更多
As the scale of supercomputers rapidly grows, the reliability problem dominates the system availability. Existing fault tolerance mechanisms, such as periodic checkpointing and process redundancy, cannot effectively f...As the scale of supercomputers rapidly grows, the reliability problem dominates the system availability. Existing fault tolerance mechanisms, such as periodic checkpointing and process redundancy, cannot effectively fix this problem. To address this issue, we present a new fault tolerance framework using process replication and prefetching (FTRP), combining the benefits of proactive and reactive mechanisms. FTRP incorporates a novel cost model and a new proactive fault tolerance mechanism to improve the application execution efficiency. The novel cost model, called the 'work-most' (WM) model, makes runtime decisions to adaptively choose an action from a set of fault tolerance mechanisms based on failure prediction results and application status. Similar to program locality, we observe the failure locality phenomenon in supercomputers for the first time. In the new proactive fault tolerance mechanism, process replication with process prefetching is proposed based on the failure locality, significantly avoiding losses caused by the failures regardless of whether they have been predicted. Simulations with real failure traces demonstrate that the FTRP framework outperforms existing fault tolerance mechanisms with up to 10% improvement in application efficiency for common failure prediction accuracy, and is effective for petascale systems and beyond.展开更多
Stride prefetching is recognized as an important technique to improve memory access performance. The prior work usually profiles and/or analyzes the program behavior offline, and uses the identified stride patterns to...Stride prefetching is recognized as an important technique to improve memory access performance. The prior work usually profiles and/or analyzes the program behavior offline, and uses the identified stride patterns to guide the compilation process by injecting the prefetch instructions at appropriate places. There are some researches trying to enable stride prefetching in runtime systems with online profiling, but they either cannot discover cross-procedural prefetch opportunity, or require special supports in hardware or garbage collection. In this paper, we present a prefetch engine for JVM (Java Virtual Machine). It firstly identifies the candidate load operations during just-in-time (JIT) compilation, and then instruments the compiled code to profile the addresses of those loads. The runtime profile is collected in a trace buffer, which triggers a prefetch controller upon a protection fault. The prefetch controller analyzes the trace to discover any stride patterns, then modifies the compiled code to inject the prefetch instructions in place of the instrumentations. One of the major advantages of this engine is that, it can detect striding loads in any virtual code places for both regular and irregular code, not being limited with plain loop or procedure scopes. Actually we found the cross-procedural patterns take about 30% of all the prefetchings in the representative Java benchmarks. Another major advantage of the engine is that it has runtime overhead much smaller (the maximal is less than 4.0%) than the benefits it brings. Our evaluation with Apache Harmony JVM shows that the engine can achieve an average 6.2% speed-up with SPECJVM98 and DaCapo on Intel Pentium 4 platform, in spite of the runtime overhead.展开更多
Sequential prefetching schemes are widely employed in storage servers to mask disk latency and improve system throughput. However, existing schemes cannot benefit parallel disk systems as expected due to the fact that...Sequential prefetching schemes are widely employed in storage servers to mask disk latency and improve system throughput. However, existing schemes cannot benefit parallel disk systems as expected due to the fact that they ignore the distinct internal characteristics of the parallel disk system, in particular, data striping. Moreover, their aggressive prefetching pattern suffers from premature evictions and prolonged request latencies. In this paper, we propose a strip-oriented asynchronous prefetching (SoAP) technique, which is dedicated to the parallel disk system. It settles the above-mentioned problems by providing multiple novel features, e.g., enhanced prediction accuracy, adaptive prefetching strength, physical data layout awareness, and timely prefetching. To validate SoAP, we implement a prototype by modifying the software redundant arrays of inexpensive disks (RAID) under Linux. Experimental results demonstrate that SoAP can consistently offer improved average response time and throughput to the parallel disk system under non-random workloads compared with STEP, SP, ASP, and Linux-like SEQPs.展开更多
Docker,as a mainstream container solution,adopts the Copy-on-Write(CoW)mechanism in its storage drivers.This mechanism satisfies the need of different containers to share the same image.However,when a single container...Docker,as a mainstream container solution,adopts the Copy-on-Write(CoW)mechanism in its storage drivers.This mechanism satisfies the need of different containers to share the same image.However,when a single container performs operations such as modification of an image file,a duplicate is created in the upper readwrite layer,which contributes to the runtime overhead.When the accessed image file is fairly large,this additional overhead becomes non-negligible.Here we present the concept of Dynamic Prefetching Strategy Optimization(DPSO),which optimizes the Co W mechanism for a Docker container on the basis of the dynamic prefetching strategy.At the beginning of the container life cycle,DPSO pre-copies up the image files that are most likely to be copied up later to eliminate the overhead caused by performing this operation during application runtime.The experimental results show that DPSO has an average prefetch accuracy of greater than 78%in complex scenarios and could effectively eliminate the overhead caused by the CoW mechanism.展开更多
In peer-to-peer (P2P) video-on-demand (VoD) streaming systems, each peer contributes a fixed amount of hard disk storage (usually 2 GB) to store viewed videos and then uploads them to other requesting peers. How...In peer-to-peer (P2P) video-on-demand (VoD) streaming systems, each peer contributes a fixed amount of hard disk storage (usually 2 GB) to store viewed videos and then uploads them to other requesting peers. However, the daily hits (namely popularity) of different segments of a video is highly diverse, which means that taking the whole video as the basic storage unit may lead to redundancy of unpopular segment replicas and scarcity of popular segment replicas in the P2P storage network. To address this issue, we propose a video slicing mechanism (VSM) in which the whole video is sliced into small blocks (20 MB, for instance). Under VSM, peers can moderately remove unpopular blocks from and accordingly add popular ones into their contributed hard disk storage, which increases the usage of peers' contributed resource (storage and bandwidth). To reasonably assign bandwidth among peers with different download capacity, we propose a moderate prefetching strategy (MPS) based on VSM. Under MPS, when the amount of prefetched content reaches the predefined threshold, peers immediately stop prefetching video content and then release occupied bandwidth for others. A stochastic model is established to analyze the performance of the MPS and it is found that perfect playback continuity can be got under MPS. Then the MPS is applied to PPLive VoD system (one of the largest P2P VoD systems in China) and measurement results demonstrate that low server load and perfect user satisfaction can be achieved. Also, the server bandwidth contribution of PPLive VoD system under MPS (namely 5%) is much lower than that of UUSee VoD system (namely 30%).展开更多
基金This work is supported by‘The Fundamental Research Funds for the Central Universities(Grant No.HIT.NSRIF.201714)’‘Weihai Science and Technology Development Program(2016DXGJMS15)’‘Key Research and Development Program in Shandong Provincial(2017GGX90103)’.
文摘In distributed storage systems,file access efficiency has an important impact on the real-time nature of information forensics.As a popular approach to improve file accessing efficiency,prefetching model can fetches data before it is needed according to the file access pattern,which can reduce the I/O waiting time and increase the system concurrency.However,prefetching model needs to mine the degree of association between files to ensure the accuracy of prefetching.In the massive small file situation,the sheer volume of files poses a challenge to the efficiency and accuracy of relevance mining.In this paper,we propose a massive files prefetching model based on LSTM neural network with cache transaction strategy to improve file access efficiency.Firstly,we propose a file clustering algorithm based on temporal locality and spatial locality to reduce the computational complexity.Secondly,we propose a definition of cache transaction according to files occurrence in cache instead of time-offset distance based methods to extract file block feature accurately.Lastly,we innovatively propose a file access prediction algorithm based on LSTM neural network which predict the file that have high possibility to be accessed.Experiments show that compared with the traditional LRU and the plain grouping methods,the proposed model notably increase the cache hit rate and effectively reduces the I/O wait time.
基金the National Natural Science Foundation of China(No.61602525,No.61572525)the Research Foundation of Education Bureau of Hunan Province of China(No.19C1391)the Natural Science Foundation of Hunan Province of China(No.2020JJ5775)。
文摘With the number of connected devices increasing rapidly,the access latency issue increases drastically in the edge cloud environment.Massive low time-constrained and data-intensive mobile applications require efficient replication strategies to decrease retrieval time.However,the determination of replicas is not reasonable in many previous works,which incurs high response delay.To this end,a correlation-aware replica prefetching(CRP)strategy based on the file correlation principle is proposed,which can prefetch the files with high access probability.The key is to determine and obtain the implicit high-value files effectively,which has a significant impact on the performance of CRP.To achieve the goal of accelerating the acquisition of implicit highvalue files,an access rule management method based on consistent hashing is proposed,and then the storage and query mechanisms for access rules based on adjacency list storage structure are further presented.The theoretical analysis and simulation results corroborate that CRP shortens average response time over 4.8%,improves average hit ratio over 4.2%,reduces transmitting data amount over 8.3%,and maintains replication frequency at a reasonable level when compared to other schemes.
文摘A novel approach that integrates occlusion culling within the view-dependent rendering framework is proposed. The algorithm uses the prioritized-layered projection(PLP) algorithm to occlude those obscured objects, and uses an approximate visibility technique to accurately and efficiently determine which objects will be visible in the coming future and prefetch those objects from disk before they are rendered, view-dependent rendering technique provides the ability to change level of detail over the surface seamlessly and smoothly in real-time according to cell solidity value.
文摘In this paper, a prefetching technique is proposed to solve the performance problem caused by remote data access delay. In the technique, the map tasks which will cause the delay are predicted first and then the input data of these tasks will be preloaded before the tasks are scheduled. During the execution, the input data can be read from local nodes. Therefore, the delay can be hidden. The technique has been implemented in Hadoop-0. 20.1. The experiment results have shown that the technique reduces map tasks causing delay, and improves the performance of Hadoop MapRe- duce by 20%.
文摘In this paper, we present a comparative study between informed and predictive prefetching mechanisms that were presented to leverage the performance gap between I/O storage systems and CPU. In particular, we will focus on transparent informed prefetching (TIP) and predictive prefetching using probability graph approach (PG). Our main objective is to show the main features, motivations, and implementation overview of each mechanism. We also conducted a performance evaluation discussion that shows a comparison between both mechanisms performance when using different cache size values.
文摘In this paper, we present a predictive prefetching mechanism that is based on probability graph approach to perform prefetching between different levels in a parallel hybrid storage system. The fundamental concept of our approach is to invoke parallel hybrid storage system’s parallelism and prefetch data among multiple storage levels (e.g. solid state disks, and hard disk drives) in parallel with the application’s on-demand I/O reading requests. In this study, we show that a predictive prefetching across multiple storage levels is an efficient technique for placing near future needed data blocks in the uppermost levels near the application. Our PPHSS approach extends previous ideas of predictive prefetching in two ways: (1) our approach reduces applications’ execution elapsed time by keeping data blocks that are predicted to be accessed in the near future cached in the uppermost level;(2) we propose a parallel data fetching scheme in which multiple fetching mechanisms (i.e. predictive prefetching and application’s on-demand data requests) can work in parallel;where the first one fetches data blocks among the different levels of the hybrid storage systems (i.e. low-level (slow) to high-level (fast) storage devices) and the other one fetches the data from the storage system to the application. Our PPHSS strategy integrated with the predictive prefetching mechanism significantly reduces overall I/O access time in a hybrid storage system. Finally, we developed a simulator to evaluate the performance of the proposed predictive prefetching scheme in the context of hybrid storage systems. Our results show that our PPHSS can improve system performance by 4% across real-world I/O traces without the need of using large size caches.
文摘A scheme for high-speed data transfer via the Internet for Web service in an extremely large delay environment is proposed. With the wide-spread use of Internet services in recent years, WLAN Internet service in high-speed trains has commenced. The system for this is composed of a satellite communication system between the train and the ground station, which is characterized by extremely large latency of several hundred milliseconds due to long propagation latency. High-speed web access is not available to users in a train in such an extremely large latency network system. Thus, a prefetch scheme for performance acceleration of Web services in this environment is proposed. A test-bed system that implements the proposed scheme is implemented and is its performance in this test-bed is evaluated. The proposed scheme is verified to enable high-speed Web access in the extremely large delay environment compared to conventional schemes.
文摘With the speed gap between storage system access and processor computing, end-to-end data processing has become a bottleneck to improve the total performance of computer systems over the Internet. Based on the analysis of data processing behavior, an adaptive cache organization scheme is proposed with fast address calculation. This scheme can make full use of the characteristics of stack space data access, adopt fast address calculation strategy, and reduce the hit time of stack access. Adaptively, the stack cache can be turned off from beginning to end, when a stack overflow occurs to avoid the effect of stack switching on processor performance. Also, through the instruction cache and the failure behavior for the data cache, a prefetching policy is developed, which is combined with the data capture of the failover queue state. Finally, the proposed method can maintain the order of instruction and data access, which facilitates the extraction of prefetching in the end-to-end data processing.
基金supported in part by the National Science Foundation of USA under Grant Nos.EIA-0224377,CNS-0406328,CNS-0509118,and CCF-0621435.
文摘Data prefetching is an effective data access latency hiding technique to mask the CPU stall caused by cache misses and to bridge the performance gap between processor and memory. With hardware and/or software support, data prefetching brings data closer to a processor before it is actually needed. Many prefetching techniques have been developed for single-core processors. Recent developments in processor technology have brought multicore processors into mainstream. While some of the single-core prefetching techniques are directly applicable to multicore processors, numerous novel strategies have been proposed in the past few years to take advantage of multiple cores. This paper aims to provide a comprehensive review of the state-of-the-art prefetching techniques, and proposes a taxonomy that classifies various design concerns in developing a prefetching strategy, especially for multicore processors. We compare various existing methods through analysis as well.
基金Supported by the National Natural Science Foundation of China under Grant No. 60472044. The authors would like to thank research fellow Dr. Yun Shi of China State Post Bureau and Professor Dr. Jun Zou of Tsinghua University for their helpful and constructive comments.
文摘The World Wide Web has become the primary means for information dissemination. Due to the limited resources of the network bandwidth, users always suffer from long time waiting. Web prefetching and web caching are the primary approaches to reducing the user perceived access latency and improving the quality of services. In this paper, a Stochastic Petri Nets (SPN) based integrated web prefetching and caching model (IWPCM) is presented and the performance evaluation of IWPCM is made. The performance metrics, access latency, throughput, HR (hit ratio) and BHR (byte hit ratio) are analyzed and discussed. Simulations show that compared with caching only model (CM), IWPCM can further improve the throughput, HR and BHR efficiently and reduce the access latency. The performance evaluation based on the SPN model can provide a basis for implementation of web prefetching and caching and the combination of web prefetching and caching holds the promise of improving the QoS of web systems.
基金supported by a grant from HP Lab China,and the National Natural Science Foundation of China under Grant Nos.60496325 and 60573092
文摘As the speed gap between main memory and modern processors continues to widen, the cache behavior becomes more important for main memory database systems (MMDBs). Indexing technique is a key component of MMDBs. Unfortunately, the predominant indexes -B^+-trees and T-trees -- have been shown to utilize cache poorly, which triggers the development of many cache-conscious indexes, such as CSB^+-trees and pB^+-trees. Most of these cache-conscious indexes are variants of conventional B^+-trees, and have better cache performance than B^+-trees. In this paper, we develop a novel J^+-tree index, inspired by the Judy structure which is an associative array data structure, and propose a more cacheoptimized index -- Prefetching J^+-tree (pJ^+-tree), which applies prefetching to J^+-tree to accelerate range scan operations. The J^+-tree stores all the keys in its leaf nodes and keeps the reference values of leaf nodes in a Judy structure, which makes J^+-tree not only hold the advantages of Judy (such as fast single value search) but also outperform it in other aspects. For example, J^+-trees can achieve better performance on range queries than Judy. The pJ^+-tree index exploits prefetching techniques to further improve the cache behavior of J^+-trees and yields a speedup of 2.0 on range scans. Compared with B^+-trees, CSB^+-trees, pB^+-trees and T-trees, our extensive experimental Study shows that pJ^+-trees can provide better performance on both time (search, scan, update) and space aspects.
基金This work is funded by the National Science Foundation of USA under Grants Nos.OCI-1835764 and CSR-1814872。
文摘Modern High-Performance Computing(HPC)systems are adding extra layers to the memory and storage hierarchy,named deep memory and storage hierarchy(DMSH),to increase I/O performance.New hardware technologies,such as NVMe and SSD,have been introduced in burst buffer installations to reduce the pressure for external storage and boost the burstiness of modern I/O systems.The DMSH has demonstrated its strength and potential in practice.However,each layer of DMSH is an independent heterogeneous system and data movement among more layers is significantly more complex even without considering heterogeneity.How to efficiently utilize the DMSH is a subject of research facing the HPC community.Further,accessing data with a high-throughput and low-latency is more imperative than ever.Data prefetching is a well-known technique for hiding read latency by requesting data before it is needed to move it from a high-latency medium(e.g.,disk)to a low-latency one(e.g.,main memory).However,existing solutions do not consider the new deep memory and storage hierarchy and also suffer from under-utilization of prefetching resources and unnecessary evictions.Additionally,existing approaches implement a client-pull model where understanding the application's I/O behavior drives prefetching decisions.Moving towards exascale,where machines run multiple applications concurrently by accessing files in a workflow,a more data-centric approach resolves challenges such as cache pollution and redundancy.In this paper,we present the design and implementation of Hermes:a new,heterogeneous-aware,multi-tiered,dynamic,and distributed I/O buffering system.Hermes enables,manages,supervises,and,in some sense,extends I/O buffering to fully integrate into the DMSH.We introduce three novel data placement policies to efficiently utilize all layers and we present three novel techniques to perform memory,metadata,and communication management in hierarchical buffering systems.Additionally,we demonstrate the benefits of a truly hierarchical data prefetcher that adopts a server-push approach to data prefetching.Our evaluation shows that,in addition to automatic data movement through the hierarchy,Hermes can significantly accelerate I/O and outperforms by more than 2x state-of-the-art buffering platforms.Lastly,results show 10%-35%performance gains over existing prefetchers and over 50%when compared to systems with no prefetching.
基金the National Natural Science Foundation of China (No.60073018).
文摘A major overhead in software DSM (Distributed Shared Memory) is the cost of remote memory accesses necessitated by the protocol as well as induced by false sharing. This paper introduces a dynamic prefetching method implemented in the JIAJIA software DSM to reduce system overhead caused by remote accesses. The prefetching method records the interleaving string of INV (invalidation) and GETP (getting a remote page) operations for each cached page and analyzes the periodicity of the string when a page is invalidated on a lock or barrier. A prefetching request is issued after the lock or barrier if the periodicity analysis indicates that GETP will be the next operation in the string. Multiple prefetching requests are merged into the same message if they are to the same host. Performance evaluation with eight well-accepted benchmarks in a cluster of sixteen PowerPC workstations shows that the prefetching scheme can significantly reduce the page fault overhead and as a result achieves a performance increase of 15%-20% in three benchmarks and around 8%-10% in another three. The average extra traffic caused by useless prefetches is only 7%-13% in the evaluation.
基金Project supported by the National Natural Science Foundation of China(Nos.61272141,61120106005,and 61303068)the National High-Tech R&D Program of China(No.2012AA01A301)
文摘As the scale of supercomputers rapidly grows, the reliability problem dominates the system availability. Existing fault tolerance mechanisms, such as periodic checkpointing and process redundancy, cannot effectively fix this problem. To address this issue, we present a new fault tolerance framework using process replication and prefetching (FTRP), combining the benefits of proactive and reactive mechanisms. FTRP incorporates a novel cost model and a new proactive fault tolerance mechanism to improve the application execution efficiency. The novel cost model, called the 'work-most' (WM) model, makes runtime decisions to adaptively choose an action from a set of fault tolerance mechanisms based on failure prediction results and application status. Similar to program locality, we observe the failure locality phenomenon in supercomputers for the first time. In the new proactive fault tolerance mechanism, process replication with process prefetching is proposed based on the failure locality, significantly avoiding losses caused by the failures regardless of whether they have been predicted. Simulations with real failure traces demonstrate that the FTRP framework outperforms existing fault tolerance mechanisms with up to 10% improvement in application efficiency for common failure prediction accuracy, and is effective for petascale systems and beyond.
基金the National Natural Science Foundation of China under Grant Nos.60673146,60603049,60736012,and 60703017the National High Technology Development 863 Program of China under Grant No.2006AA010201 and No.2007AA01Z114the National Basic Research Program of China under Grant No.2005CB321601.
文摘Stride prefetching is recognized as an important technique to improve memory access performance. The prior work usually profiles and/or analyzes the program behavior offline, and uses the identified stride patterns to guide the compilation process by injecting the prefetch instructions at appropriate places. There are some researches trying to enable stride prefetching in runtime systems with online profiling, but they either cannot discover cross-procedural prefetch opportunity, or require special supports in hardware or garbage collection. In this paper, we present a prefetch engine for JVM (Java Virtual Machine). It firstly identifies the candidate load operations during just-in-time (JIT) compilation, and then instruments the compiled code to profile the addresses of those loads. The runtime profile is collected in a trace buffer, which triggers a prefetch controller upon a protection fault. The prefetch controller analyzes the trace to discover any stride patterns, then modifies the compiled code to inject the prefetch instructions in place of the instrumentations. One of the major advantages of this engine is that, it can detect striding loads in any virtual code places for both regular and irregular code, not being limited with plain loop or procedure scopes. Actually we found the cross-procedural patterns take about 30% of all the prefetchings in the representative Java benchmarks. Another major advantage of the engine is that it has runtime overhead much smaller (the maximal is less than 4.0%) than the benefits it brings. Our evaluation with Apache Harmony JVM shows that the engine can achieve an average 6.2% speed-up with SPECJVM98 and DaCapo on Intel Pentium 4 platform, in spite of the runtime overhead.
基金supported by the National Basic Research Program(973) of China (No. 2011CB302303)the National Natural Science Foundation of China (No. 60933002)the Fundamental Research Funds for the Central Universities,China (Nos.2012QN100 and 2011TUS-136)
文摘Sequential prefetching schemes are widely employed in storage servers to mask disk latency and improve system throughput. However, existing schemes cannot benefit parallel disk systems as expected due to the fact that they ignore the distinct internal characteristics of the parallel disk system, in particular, data striping. Moreover, their aggressive prefetching pattern suffers from premature evictions and prolonged request latencies. In this paper, we propose a strip-oriented asynchronous prefetching (SoAP) technique, which is dedicated to the parallel disk system. It settles the above-mentioned problems by providing multiple novel features, e.g., enhanced prediction accuracy, adaptive prefetching strength, physical data layout awareness, and timely prefetching. To validate SoAP, we implement a prototype by modifying the software redundant arrays of inexpensive disks (RAID) under Linux. Experimental results demonstrate that SoAP can consistently offer improved average response time and throughput to the parallel disk system under non-random workloads compared with STEP, SP, ASP, and Linux-like SEQPs.
基金supported by the National Key Research and Development Program of China(No.2018YFB1003203)the National Natural Science Foundation of China(Nos.61772218 and 61433019)+1 种基金the Outstanding Youth Foundation of Hubei Province(No.2016CFA032)the Chinese Universities Scientific Fund(No.2019kfyRCPY030)。
文摘Docker,as a mainstream container solution,adopts the Copy-on-Write(CoW)mechanism in its storage drivers.This mechanism satisfies the need of different containers to share the same image.However,when a single container performs operations such as modification of an image file,a duplicate is created in the upper readwrite layer,which contributes to the runtime overhead.When the accessed image file is fairly large,this additional overhead becomes non-negligible.Here we present the concept of Dynamic Prefetching Strategy Optimization(DPSO),which optimizes the Co W mechanism for a Docker container on the basis of the dynamic prefetching strategy.At the beginning of the container life cycle,DPSO pre-copies up the image files that are most likely to be copied up later to eliminate the overhead caused by performing this operation during application runtime.The experimental results show that DPSO has an average prefetch accuracy of greater than 78%in complex scenarios and could effectively eliminate the overhead caused by the CoW mechanism.
基金supported by the National Basic Research Program of China (2007CB307101)the National Natural Science Foundation of China (60672069,60772043)
文摘In peer-to-peer (P2P) video-on-demand (VoD) streaming systems, each peer contributes a fixed amount of hard disk storage (usually 2 GB) to store viewed videos and then uploads them to other requesting peers. However, the daily hits (namely popularity) of different segments of a video is highly diverse, which means that taking the whole video as the basic storage unit may lead to redundancy of unpopular segment replicas and scarcity of popular segment replicas in the P2P storage network. To address this issue, we propose a video slicing mechanism (VSM) in which the whole video is sliced into small blocks (20 MB, for instance). Under VSM, peers can moderately remove unpopular blocks from and accordingly add popular ones into their contributed hard disk storage, which increases the usage of peers' contributed resource (storage and bandwidth). To reasonably assign bandwidth among peers with different download capacity, we propose a moderate prefetching strategy (MPS) based on VSM. Under MPS, when the amount of prefetched content reaches the predefined threshold, peers immediately stop prefetching video content and then release occupied bandwidth for others. A stochastic model is established to analyze the performance of the MPS and it is found that perfect playback continuity can be got under MPS. Then the MPS is applied to PPLive VoD system (one of the largest P2P VoD systems in China) and measurement results demonstrate that low server load and perfect user satisfaction can be achieved. Also, the server bandwidth contribution of PPLive VoD system under MPS (namely 5%) is much lower than that of UUSee VoD system (namely 30%).