Funding: Supported in part by the National Science Foundation of USA under Grant Nos. EIA-0224377, CNS-0406328, CNS-0509118, and CCF-0621435.
Abstract: Data prefetching is an effective latency-hiding technique that masks the CPU stalls caused by cache misses and bridges the performance gap between processor and memory. With hardware and/or software support, data prefetching brings data closer to the processor before it is actually needed. Many prefetching techniques have been developed for single-core processors. Recent developments in processor technology have brought multicore processors into the mainstream. While some single-core prefetching techniques apply directly to multicore processors, numerous novel strategies have been proposed in the past few years to take advantage of multiple cores. This paper provides a comprehensive review of state-of-the-art prefetching techniques and proposes a taxonomy that classifies the various design concerns in developing a prefetching strategy, especially for multicore processors. We also compare existing methods through analysis.
Funding: Supported by the National Centers of Academic Excellence in Cybersecurity (NCAE-C) Grant (No. H98230-20-1-0293) at the National Security Agency, with Columbus State University, Georgia, USA.
Abstract: An effective method to detect stepping-stone intrusion (SSI) is to estimate the length of a connection chain. This type of detection method is referred to as a network-based detection approach. Existing network-based SSI detection methods are either ineffective in the context of the Internet, because of the presence of outliers in the packet round-trip times (RTTs), or inefficient, as many packets must be captured and processed. Because of the high fluctuation caused by the intermediate routers on the Internet, it is unavoidable that the RTTs of the captured packets contain outlier values. In this paper, we first propose an efficient algorithm to eliminate most of the possible RTT outliers of the packets captured in the Internet environment. We then develop an efficient SSI detection algorithm by mining network traffic using an improved version of k-Means clustering. Our proposed detection algorithm for SSI is accurate, effective, and efficient in the context of the Internet. Well-designed network experiments are conducted in the Internet environment to verify the effectiveness, correctness, and efficiency of our proposed algorithms. Our experiments show that the effective rate of our proposed SSI detection algorithm is higher than 85.7% in the context of the Internet.
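The abstract names two concrete steps: eliminating outliers from the captured RTTs, then clustering the survivors with an improved k-Means. A minimal sketch of that pipeline follows; the interquartile-range filter, the plain (unimproved) k-Means, the choice of k, and the input file name are illustrative assumptions, not the paper's exact algorithms.

```python
# Illustrative sketch (not the paper's exact algorithms): filter RTT
# outliers with an interquartile-range rule, then cluster the surviving
# RTTs with plain 1-D k-means. The cluster structure of the RTTs serves
# as the signal for estimating the connection-chain length.
import numpy as np

def filter_outliers(rtts: np.ndarray) -> np.ndarray:
    """Drop RTTs outside [Q1 - 1.5*IQR, Q3 + 1.5*IQR] (assumed rule)."""
    q1, q3 = np.percentile(rtts, [25, 75])
    iqr = q3 - q1
    lo, hi = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return rtts[(rtts >= lo) & (rtts <= hi)]

def kmeans_1d(rtts: np.ndarray, k: int, iters: int = 100) -> np.ndarray:
    """Plain 1-D k-means; returns the k cluster centers."""
    centers = np.sort(np.random.choice(rtts, k, replace=False))
    for _ in range(iters):
        labels = np.argmin(np.abs(rtts[:, None] - centers[None, :]), axis=1)
        new = np.array([rtts[labels == j].mean() if np.any(labels == j)
                        else centers[j] for j in range(k)])
        if np.allclose(new, centers):
            break
        centers = new
    return centers

# Hypothetical usage: "captured_rtts.txt" holds one RTT (in ms) per line.
rtts = filter_outliers(np.loadtxt("captured_rtts.txt"))
print("RTT cluster centers (ms):", kmeans_1d(rtts, k=3))
```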
Funding: The work was supported in part by the National Science Foundation of USA under Grant Nos. CNS-1162540, CCF-0937877, and CNS-0751200. We would like to thank the Scalable Computing Software (SCS) group at the Illinois Institute of Technology and the anonymous reviewers for their valuable and professional comments on earlier drafts of this work.
Abstract: Data access delay has become the prominent performance bottleneck of high-end computing systems. The key to reducing data access delay in system design is to diminish data stall time. Memory locality and concurrency are the two essential factors influencing the performance of modern memory systems. However, existing studies on reducing data stall time rarely focus on utilizing data access concurrency, because the impact of memory concurrency on overall memory system performance is not well understood. In this study, a pair of novel data stall time models are presented: the L-C model for the combined effect of locality and concurrency, and the P-M model for the effect of pure misses on data stall time. The models provide a new understanding of data access delay and new directions for performance optimization. Based on these new models, a summary table of advanced cache optimizations is presented. Of its entries, 38 are contributed by data concurrency while only 21 are contributed by data locality, which shows the value of data concurrency. The L-C and P-M models and their associated results and opportunities introduced in this study are important and necessary for future data-centric architecture and algorithm design of modern computing systems.
Funding: Supported in part by the U.S. National Science Foundation under Grant Nos. CCF-2008000, CNS-1730488, and CCF-2008907, and the U.S. Department of Homeland Security under Grant No. 2017-ST-062-000002.
Abstract: Accesses Per Cycle (APC), Concurrent Average Memory Access Time (C-AMAT), and Layered Performance Matching (LPM) are three memory performance models that consider both data locality and memory access concurrency. The APC model measures the throughput of a memory architecture and therefore reflects the quality of service (QoS) of a memory system. The C-AMAT model provides a recursive expression for the memory access delay and therefore can be used to identify the potential bottlenecks in a memory hierarchy. The LPM method transforms a global memory system optimization into localized optimizations at each memory layer by matching the data access demands of the applications with the underlying memory system design. These three models were proposed separately through prior efforts. This paper reexamines the three models under one coherent mathematical framework. More specifically, we present a new memory-centric view of data accesses. We divide the memory cycles at each memory layer into four distinct categories and use them to recursively define the memory access latency and concurrency along the memory hierarchy. This new perspective offers new insights with a clear formulation of memory performance that considers both locality and concurrency. Consequently, the performance model can be easily understood and applied in engineering practices. As such, the memory-centric approach helps establish a unified mathematical foundation for model-driven performance analysis and optimization of contemporary and future memory systems.
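For reference, the single-layer C-AMAT expression as commonly cited in prior work is reproduced below; the recursive, four-category formulation developed in this paper may differ in notation.

```latex
% Commonly cited single-layer form of C-AMAT: H is the hit time, C_H the
% hit concurrency, pMR the pure-miss rate, pAMP the average pure-miss
% penalty, and C_M the pure-miss concurrency.
\[
  \text{C-AMAT} \;=\; \frac{H}{C_H} \;+\; pMR \cdot \frac{pAMP}{C_M}
\]
```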
Funding: Supported by the National Natural Science Foundation of China (Nos. 10531070, 10771209, and 10721101) and the Chinese Academy of Sciences under Grant No. kjcx-yw-s7.
Abstract: A wide range of applications for wireless ad hoc networks are time-critical and impose stringent requirements on communication latency. One of the key communication operations is to broadcast a message from a source node. This paper studies the minimum-latency broadcast scheduling problem in wireless ad hoc networks under a collision-free transmission model. The previously best known algorithm for this NP-hard problem produces a broadcast schedule whose latency is at least 648(r_max/r_min)^2 times that of the optimal schedule, where r_max and r_min are the maximum and minimum transmission ranges of nodes in a network, respectively. We significantly improve this result by proposing a new scheduling algorithm whose approximation performance ratio is at most (1 + 2r_max/r_min)^2 + 32. Moreover, under the proposed scheduling, each node needs to forward a message at most once.
Abstract: Recent convergence of information communications technology and sensing equipment is creating new demands and opportunities for wireless sensor networks without technological restrictions, such as cyber-physical systems and the Internet of Things. The fast-growing number of wireless sensor networks, the variety of sensors, the different granularity of time control in cyber-physical systems, …
Funding: Supported by the National Natural Science Foundation of China under Grant Nos. 90818021 and 9071803, the National Natural Science Foundation of China under Grant No. 60828003, Tsinghua National Laboratory for Information Science and Technology (TNList), NSF CNS-0832120, the National Basic Research Program of China ("973" Program) under Grant No. 2010CB328100, and the National High Technology Research and Development Program of China ("863" Program) under Grant No. 2007AA01Z180.
Abstract: A number of mobile Online Social Networking (OSN) services have appeared in the market in recent times. While most mobile systems benefit greatly from cloud services, centralized servers and communications infrastructure are not always available, nor are location-based services offered to mobile devices without GPS. To take advantage of the cloud and to address these problems, a Wi-Fi based multi-hop networking system called MoNet is proposed. On top of MoNet, we propose a privacy-aware geosocial networking service called WiFace. Where there is no infrastructure, a distributed content sharing protocol significantly shortens the relay path, reduces conflicts, and improves data availability. Furthermore, a security mechanism is developed to protect privacy. Comprehensive experiments performed on MoNet show that the system is more than sufficient to support social networking and even audio and video applications.
Funding: Supported in part by the U.S. National Science Foundation under Grant Nos. CCF-2008907 and CCF-2029014, the Chinese Academy of Sciences Project for Young Scientists in Basic Research under Grant No. YSBR-029, and the Chinese Academy of Sciences Project for Youth Innovation Promotion Association.
Abstract: Graph processing is a vital component of many AI and big data applications. However, due to its poor locality and complex data access patterns, graph processing is also a known performance killer of AI and big data applications. In this work, we propose to enhance graph processing applications by leveraging fine-grained memory access patterns with a dual-path architecture on top of existing software-based graph optimizations. We first identify that memory accesses to the offset, edge, and state arrays have distinct locality and impact on performance. We then introduce the Skyway architecture, which consists of two primary components: 1) a dedicated direct data path between the core and memory to transfer state-array elements efficiently, and 2) data-type-aware fine-grained memory-side row buffer hardware serving both the newly designed direct data path and the regular memory-hierarchy data path. The proposed Skyway architecture improves overall performance by reducing memory access interference and improving data access efficiency with minimal overhead. We evaluate Skyway on a set of diverse algorithms using large real-world graphs. On a simulated four-core system, Skyway improves performance by 23% on average over the best-performing graph-specialized hardware optimizations.
Abstract: Task scheduling is an integral component of computing. With the emergence of Grid and ubiquitous computing, new challenges appear in task scheduling based on properties such as security, quality of service, and lack of central control within distributed administrative domains. A Grid task scheduling framework must be able to deal with these issues. One of the goals of Grid task scheduling is to achieve high system throughput while matching applications with the available computing resources. This matching of resources in a non-deterministically shared heterogeneous environment leads to concerns over Quality of Service (QoS). In this paper, a novel QoS-guided task scheduling algorithm for Grid computing is introduced. The proposed algorithm is based on a general adaptive scheduling heuristic that includes QoS guidance, and it is evaluated within a simulated Grid environment. The experimental results show that the new QoS-guided Min-Min heuristic can lead to significant performance gains for a variety of applications. The approach is also compared with others based on the quality of predictions formulated from inaccurate information.
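As a point of reference, the Min-Min heuristic repeatedly schedules the task whose earliest completion time over all resources is smallest. The sketch below adds a simple QoS guidance rule, matching QoS-demanding tasks first and only against qualified hosts; the field names and the matching rule are illustrative assumptions, not the paper's exact algorithm.

```python
# Illustrative sketch of QoS-guided Min-Min (field names are assumptions):
# QoS-demanding tasks are matched first, restricted to hosts that can
# satisfy the demand; the remaining tasks follow with plain Min-Min.

def min_min(tasks, ready_time):
    """tasks: list of (name, {host: exec_time}); ready_time: {host: t}.
    Repeatedly schedule the task with the smallest earliest completion time."""
    schedule = []
    pending = dict(tasks)
    while pending:
        name, host, finish = min(
            ((n, h, ready_time[h] + c)
             for n, costs in pending.items()
             for h, c in costs.items()),
            key=lambda x: x[2])
        schedule.append((name, host, finish))
        ready_time[host] = finish
        del pending[name]
    return schedule

def qos_guided_min_min(tasks, hosts_qos, ready_time):
    """tasks: list of (name, {host: exec_time}, qos_demand).
    A task with demand q may only run on hosts whose qos_level >= q
    (assumed matching rule)."""
    high = [(n, {h: c for h, c in costs.items() if hosts_qos[h] >= q})
            for n, costs, q in tasks if q > 0]
    low = [(n, costs) for n, costs, q in tasks if q == 0]
    return min_min(high, ready_time) + min_min(low, ready_time)

# Hypothetical usage: two hosts, only h1 is QoS-capable.
tasks = [("t1", {"h1": 5, "h2": 9}, 1), ("t2", {"h1": 3, "h2": 4}, 0)]
print(qos_guided_min_min(tasks, {"h1": 1, "h2": 0}, {"h1": 0.0, "h2": 0.0}))
```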
Funding: This research was supported in part by the National Science Foundation of U.S.A. under NSF Grant Nos. EIA-0224377, CNS-0406328, CNS-0509118, and CCF-0621435.
Abstract: Data access delay is a major bottleneck in utilizing current high-end computing (HEC) machines. Prefetching, where data is fetched before the CPU demands it, has been considered an effective solution for masking data access delay. However, current client-initiated prefetching strategies, where a computing processor initiates prefetching instructions, have many limitations. They do not work well for applications with complex, non-contiguous data access patterns. While technology advances continue to increase the gap between computing and data access performance, trading computing power for reduced data access delay has become a natural choice. In this paper, we present a server-based data-push approach and discuss its associated implementation mechanisms. In the server-push architecture, a dedicated server called the Data Push Server (DPS) initiates and proactively pushes data closer to the client in time. Issues such as what data to fetch, when to fetch, and how to push are studied. The SimpleScalar simulator is modified with a dedicated prefetching engine that pushes data for another processor to test DPS-based prefetching. Simulation results show that the L1 cache miss rate can be reduced by up to 97% (71% on average) over a superscalar processor for the SPEC CPU2000 benchmarks that have high cache miss rates.
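Conceptually, the push model inverts the usual client-pull loop: a server watches a client's miss stream, predicts upcoming addresses, and pushes them ahead of demand. The sketch below is an illustration only, with a simple stride predictor standing in for the paper's pattern analysis; the real DPS is a prefetching engine inside a modified SimpleScalar, not application code.

```python
# Conceptual sketch of server-push prefetching (illustrative, not the
# paper's engine). A stride predictor observes the client's miss stream
# and pushes the next few predicted lines into the client's cache.

class DataPushServer:
    def __init__(self, cache, memory, depth=2):
        self.cache, self.memory, self.depth = cache, memory, depth
        self.last_addr, self.stride = None, None

    def observe_miss(self, addr):
        """Called on each client cache miss; learn the stride, push ahead."""
        if self.last_addr is not None:
            self.stride = addr - self.last_addr
        self.last_addr = addr
        if self.stride:
            for i in range(1, self.depth + 1):
                target = addr + i * self.stride
                if target not in self.cache:          # avoid redundant pushes
                    self.cache[target] = self.memory[target]  # push to client

# Hypothetical usage with dict-backed "memory" and "cache":
memory = {a: a * 10 for a in range(64)}
cache = {}
dps = DataPushServer(cache, memory, depth=2)
for miss in (0, 4, 8):        # client misses on a stride-4 stream
    dps.observe_miss(miss)
print(sorted(cache))          # pushed lines: [8, 12, 16]
```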
Funding: This work is funded by the National Science Foundation of USA under Grant Nos. OCI-1835764 and CSR-1814872.
Abstract: Modern High-Performance Computing (HPC) systems are adding extra layers to the memory and storage hierarchy, named the deep memory and storage hierarchy (DMSH), to increase I/O performance. New hardware technologies, such as NVMe and SSD, have been introduced in burst buffer installations to reduce the pressure on external storage and boost the burstiness of modern I/O systems. The DMSH has demonstrated its strength and potential in practice. However, each layer of the DMSH is an independent heterogeneous system, and data movement among more layers is significantly more complex even without considering heterogeneity. How to efficiently utilize the DMSH is a subject of research facing the HPC community. Further, accessing data with high throughput and low latency is more imperative than ever. Data prefetching is a well-known technique for hiding read latency by requesting data before it is needed, moving it from a high-latency medium (e.g., disk) to a low-latency one (e.g., main memory). However, existing solutions do not consider the new deep memory and storage hierarchy, and they also suffer from under-utilization of prefetching resources and unnecessary evictions. Additionally, existing approaches implement a client-pull model in which understanding the application's I/O behavior drives prefetching decisions. Moving towards exascale, where machines run multiple applications concurrently by accessing files in a workflow, a more data-centric approach resolves challenges such as cache pollution and redundancy. In this paper, we present the design and implementation of Hermes: a new, heterogeneous-aware, multi-tiered, dynamic, and distributed I/O buffering system. Hermes enables, manages, supervises, and, in some sense, extends I/O buffering to fully integrate into the DMSH. We introduce three novel data placement policies to efficiently utilize all layers, and we present three novel techniques to perform memory, metadata, and communication management in hierarchical buffering systems. Additionally, we demonstrate the benefits of a truly hierarchical data prefetcher that adopts a server-push approach to data prefetching. Our evaluation shows that, in addition to automatic data movement through the hierarchy, Hermes can significantly accelerate I/O and outperforms state-of-the-art buffering platforms by more than 2x. Lastly, results show 10%-35% performance gains over existing prefetchers and over 50% when compared to systems with no prefetching.
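To make the multi-tiered placement idea concrete, the sketch below places each incoming buffer in the fastest tier with room and spills older residents down the hierarchy; this greedy policy and the eviction order are illustrative assumptions, not Hermes's actual placement policies.

```python
# Illustrative tiered-placement sketch (not Hermes's actual policies):
# place an incoming buffer in the fastest tier with room, spilling the
# oldest resident buffers one level down when a tier is full.

class Tier:
    def __init__(self, name, capacity):
        self.name, self.capacity = name, capacity
        self.resident = {}                     # buffer_id -> size

    def free(self):
        return self.capacity - sum(self.resident.values())

def place(tiers, buf_id, size):
    """tiers ordered fastest (e.g., RAM) to slowest (e.g., disk)."""
    for depth, tier in enumerate(tiers):
        # Assumed eviction order: oldest insertion first.
        while (tier.free() < size and tier.resident
               and depth + 1 < len(tiers)):
            victim = next(iter(tier.resident))
            vsize = tier.resident.pop(victim)
            place(tiers[depth + 1:], victim, vsize)   # spill one level down
        if tier.free() >= size:
            tier.resident[buf_id] = size
            return tier.name
    return None                                # no capacity anywhere

# Hypothetical usage: the third buffer forces two spills to NVMe.
tiers = [Tier("ram", 4), Tier("nvme", 8), Tier("disk", 64)]
for i, size in enumerate([2, 2, 3]):
    print(place(tiers, f"buf{i}", size))       # ram, ram, ram (after spills)
```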
Funding: Partially supported by the National Science Foundation (NSF) (No. CNS-1745894) and the WISER ISFG grant; partly sponsored by the Air Force Office of Scientific Research (AFOSR) (No. YIP FA9550-17-1-0240) and the Maryland Procurement Office (No. H98230-18-D-0007).
Abstract: Smart grid has integrated an increasing number of distributed energy resources to improve the efficiency and flexibility of power generation and consumption, as well as the resilience of the power grid. The energy consumers on the power grid, e.g., households, equipped with distributed energy resources can be considered "microgrids" that both generate and consume electricity. In this paper, we study the energy community discovery problems, which identify energy communities for the microgrids to facilitate energy management, e.g., load balancing, energy sharing, and trading on the grid. Specifically, we present efficient algorithms to discover such communities of microgrids considering both their geo-locations and net energy (NE) over any period. Finally, we experimentally validate the performance of the algorithms using both synthetic and real datasets.
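As an illustration of the problem setting (not the paper's algorithms), the sketch below greedily grows a community around each producer, adding nearby microgrids whose net energy improves the community's balance; the radius and tolerance parameters, and the greedy rule itself, are assumptions.

```python
# Illustrative greedy sketch (not the paper's exact algorithms): form
# communities of geographically close microgrids whose net energy (NE)
# over the period roughly cancels out, pairing producers with consumers.
import math

def discover_communities(microgrids, radius, ne_tolerance):
    """microgrids: list of (id, (x, y), net_energy)."""
    unassigned = set(range(len(microgrids)))
    communities = []
    for i in sorted(unassigned, key=lambda j: -microgrids[j][2]):
        if i not in unassigned:                # already placed earlier
            continue
        _, (xi, yi), ne = microgrids[i]
        members, balance = [i], ne
        # Visit neighbors from nearest to farthest.
        for j in sorted(unassigned - {i},
                        key=lambda j: math.dist((xi, yi), microgrids[j][1])):
            _, pos_j, ne_j = microgrids[j]
            if math.dist((xi, yi), pos_j) > radius:
                break                          # everything farther is too far
            if abs(balance + ne_j) < abs(balance):   # j improves the balance
                members.append(j)
                balance += ne_j
            if abs(balance) <= ne_tolerance:
                break
        unassigned -= set(members)
        communities.append((members, balance))
    return communities

# Hypothetical usage: two producers (positive NE) and two consumers.
grids = [("g0", (0, 0), 5.0), ("g1", (1, 0), -4.5),
         ("g2", (9, 9), 3.0), ("g3", (8, 9), -3.2)]
print(discover_communities(grids, radius=3.0, ne_tolerance=1.0))
```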
Funding: Supported by the National Basic Research and Development (973) Program of China (No. 2010CB334707), the National Natural Science Foundation of China (No. 60903167), the Zhejiang Provincial Natural Science Foundation (Nos. Y1111063 and Y1101336), and the Zhejiang Provincial Key Innovative Research Team.
Abstract: We study the problem of efficient data aggregation in unreliable wireless sensor networks by designing a fault-tolerant data aggregation protocol. A fault-tolerant data aggregation protocol consists of two parts: basic aggregation scheduling and amendment strategies. By default, data is aggregated according to the basic aggregation scheduling strategy. The amendment strategy starts automatically when an intermediate sensor node goes out of service. We focus our attention on the amendment strategies and assume that the network adopts a connected dominating set (CDS) based aggregation scheduling as its basic aggregation scheduling strategy. The amendment scheme includes localized aggregation-tree repairing algorithms and distributed rescheduling algorithms. The former are used to find a new aggregation tree for every child of the corrupted node, whereas the latter are used to achieve interference-free data aggregation scheduling after the amendment. These amendment strategies impact only a very limited number of nodes near the corrupted node, and the amendment process is transparent to all the other nodes. Theoretical analyses and simulations show that the scheme greatly improves the efficiency of the data aggregation operation by reducing both message and time costs compared to rebuilding the aggregation tree and rescheduling the entire network.
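A minimal sketch of the localized repair idea: each orphaned child of a failed node reattaches to a neighboring CDS backbone node that still reaches the sink, so only nodes around the failure are touched. The data structures and the candidate-selection rule are illustrative assumptions, not the paper's exact algorithms.

```python
# Simplified sketch of localized aggregation-tree repair (assumed
# behavior): when a node fails, each of its children reattaches to a
# neighboring backbone (CDS) node that still reaches the sink.

def repair_tree(parent, children, neighbors, cds, failed, sink):
    """parent/children describe the aggregation tree; neighbors the
    radio graph; cds the connected-dominating-set backbone."""
    def reaches_sink(n, banned):
        while n != sink:
            if n == banned or n is None:
                return False
            n = parent.get(n)
        return True

    for child in list(children.get(failed, [])):
        # Candidate new parents: live CDS neighbors with a sink path.
        candidates = [n for n in neighbors[child]
                      if n != failed and n in cds
                      and reaches_sink(n, banned=failed)]
        if candidates:
            new_parent = candidates[0]       # e.g., nearest or lowest-id
            parent[child] = new_parent
            children.setdefault(new_parent, []).append(child)
    children.pop(failed, None)

# Hypothetical 5-node network: sink <- a <- {b, c}; d is a CDS neighbor.
parent = {"a": "sink", "b": "a", "c": "a", "d": "sink"}
children = {"sink": ["a", "d"], "a": ["b", "c"]}
neighbors = {"b": ["a", "d"], "c": ["a", "d"], "d": ["sink", "b", "c"]}
repair_tree(parent, children, neighbors, cds={"a", "d", "sink"},
            failed="a", sink="sink")
print(parent)   # b and c now hang off d
```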
Funding: Supported by the National Natural Science Foundation of China under Grant Nos. 61325013, 61190112, 61170216, and 61228202, the Natural Science Foundation of USA under Grant Nos. CNS-0832120, CNS-1035894, ECCS-1247944, and ECCS-1343306, the Fundamental Research Funds for the Central Universities of China under Project No. 2012jdgz02 (Xi'an Jiaotong University), and the Research Fund for the Doctoral Program of Higher Education of China under Project No. 20130201120016.
Abstract: Numerous indoor localization techniques have been proposed recently to meet the intensive demand for location-based services (LBS). Among them, the most popular solutions are the Wi-Fi fingerprint-based approaches. The core challenge is to lower the cost of the fingerprint site survey. One trend is to collect piecewise data from clients and establish the radio map in a crowdsourcing manner; however, the low participation rate blocks practical use. In this work, we propose a passive crowdsourcing channel state information (CSI) based indoor localization scheme, C2IL. Despite being a crowdsourcing-based approach, our scheme is totally transparent to the client, and the only requirement is to connect to our 802.11n access points (APs). C2IL is built upon an innovative method to accurately estimate a client's moving speed solely from 802.11n CSI. Knowing the walking speed of a client and its surrounding APs, a graph matching algorithm is employed to extract the received signal strength (RSS) fingerprints and establish the fingerprint map. For the localization phase, we design a trajectory clustering based localization algorithm to provide precise real-time indoor localization and tracking. We develop and deploy a practical working system of C2IL in a large office environment. Extensive evaluations indicate that the speed estimation error is within 3%, and the localization error is within 2 m 80% of the time in a very complex indoor environment.
Funding: Partially supported by the National Science Foundation of USA under Grant Nos. CNS-0832120, CNS-1035894, ECCS-1247944, and ECCS-1343306, and the National Natural Science Foundation of China under Grant Nos. 61170216 and 61228202; supported in part by the National Science Foundation of USA under Grant Nos. CNS-1319915 and CNS-1343355.
Abstract: A common method of prolonging the lifetime of wireless sensor networks is to use a low-power duty-cycling protocol. Existing protocols fall into two categories: sender-initiated and receiver-initiated. In this paper, we present SA-MAC, a self-stabilizing adaptive MAC protocol for wireless sensor networks. SA-MAC dynamically adjusts its transmission time-slot, wake-up time-slot, and packet detection pattern according to the current network working conditions, such as packet length and the wake-up patterns of neighboring nodes. In the long run, every sensor node will find its own transmission phase, so the network will enter a stable stage when the network load and qualities are static. We conduct extensive experiments to evaluate the energy consumption and packet reception rate of SA-MAC in real sensor networking systems. Our results indicate that SA-MAC outperforms other existing protocols.
Funding: Supported in part by the U.S. National Science Foundation under Grant Nos. CCF-2029014 and CCF-2008907.
Abstract: With the surge of big data applications and the worsening of the memory-wall problem, the memory system, instead of the computing unit, has become the commonly recognized major concern of computing. However, this "memory-centric" common understanding had a humble beginning. More than three decades ago, the memory-bounded speedup model became the first model to recognize memory as the bound of computing, and it provided a general bound of speedup and a computing-memory trade-off formulation. The memory-bounded model was well received even then. It was immediately introduced in several advanced computer architecture and parallel computing textbooks in the 1990s as a must-know for scalable computing. These include Prof. Kai Hwang's book "Scalable Parallel Computing", in which he introduced the memory-bounded speedup model as Sun-Ni's Law, in parallel with Amdahl's Law and Gustafson's Law. Through the years, the impact of this model has grown far beyond parallel processing and into the fundamentals of computing. In this article, we revisit the memory-bounded speedup model and discuss its progress and impacts in depth to make a unique contribution to this special issue, to stimulate new solutions for big data applications, and to promote data-centric thinking and rethinking.
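For reference, the memory-bounded speedup model (Sun-Ni's Law) is usually written as below, where f is the sequential fraction of the work and G(n) describes how the workload grows as memory scales with n processors; G(n) = 1 recovers Amdahl's Law and G(n) = n recovers Gustafson's scaled speedup.

```latex
% Memory-bounded speedup (Sun-Ni's Law): f is the sequential fraction of
% the work and G(n) the workload growth as memory scales with n
% processors. G(n) = 1 yields Amdahl's Law; G(n) = n yields Gustafson's
% scaled speedup.
\[
  S_{MB}(n) \;=\; \frac{f + (1 - f)\,G(n)}{f + (1 - f)\,G(n)/n}
\]
```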
Funding: Supported in part by the National Natural Science Foundation of China under Grant No. 61128005.
Abstract: Providing each node with one or more multi-channel radios offers a promising avenue for enhancing network capacity by simultaneously exploiting multiple non-overlapping channels through different radio interfaces and mitigating interference through proper channel assignment. However, it is quite challenging to effectively utilize multiple channels and/or multiple radios to maximize throughput capacity. The National Natural Science Foundation of China (NSFC) Project 61128005 conducted comprehensive algorithmic-theoretic and queuing-theoretic studies of maximizing wireless networking capacity in multi-channel multi-radio (MC-MR) wireless networks under the protocol interference model and fundamentally advanced the state of the art. In addition, under the notoriously hard physical interference model, this project took initial algorithmic steps toward maximizing network capacity, with or without power control. We expect that the new techniques and tools developed in this project will have wide applications in capacity planning, resource allocation and sharing, and protocol design for wireless networks, and will serve as the basis for future algorithm developments in wireless networks with advanced features, such as multi-input multi-output (MIMO) wireless networks.
Funding: Supported by the National Natural Science Foundation of China under Grant Nos. 61190110, 61325013, 61103187, 61170213, 61228202, 61170216, and 61422207, the National Basic Research 973 Program of China under Grant No. 2014CB347800, the Natural Science Foundation of USA under Grant Nos. CNS-0832120, CNS-1035894, ECCS-1247944, and ECCS-1343306, the Fundamental Research Funds for the Central Universities of China under Project No. 2012jdgz02 (Xi'an Jiaotong University), and the Research Fund for the Doctoral Program of Higher Education of China under Project No. 20130201120016.
Abstract: Diagnosis is of great importance to wireless sensor networks due to the nature of error-prone sensor nodes and unreliable wireless links. State-of-the-art diagnostic tools focus on certain types of faults, and their performance is highly correlated with the networks they work with. Network administrators find it difficult to measure the effectiveness of their diagnosis approaches and to choose appropriate tools that meet the reliability demand. In this work, we introduce the D-vector to characterize the properties of a diagnosis approach. The D-vector has five dimensions, namely the degree of coupling, the granularity, the overhead, the tool reliability, and the network reliability, quantifying and evaluating the effectiveness of current diagnostic tools in certain applications. We employ a skyline query algorithm to find the most effective diagnosis approaches, i.e., skyline points (SPs), across the five dimensions of all potential D-vectors. The selected skyline D-vector points can further guide the design of various diagnosis approaches. In our trace-driven simulations, we design and select tailored diagnostic tools for GreenOrbs, achieving high performance with relatively low overhead.
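A skyline query keeps exactly the non-dominated points: a D-vector is dropped only if some other D-vector is at least as good in every dimension and strictly better in one. A minimal sketch is below, assuming for illustration that all five dimensions are normalized so that lower is better; the tool names and values are hypothetical.

```python
# Minimal skyline (Pareto-front) sketch over 5-D D-vectors, assuming
# every dimension is normalized so that lower is better. A tool's
# D-vector survives if no other tool dominates it.

def dominates(a, b):
    """a dominates b: no worse in all dimensions, strictly better in one."""
    return (all(x <= y for x, y in zip(a, b))
            and any(x < y for x, y in zip(a, b)))

def skyline(dvectors):
    return [d for d in dvectors
            if not any(dominates(other, d) for other in dvectors if other != d)]

# Hypothetical D-vectors: (coupling, granularity, overhead,
#                          tool-reliability cost, network-reliability cost)
tools = {
    "sniffer": (0.2, 0.5, 0.7, 0.3, 0.4),
    "tracer":  (0.6, 0.2, 0.4, 0.5, 0.3),
    "heavy":   (0.7, 0.6, 0.8, 0.6, 0.5),   # dominated by "sniffer"
}
front = skyline(list(tools.values()))
print([name for name, d in tools.items() if d in front])  # sniffer, tracer
```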