Limited main memory bandwidth is becoming a fundamental performance bottleneck in chipmultiprocessor (CMP) design. Yet directly increasing the peak memory bandwidth can incur high cost and power consumption. In this...Limited main memory bandwidth is becoming a fundamental performance bottleneck in chipmultiprocessor (CMP) design. Yet directly increasing the peak memory bandwidth can incur high cost and power consumption. In this paper, we address this problem by proposing a memory, a bandwidth-aware reconfigurable cache hierarchy, BACH, with hybrid memory technologies. Components of our BACH design include a hybrid cache hierarchy, a reconfiguration mechanism, and a statistical prediction engine. Our hybrid cache hierarchy chooses different memory technologies with various bandwidth characteristics, such as spin-transfer torque memory (STT-MRAM), resistive memory (ReRAM), and embedded DRAM (eDRAM), to configure each level so that the peak bandwidth of the overall cache hierarchy is optimized. Our reconfiguration mechanism can dynamically adjust the cache capacity of each level based on the predicted bandwidth demands of running workloads. The bandwidth prediction is performed by our prediction engine. We evaluate the system performance gain obtained by BACH design with a set of multithreaded and multiprogrammed workloads with and without the limitation of system power budget. Compared with traditional SRAM-based cache design, BACH improves the system throughput by 58% and 14% with multithreaded and multiprogrammed workloads respectively.展开更多
In this paper,a hybrid cache placement scheme for multihop wireless service networks is proposed. In this scheme,hot nodes in data transferring path are mined up by means of rout-ing navigation graph,and whole network...In this paper,a hybrid cache placement scheme for multihop wireless service networks is proposed. In this scheme,hot nodes in data transferring path are mined up by means of rout-ing navigation graph,and whole network is covered with network clustering scheme. A hot node has been chosen for cache place-ment in each cluster,and the nodes within a cluster access cache data with no more than two hops. The cache placement scheme reduces data access latency and workload of the server node. It also reduces the average length of data transferring,which means that fewer nodes are involved. The network system energy con-sumption decreased as involved relay nodes reduced. The per-formance analysis shows that the scheme achieves significant system performance improvement in network environment,with a large number of nodes.展开更多
Our study introduces a novel distributed query plan refinement phase in an enhanced architecture of distributed query processing engine (DQPE). Query plan refinement generates potentially efficient distributed query...Our study introduces a novel distributed query plan refinement phase in an enhanced architecture of distributed query processing engine (DQPE). Query plan refinement generates potentially efficient distributed query plan by reusable aggregate query shipping (RAQS) approach. The approach improves response time at the cost of pre-processing time. If the overheads could not be compensated by query results reusage, RAQS is no more favorable. Therefore a globM cost estimation model is employed to get proper operators: RR_Agg, R_Agg, or R_Scan. For the purpose of reusing results of queries with aggregate function in distributed query processing, a multi-level hybrid view caching (HVC) scheme is introduced. The scheme retains the advantages of partial match and aggregate query results caching. By our solution, evaluations with distributed TPC-H queries show significant improvement on average response time.展开更多
文摘Limited main memory bandwidth is becoming a fundamental performance bottleneck in chipmultiprocessor (CMP) design. Yet directly increasing the peak memory bandwidth can incur high cost and power consumption. In this paper, we address this problem by proposing a memory, a bandwidth-aware reconfigurable cache hierarchy, BACH, with hybrid memory technologies. Components of our BACH design include a hybrid cache hierarchy, a reconfiguration mechanism, and a statistical prediction engine. Our hybrid cache hierarchy chooses different memory technologies with various bandwidth characteristics, such as spin-transfer torque memory (STT-MRAM), resistive memory (ReRAM), and embedded DRAM (eDRAM), to configure each level so that the peak bandwidth of the overall cache hierarchy is optimized. Our reconfiguration mechanism can dynamically adjust the cache capacity of each level based on the predicted bandwidth demands of running workloads. The bandwidth prediction is performed by our prediction engine. We evaluate the system performance gain obtained by BACH design with a set of multithreaded and multiprogrammed workloads with and without the limitation of system power budget. Compared with traditional SRAM-based cache design, BACH improves the system throughput by 58% and 14% with multithreaded and multiprogrammed workloads respectively.
基金Supported by the National Basic Research Program of China (973 Program)(2004CB318201)National High Technology Research and Development Program of China (863 Program)(2008AA01A402) Program for Changjiang Scholars and Innovative Research Team in University of China (IRT0725)
文摘In this paper,a hybrid cache placement scheme for multihop wireless service networks is proposed. In this scheme,hot nodes in data transferring path are mined up by means of rout-ing navigation graph,and whole network is covered with network clustering scheme. A hot node has been chosen for cache place-ment in each cluster,and the nodes within a cluster access cache data with no more than two hops. The cache placement scheme reduces data access latency and workload of the server node. It also reduces the average length of data transferring,which means that fewer nodes are involved. The network system energy con-sumption decreased as involved relay nodes reduced. The per-formance analysis shows that the scheme achieves significant system performance improvement in network environment,with a large number of nodes.
基金partially supported by the National Basic Research 973 Program of China under Grant No. 2005CB321807the National High Technology Rresearch and Development 863 Program of China under Grant Nos. 2006AA01A106 and 2006AA04Z158.
文摘Our study introduces a novel distributed query plan refinement phase in an enhanced architecture of distributed query processing engine (DQPE). Query plan refinement generates potentially efficient distributed query plan by reusable aggregate query shipping (RAQS) approach. The approach improves response time at the cost of pre-processing time. If the overheads could not be compensated by query results reusage, RAQS is no more favorable. Therefore a globM cost estimation model is employed to get proper operators: RR_Agg, R_Agg, or R_Scan. For the purpose of reusing results of queries with aggregate function in distributed query processing, a multi-level hybrid view caching (HVC) scheme is introduced. The scheme retains the advantages of partial match and aggregate query results caching. By our solution, evaluations with distributed TPC-H queries show significant improvement on average response time.