In the field of supercomputing, one key issue for scal-able shared-memory multiprocessors is the design of the directory which denotes the sharing state for a cache block. A good direc-tory design intends to achieve t...In the field of supercomputing, one key issue for scal-able shared-memory multiprocessors is the design of the directory which denotes the sharing state for a cache block. A good direc-tory design intends to achieve three key attributes: reasonable memory overhead, sharer position precision and implementation complexity. However, researchers often face the problem that gain-ing one attribute may result in losing another. The paper proposes an elastic pointer directory (EPD) structure based on the analysis of shared-memory applications, taking the fact that the number of sharers for each directory entry is typical y smal . Analysis re-sults show that for 4 096 nodes, the ratio of memory overhead to the ful-map directory is 2.7%. Theoretical analysis and cycle-accurate execution-driven simulations on a 16 and 64-node cache coherence non uniform memory access (CC-NUMA) multiproces-sor show that the corresponding pointer overflow probability is reduced significantly. The performance is observed to be better than that of a limited pointers directory and almost identical to the ful-map directory, except for the slight implementation complex-ity. Using the directory cache to explore directory access locality is also studied. The experimental result shows that this is a promis-ing approach to be used in the state-of-the-art high performance computing domain.展开更多
Delay optimization has recently attracted signif-icant attention. However, few studies have focused on the delay optimization of mixed-polarity Reed-Muller (MPRM) logic circuits. In this paper, we propose an efficient...Delay optimization has recently attracted signif-icant attention. However, few studies have focused on the delay optimization of mixed-polarity Reed-Muller (MPRM) logic circuits. In this paper, we propose an efficient delay op-timization approach (EDOA) for MPRM logic circuits under the unit delay model, which can derive an optimal MPRM logic circuit with minimum delay. First, the simplest MPRM expression with the fewest number of product terms is ob-tained using a novel Reed-Muller expression simplification approach (RMESA) considering don't-care terms. Second, a minimum delay decomposition approach based on a Huffman tree construction algorithm is utilized on the simplest MPRM expression. Experimental results on MCNC benchmark cir-cuits demonstrate that compared to the Berkeley SIS 1.2 and ABC, the EDOA can significantly reduce delay for most cir-cuits. Furthermore, for a few circuits, while reducing delay, the EDOA incurs an area penalty.展开更多
基金supported by the National Natural Science Foundation of China(6123200961370059)+1 种基金the High Technology Research and Development Program of China(863 Program)(2011AA01A205)the Fund of the State Key Laboratory of Software Development Environment(SKLSDE2012ZX06)
文摘In the field of supercomputing, one key issue for scal-able shared-memory multiprocessors is the design of the directory which denotes the sharing state for a cache block. A good direc-tory design intends to achieve three key attributes: reasonable memory overhead, sharer position precision and implementation complexity. However, researchers often face the problem that gain-ing one attribute may result in losing another. The paper proposes an elastic pointer directory (EPD) structure based on the analysis of shared-memory applications, taking the fact that the number of sharers for each directory entry is typical y smal . Analysis re-sults show that for 4 096 nodes, the ratio of memory overhead to the ful-map directory is 2.7%. Theoretical analysis and cycle-accurate execution-driven simulations on a 16 and 64-node cache coherence non uniform memory access (CC-NUMA) multiproces-sor show that the corresponding pointer overflow probability is reduced significantly. The performance is observed to be better than that of a limited pointers directory and almost identical to the ful-map directory, except for the slight implementation complex-ity. Using the directory cache to explore directory access locality is also studied. The experimental result shows that this is a promis-ing approach to be used in the state-of-the-art high performance computing domain.
基金This work was supported by the National Natural Science Foundation of China (Grant Nos. 61370059 and 61232009)Beijing Natural Science Foundation (4152030), Fundamental Research Funds for the Central Universities (YWF-15-GJSYS-085, YWF-14-JSJXY-14)+1 种基金Open Project Program of National Engineering Research Center for Science & Technology Resources Sharing Service (Beihang University), the fund of the State Key Laboratory of Computer Architecture (CARCH201507)the fund of the State Key Laboratory of Software Development Environment (SKLSDE-2016ZX-13).
文摘Delay optimization has recently attracted signif-icant attention. However, few studies have focused on the delay optimization of mixed-polarity Reed-Muller (MPRM) logic circuits. In this paper, we propose an efficient delay op-timization approach (EDOA) for MPRM logic circuits under the unit delay model, which can derive an optimal MPRM logic circuit with minimum delay. First, the simplest MPRM expression with the fewest number of product terms is ob-tained using a novel Reed-Muller expression simplification approach (RMESA) considering don't-care terms. Second, a minimum delay decomposition approach based on a Huffman tree construction algorithm is utilized on the simplest MPRM expression. Experimental results on MCNC benchmark cir-cuits demonstrate that compared to the Berkeley SIS 1.2 and ABC, the EDOA can significantly reduce delay for most cir-cuits. Furthermore, for a few circuits, while reducing delay, the EDOA incurs an area penalty.