Journal Articles (6 found)
1. Elastic pointer directory organization for scalable shared memory multiprocessors
Authors: Yuhang Liu, Mingfa Zhu, Limin Xiao. Journal of Systems Engineering and Electronics (SCIE, EI, CSCD), 2014, No. 1, pp. 158-167 (10 pages).
Abstract: In the field of supercomputing, one key issue for scalable shared-memory multiprocessors is the design of the directory, which denotes the sharing state of a cache block. A good directory design aims at three key attributes: reasonable memory overhead, sharer-position precision, and low implementation complexity. However, researchers often face the problem that gaining one attribute means losing another. This paper proposes an elastic pointer directory (EPD) structure based on an analysis of shared-memory applications, exploiting the fact that the number of sharers per directory entry is typically small. Analysis shows that for 4 096 nodes, the ratio of memory overhead to the full-map directory is 2.7%. Theoretical analysis and cycle-accurate execution-driven simulations on 16- and 64-node cache-coherent non-uniform memory access (CC-NUMA) multiprocessors show that the pointer overflow probability is reduced significantly. The observed performance is better than that of a limited-pointer directory and almost identical to that of the full-map directory, at the cost of slightly higher implementation complexity. Using a directory cache to exploit directory access locality is also studied; the experimental results show this to be a promising approach for the state-of-the-art high-performance computing domain.
Keywords: directory; scalability; memory overhead; positioning precision; overflow; cache-coherent non-uniform memory access (CC-NUMA)
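The abstract contrasts full-map and limited-pointer directories without giving EPD's exact layout. As background only, here is a minimal C sketch of the conventional limited-pointer scheme EPD improves on: a handful of sharer pointers per block, with an overflow flag set once more sharers appear than the pointers can name (the event EPD aims to make rare). All sizes and names are invented for illustration, not taken from the paper.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define NUM_POINTERS 4    /* hypothetical pointers per entry, not EPD's value */
#define NODE_BITS    12   /* 12-bit node IDs cover the paper's 4 096 nodes    */

/* A classic limited-pointer directory entry: a few sharer pointers plus an
 * overflow flag. EPD's contribution, per the abstract, is keeping overhead
 * near this size while making the overflow case rare. */
typedef struct {
    uint16_t sharer[NUM_POINTERS]; /* node IDs of tracked sharers */
    uint8_t  count;                /* sharers currently tracked   */
    bool     overflow;             /* precision lost: invalidations must broadcast */
} dir_entry_t;

/* Record a new sharer; returns false once the entry has overflowed. */
static bool dir_add_sharer(dir_entry_t *e, uint16_t node)
{
    if (e->overflow)
        return false;
    for (uint8_t i = 0; i < e->count; i++)
        if (e->sharer[i] == node)
            return true;            /* already tracked */
    if (e->count < NUM_POINTERS) {
        e->sharer[e->count++] = node;
        return true;
    }
    e->overflow = true;             /* more sharers than pointers */
    return false;
}

int main(void)
{
    dir_entry_t e = {0};
    for (uint16_t n = 0; n < 6; n++)
        if (!dir_add_sharer(&e, n))
            printf("overflow adding sharer %u\n", n);
    return 0;
}
```

For scale: with 4 096 nodes, four 12-bit pointers plus bookkeeping occupy on the order of 50-60 bits per entry versus 4 096 bits for a full-map vector, i.e., roughly 1-2%, the same order as the 2.7% overhead ratio the paper reports for EPD.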
2. A Unified Buffering Management with Set Divisible Cache for PCM Main Memory
Authors: Mei-Ying Bian, Su-Kyung Yoon, Jeong-Geun Kim, Sangjae Nam, Shin-Dug Kim. Journal of Computer Science & Technology (SCIE, EI, CSCD), 2016, No. 1, pp. 137-146 (10 pages).
Abstract: This research proposes a phase-change memory (PCM) based main memory system with an effective combination of a superblock-based adaptive buffering structure and its associated set divisible last-level cache (LLC). To achieve performance similar to that of dynamic random-access memory (DRAM) based main memory, the superblock-based adaptive buffer (SABU) comprises dual DRAM buffers, i.e., an aggressive superblock-based pre-fetching buffer (SBPB) and an adaptive sub-block reusing buffer (SBRB), together with a set divisible LLC based on a cache space optimization scheme. According to our experiments, the longer PCM access latency can typically be hidden using the proposed SABU, which can reduce the number of writes to the PCM main memory by 26.44%. The SABU approach can reduce PCM access latency to as low as 0.43 times that of a conventional DRAM main memory. Meanwhile, the average memory energy consumption can be reduced by 19.7%.
Keywords: memory hierarchy; memory structure; cache memory; buffer management
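The abstract names SABU's two DRAM buffers (SBPB and SBRB) but not their internals. Purely as a hypothetical illustration of the underlying idea — interposing a DRAM buffer in front of PCM so hits run at DRAM speed and writes are coalesced before touching PCM — the sketch below simulates a single direct-mapped buffer; a faithful model would keep two buffers with the paper's prefetching and sub-block-reuse policies. All sizes are invented.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define BUF_SETS 1024   /* hypothetical buffer capacity, not from the paper */

typedef struct { uint64_t tag; bool valid, dirty; } buf_line_t;

static buf_line_t buf[BUF_SETS];
static uint64_t dram_hits, pcm_fills, pcm_writebacks;

/* Service one block access. A buffer hit runs at DRAM speed; writes are
 * absorbed in the buffer, and PCM is written only on dirty evictions,
 * which is how this kind of scheme cuts PCM write traffic. */
static void mem_access(uint64_t block, bool is_write)
{
    size_t set = (size_t)(block % BUF_SETS);
    if (buf[set].valid && buf[set].tag == block) {
        dram_hits++;
        buf[set].dirty |= is_write;    /* write absorbed in DRAM */
        return;
    }
    if (buf[set].valid && buf[set].dirty)
        pcm_writebacks++;              /* only dirty evictions reach PCM */
    pcm_fills++;                       /* slow path: read from the PCM array */
    buf[set].tag = block;
    buf[set].valid = true;
    buf[set].dirty = is_write;
}

int main(void)
{
    for (uint64_t i = 0; i < 100000; i++)
        mem_access((i / 4) % 2048, i % 4 == 0);   /* toy stream with reuse */
    printf("DRAM hits %llu, PCM fills %llu, PCM writebacks %llu\n",
           (unsigned long long)dram_hits, (unsigned long long)pcm_fills,
           (unsigned long long)pcm_writebacks);
    return 0;
}
```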
3. Next High Performance and Low Power Flash Memory Package Structure
Author: Jung-Hoon Lee. Journal of Computer Science & Technology (SCIE, EI, CSCD), 2007, No. 4, pp. 515-520 (6 pages).
Abstract: In general, NAND flash memory has advantages in low power consumption, storage capacity, and fast erase/write performance in contrast to NOR flash. However, the main drawback of NAND flash memory is its slow access time for random read operations. We therefore propose a new NAND flash memory package that overcomes this major drawback. We present a high-performance and low-power NAND flash memory system with a dual cache memory. The proposed NAND flash package consists of two parts, i.e., a NAND flash memory module and a dual cache module. The new NAND flash memory system can achieve dramatically higher performance and lower power consumption compared with any conventional NAND-type flash memory module. Our results show that the proposed system can eliminate about 78% of write operations into the flash memory cell and about 70% of read operations from the flash memory cell using only 3 KB of additional cache space. This result represents high potential for low power consumption and high performance gain.
Keywords: flash memory; NAND-type; NOR-type; memory localities; buffer or cache memory
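The dual cache's internal design is not given in the abstract, but the problem it attacks — slow random reads — stems from NAND being read a whole page at a time, so a small random read pays for a full page load unless the page is already buffered. Purely as an illustration of that effect (sizes invented, not the paper's 3 KB design), the sketch below buffers recently loaded pages so nearby reads skip the page-load cost:

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define PAGE_BYTES   512   /* assumed page size; real NAND pages are larger */
#define CACHED_PAGES 4     /* tiny fully-associative page cache, FIFO       */

static uint64_t cached[CACHED_PAGES];
static bool     valid[CACHED_PAGES];
static unsigned next_victim;
static uint64_t page_loads, buffered_hits;

/* Read one byte address: hit in the page cache is fast; a miss pays the
 * slow NAND page load and then keeps the whole page for later reads. */
static void nand_read(uint64_t byte_addr)
{
    uint64_t page = byte_addr / PAGE_BYTES;
    for (int i = 0; i < CACHED_PAGES; i++)
        if (valid[i] && cached[i] == page) { buffered_hits++; return; }
    page_loads++;                         /* slow path: full page transfer */
    cached[next_victim] = page;
    valid[next_victim] = true;
    next_victim = (next_victim + 1) % CACHED_PAGES;
}

int main(void)
{
    for (uint64_t i = 0; i < 10000; i++)
        nand_read((i % 64) * 40);         /* toy stream with page locality */
    printf("page loads %llu, buffered hits %llu\n",
           (unsigned long long)page_loads, (unsigned long long)buffered_hits);
    return 0;
}
```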
4. The Value of a Small Microkernel for Dreamy Memory and the RAMpage Memory Hierarchy
Author: Philip Machanick. Journal of Computer Science & Technology (SCIE, EI, CSCD), 2005, No. 5, pp. 586-595 (10 pages).
Abstract: This paper explores the potential for the RAMpage memory hierarchy to use a microkernel with a small memory footprint, held in specialized cache-speed static RAM (tightly-coupled memory, TCM). Dreamy memory is DRAM kept in low-power mode unless referenced. Simulations show that a small microkernel suits RAMpage well, in that it achieves significantly better speed and energy gains from adding TCM than a standard hierarchy does. RAMpage, in its best 128 KB L2 case, gained 11% speed using TCM and reduced energy by 14%; the equivalent conventional hierarchy gains were under 1%. While the 1 MB L2 was significantly faster than the lower-energy cases with the smaller L2, the larger SRAM's energy does not justify the speed gain. Using a 128 KB L2 cache in a conventional architecture resulted in a best-case overall run time of 2.58 s, compared with the best dreamy-mode run time (RAMpage without context switches on misses) of 3.34 s, a speed penalty of 29%. Energy in the fastest 128 KB L2 case was 2.18 J vs. 1.50 J, a reduction of 31%. The same RAMpage configuration without dreamy mode took 2.83 s as simulated and used 2.39 J, an acceptable trade-off (penalty under 10%) for being able to switch easily to a lower-energy mode.
Keywords: low-power design; main memory; virtual memory; cache memories; microkernels
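The percentages in this abstract follow directly from the quoted runtimes and energies; as a quick arithmetic check using only the numbers above:

```latex
\frac{3.34\,\mathrm{s} - 2.58\,\mathrm{s}}{2.58\,\mathrm{s}} \approx 29\%
  \quad \text{(dreamy-mode speed penalty)}
\qquad
\frac{2.18\,\mathrm{J} - 1.50\,\mathrm{J}}{2.18\,\mathrm{J}} \approx 31\%
  \quad \text{(energy reduction)}
\qquad
\frac{2.83\,\mathrm{s} - 2.58\,\mathrm{s}}{2.58\,\mathrm{s}} \approx 9.7\% < 10\%
  \quad \text{(RAMpage without dreamy mode)}
```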
5. Server-Based Data Push Architecture for Multi-Processor Environments (cited 3 times)
Authors: Xian-He Sun (孙贤和), Surendra Byna, Yong Chen (陈勇). Journal of Computer Science & Technology (SCIE, EI, CSCD), 2007, No. 5, pp. 641-652 (12 pages).
Abstract: Data access delay is a major bottleneck in utilizing current high-end computing (HEC) machines. Prefetching, where data is fetched before the CPU demands it, has been considered an effective solution for masking data access delay. However, current client-initiated prefetching strategies, where a computing processor initiates prefetching instructions, have many limitations: they do not work well for applications with complex, non-contiguous data access patterns. While technology advances continue to widen the gap between computing and data access performance, trading computing power for reduced data access delay has become a natural choice. In this paper, we present a server-based data-push approach and discuss its associated implementation mechanisms. In the server-push architecture, a dedicated server called the Data Push Server (DPS) initiates and proactively pushes data closer to the client in time. Issues such as what data to fetch, when to fetch, and how to push are studied. The SimpleScalar simulator is modified with a dedicated prefetching engine that pushes data for another processor to test DPS-based prefetching. Simulation results show that the L1 cache miss rate can be reduced by up to 97% (71% on average) over a superscalar processor for SPEC CPU2000 benchmarks that have high cache miss rates.
Keywords: performance measurement, evaluation, modeling, simulation of multiple-processor systems; cache memory
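The paper's DPS handles complex, non-contiguous patterns; the hypothetical sketch below shows only the architectural shape — a dedicated server core observing a client's access stream and pushing predicted blocks toward the client's cache ahead of demand — using a deliberately simple stride detector. push_to_client_cache() and all parameters are invented stand-ins, not the paper's interface.

```c
#include <stdint.h>
#include <stdio.h>

#define PUSH_DEPTH 4   /* blocks pushed ahead of the client (assumed) */

/* Stand-in for whatever hardware channel moves data toward the client. */
static void push_to_client_cache(uint64_t block)
{
    printf("push block %llu\n", (unsigned long long)block);  /* stub */
}

/* Called by the server for each client access it observes: once the same
 * stride is seen twice in a row, push the predicted next blocks so they
 * arrive before the client's demand misses. */
static void dps_observe(uint64_t block)
{
    static uint64_t last, stride;
    uint64_t s = block - last;
    if (s != 0 && s == stride)
        for (int i = 1; i <= PUSH_DEPTH; i++)
            push_to_client_cache(block + i * s);
    stride = s;
    last = block;
}

int main(void)
{
    for (uint64_t b = 0; b < 32; b += 2)   /* toy client stream, stride 2 */
        dps_observe(b);
    return 0;
}
```

The design point this illustrates is the trade the abstract names: prediction work moves off the computing processor onto a server with spare cycles, so the client spends no instructions on prefetching.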
6. System Architecture of Godson-3 Multi-Core Processors (cited 7 times)
Authors: Xiang Gao (高翔), Yun-Ji Chen (陈云霁), Huan-Dong Wang (王焕东), Dan Tang (唐丹), Wei-Wu Hu (胡伟武). Journal of Computer Science & Technology (SCIE, EI, CSCD), 2010, No. 2, pp. 181-191 (11 pages).
Abstract: Godson-3 is the latest generation of the Godson microprocessor family. It adopts a scalable multi-core architecture with hardware support for accelerating applications, including X86 emulation and signal processing. This paper introduces the system architecture of Godson-3 from various aspects, including system scalability, organization of the memory hierarchy, network-on-chip, inter-chip connection, and the I/O subsystem.
Keywords: multi-core processor; scalable interconnection; cache-coherent non-uniform memory access/non-uniform cache access (CC-NUMA/NUCA); mesh; crossbar; cache coherence; reliability, availability and serviceability (RAS)