Abstract: Previous descriptions of memory consistency models in shared-memory multiprocessor systems are mainly expressed as constraints on the ordering of memory access events and are therefore hardware-centric. This paper presents a framework that describes memory consistency models at the behavior level. Based on the observation that the behavior of an execution is determined by the execution order of conflicting accesses, a memory consistency model is defined as an interprocessor synchronization mechanism that orders the execution of operations from different processors. The synchronization order of an execution under a given consistency model is also defined; together with the program order, the synchronization order determines the behavior of an execution. This paper also presents criteria for correct programs and for correct implementations of consistency models. Treating an implementation of a consistency model as a set of memory event ordering constraints, it provides a method for proving the correctness of consistency model implementations, and uses this method to prove the correctness of the lock-based cache coherence protocol.
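As a small illustration of the claim that an execution's behavior is fixed by the order of its conflicting accesses, the sketch below enumerates every sequentially consistent interleaving of a two-processor store-buffering test and groups the results by the order of conflicting accesses. The litmus test, the tuple encoding of operations, and all function names are illustrative assumptions, not the paper's formal framework.

```python
# Hypothetical store-buffering test: (processor, kind, variable, value or register)
P0 = [("P0", "W", "x", 1), ("P0", "R", "y", "r1")]
P1 = [("P1", "W", "y", 1), ("P1", "R", "x", "r2")]

def interleavings(a, b):
    """Yield every merge of a and b that preserves each processor's program order."""
    if not a or not b:
        yield list(a) + list(b)
        return
    for rest in interleavings(a[1:], b):
        yield [a[0]] + rest
    for rest in interleavings(a, b[1:]):
        yield [b[0]] + rest

def run(trace):
    """Execute a trace against memory initialised to zero; return (r1, r2)."""
    mem, regs = {"x": 0, "y": 0}, {}
    for _proc, kind, var, arg in trace:
        if kind == "W":
            mem[var] = arg
        else:
            regs[arg] = mem[var]
    return regs["r1"], regs["r2"]

def conflict_order(trace):
    """Ordered pairs of conflicting accesses: different processors, same variable, at least one write."""
    pairs = set()
    for i, first in enumerate(trace):
        for second in trace[i + 1:]:
            if first[0] != second[0] and first[2] == second[2] and "W" in (first[1], second[1]):
                pairs.add((first, second))
    return frozenset(pairs)

def sc_outcomes():
    """Map each conflict order to the set of results of the interleavings that share it."""
    by_order = {}
    for trace in interleavings(P0, P1):
        by_order.setdefault(conflict_order(trace), set()).add(run(trace))
    return by_order

if __name__ == "__main__":
    print(sorted(set().union(*sc_outcomes().values())))  # [(0, 1), (1, 0), (1, 1)]
```

Each of the three possible conflict orders yields exactly one result. The outcome r1 = r2 = 0 never appears, because it would need both reads to precede the other processor's write, which contradicts program order.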
Funding: Supported by the National High Technology Development 863 Program of China (Grant Nos. 2007AA01Z114, 2006AA010201), the National Natural Science Foundation of China (Grant Nos. 60703017, 60736012, 60325205, 60673146, 60603049), the National Grand Fundamental Research 973 Program of China (Grant Nos. 2005CB321601, 2005CB321603), and the Beijing Natural Science Foundation (Grant No. 4072024).
Abstract: Multithreading is a major trend in high-performance processor design, and the memory consistency model is essential to the correctness, performance, and complexity of a multithreaded processor. This paper proposes a chip multithreaded consistency model suited to multithreaded processors. The restrictions that chip multithreaded consistency imposes on memory event ordering are presented and formalized. Using the critical-cycle approach of Wei-Wu Hu, we prove that the proposed model satisfies the correct-execution criterion of the sequential consistency model. Compared with sequential consistency, the chip multithreaded consistency model offers a way to achieve higher performance, and it ensures software compatibility: the execution result on a multithreaded processor is the same as on a uniprocessor. An implementation strategy for the model in the Godson-2 SMT processor is also proposed. The Godson-2 SMT processor correctly supports chip multithreaded consistency through an exception scheme based on each thread's sequential memory access queue.
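The correctness argument rests on cycles formed by program order and the order of conflicting accesses. The sketch below shows the general acyclicity check behind critical-cycle reasoning (in the style of Shasha and Snir), not Hu's exact formulation: in this simplified model, an execution can be explained by some sequential interleaving when program order plus the observed order of conflicting accesses is acyclic, and a cycle rules such an interleaving out. The event names, edge lists, and the function `sc_witness` are hypothetical.

```python
from collections import defaultdict, deque

# Hypothetical store-buffering test: each processor writes one variable, then reads the other.
EVENTS = ["P0:W x", "P0:R y", "P1:W y", "P1:R x"]
PROGRAM_ORDER = [("P0:W x", "P0:R y"), ("P1:W y", "P1:R x")]

def sc_witness(events, program_order, conflict_order):
    """Topologically sort the union of both edge sets (Kahn's algorithm).
    Returns a sequential order of the events, or None if the union has a cycle."""
    succ = defaultdict(list)
    indeg = {e: 0 for e in events}
    for a, b in list(program_order) + list(conflict_order):
        succ[a].append(b)
        indeg[b] += 1
    ready = deque(e for e in events if indeg[e] == 0)
    order = []
    while ready:
        e = ready.popleft()
        order.append(e)
        for nxt in succ[e]:
            indeg[nxt] -= 1
            if indeg[nxt] == 0:
                ready.append(nxt)
    return order if len(order) == len(events) else None

# r1 = 0, r2 = 1: P0's read of y precedes P1's write of y; P0's write of x precedes P1's read of x.
ALLOWED = [("P0:R y", "P1:W y"), ("P0:W x", "P1:R x")]
# r1 = r2 = 0: each read precedes the other processor's write, which closes a critical cycle.
RELAXED = [("P0:R y", "P1:W y"), ("P1:R x", "P0:W x")]

if __name__ == "__main__":
    print(sc_witness(EVENTS, PROGRAM_ORDER, ALLOWED))  # ['P0:W x', 'P0:R y', 'P1:W y', 'P1:R x']
    print(sc_witness(EVENTS, PROGRAM_ORDER, RELAXED))  # None
```

A consistency model weaker than sequential consistency is acceptable under this kind of criterion only if the hardware orderings it relaxes can never close such a cycle in a correctly synchronized program.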
Abstract: Directory protocols are widely adopted to maintain cache coherence in distributed shared memory multiprocessors. Although scalable to a certain extent, directory protocols are complex enough to prevent their use in very large-scale multiprocessors with tens of thousands of nodes. This paper proposes a lock-based cache coherence protocol for scope consistency that does not rely on directory information to maintain cache coherence. Instead, coherence is maintained by requiring the processor that releases a lock to store with the lock all write notices generated in the associated critical section, and by requiring the processor that acquires the lock to invalidate or update its locally cached data copies according to the lock's write notices. To evaluate the performance of the protocol, a software DSM system named JIAJIA was built on a network of workstations. Besides the lock-based cache coherence protocol, JIAJIA is characterized by a shared memory organization scheme that combines the physical memories of multiple workstations to form a large shared space. Performance measurements with the SPLASH2 program suite and NAS benchmarks indicate that JIAJIA achieves higher speedup than recent SVM systems such as CVM. In addition, JIAJIA can solve large-scale problems that other SVM systems cannot solve because of memory size limitations.
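The release/acquire steps of the lock-based protocol can be sketched as a single-process Python model under simplifying assumptions: pages are plain values, modified pages are written back to a hypothetical home copy at release, write notices are a set of page ids stored on the lock, and the acquiring node invalidates (rather than updates) the listed pages. The class names `LockState`, `Home`, and `Node` are illustrative, not JIAJIA's API.

```python
from dataclasses import dataclass, field

@dataclass
class LockState:
    """Hypothetical lock record; write_notices holds ids of pages modified under this lock."""
    write_notices: set = field(default_factory=set)

class Home:
    """Hypothetical home storage holding the master copy of every page (initially 0)."""
    def __init__(self):
        self.pages = {}

class Node:
    def __init__(self, name, home):
        self.name = name
        self.home = home
        self.cache = {}      # page id -> locally cached value
        self.dirty = set()   # pages written since this node's last release

    def read(self, page):
        if page not in self.cache:              # cache miss: fetch from home
            self.cache[page] = self.home.pages.get(page, 0)
        return self.cache[page]

    def write(self, page, value):
        self.cache[page] = value
        self.dirty.add(page)

    def acquire(self, lock):
        # No directory lookup: invalidate exactly the pages named in the lock's write notices.
        for page in lock.write_notices:
            self.cache.pop(page, None)

    def release(self, lock):
        # Write modified pages back to home, then leave their ids on the lock as write notices.
        for page in self.dirty:
            self.home.pages[page] = self.cache[page]
        lock.write_notices |= self.dirty
        self.dirty = set()

if __name__ == "__main__":
    home = Home()
    a, b = Node("A", home), Node("B", home)
    lock = LockState()
    b.read("p0")                           # B caches the initial value 0
    a.acquire(lock)
    a.write("p0", 42)
    a.release(lock)
    print(b.read("p0"))                    # 0: B has not acquired the lock, so its copy is stale
    b.acquire(lock)
    print(b.read("p0"))                    # 42: the write notice invalidated B's copy
```

Node B learns about A's write only through the lock, so its cached copy may stay stale until it acquires that lock, which is what scope consistency permits. A real protocol would also need to trim write notices that every node has already seen; this sketch simply lets them accumulate.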
Funding: This work is supported in part by the National Natural Science Foundation of China under Grant No. 69896250 and in part by the N
Abstract: False sharing is one of the most important factors affecting the performance of DSM (distributed shared memory) systems. The single-writer approach is simple, but it cannot avoid the ping-pong effect of data page thrashing, while the multiple-writer approach handles false sharing effectively but at a high cost. This paper proposes a new approach to handling multiple writers in software DSM, called limited multiple-writer (LMW). It distinguishes two forms of multiple writers, lock-based and barrier-based, and handles each with a different policy. It discards the Twin and Diff mechanisms of the traditional multiple-writer approach and simplifies the implementation of multiple writers in software DSM systems. The implementation of LMW in CVM (Coherent Virtual Machine), a software DSM system running on a network of workstations, is described. Evaluation results on this platform show that for applications such as SOR (Successive Over-Relaxation), LU (lower-upper triangular decomposition), FFT (Fast Fourier Transform), and IS (Integer Sorting), LMW significantly reduces execution time (by 11%, 16%, 33%, and 46%, respectively) compared with the traditional multiple-writer approach.
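The abstract does not describe LMW's own policies in enough detail to sketch them, so the Python block below instead models the traditional Twin and Diff mechanism that LMW discards, to show the cost it avoids: a twin copy of each page at its first write, and a word-by-word comparison at synchronization time so that writers to disjoint parts of a falsely shared page can be merged. The page size and all function names are hypothetical.

```python
PAGE_WORDS = 8  # hypothetical tiny page size, in words

def make_twin(page):
    """Snapshot of the page, taken at the first write fault on it."""
    return list(page)

def compute_diff(page, twin):
    """Word-by-word comparison against the twin: {word index: new value}."""
    return {i: new for i, (new, old) in enumerate(zip(page, twin)) if new != old}

def apply_diff(page, diff):
    """Merge one writer's modified words into a page copy."""
    for i, value in diff.items():
        page[i] = value

def false_sharing_demo():
    home = [0] * PAGE_WORDS
    copy_a, copy_b = list(home), list(home)      # both nodes cache the same page
    twin_a, twin_b = make_twin(copy_a), make_twin(copy_b)
    for i in range(0, 4):                        # node A writes the first half
        copy_a[i] = 1
    for i in range(4, PAGE_WORDS):               # node B writes the second half
        copy_b[i] = 2
    # At synchronization time each writer ships only its diff; the home copy merges them.
    apply_diff(home, compute_diff(copy_a, twin_a))
    apply_diff(home, compute_diff(copy_b, twin_b))
    return home

if __name__ == "__main__":
    print(false_sharing_demo())  # [1, 1, 1, 1, 2, 2, 2, 2]
```

Both halves of the page survive without the page bouncing between nodes, but every written page costs an extra copy and a full scan at each synchronization point; that overhead is what an approach like LMW aims to remove.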