摘要
The performance of scalable shared-memory multiprocessors suffers from three types of latency; memory latency, the latency caused by inter-process synchronization ,and the latency caused by instructions that take multiple cycles to produce results To tolerate these three types of latencies, The following techniques was proposed to couple: coarse-grained multithreading, the superscalar processor and a reconfigurable device, namely the overlapping long latency operations of one thread of computation with the execution of other threads The superscalar processor principle is used to tolerate instruction latency by issuing several instructions simultaneously The DPGA is coupled with this processor in order to improve the context-switching
The performance of scalable shared-memory multiprocessors suffers from three types of latency; memory latency, the latency caused by inter-process synchronization ,and the latency caused by instructions that take multiple cycles to produce results To tolerate these three types of latencies, The following techniques was proposed to couple: coarse-grained multithreading, the superscalar processor and a reconfigurable device, namely the overlapping long latency operations of one thread of computation with the execution of other threads The superscalar processor principle is used to tolerate instruction latency by issuing several instructions simultaneously The DPGA is coupled with this processor in order to improve the context-switching overhead