The design of parallel algorithms is studied in this paper. These algorithms are applicable to shared memory MIMD machines In this paper, the emphasis is put on the methods for design of the efficient parallel algori...The design of parallel algorithms is studied in this paper. These algorithms are applicable to shared memory MIMD machines In this paper, the emphasis is put on the methods for design of the efficient parallel algorithms. The design of efficient parallel algorithms should be based on the following considerationst algorithm parallelism and the hardware-parallelism; granularity of the parallel algorithm, algorithm optimization according to the underling parallel machine. In this paper , these principles are applied to solve a model problem of the PDE. The speedup of the new method is high. The results were tested and evaluated on a shared memory MIMD machine. The practical results were agree with the predicted performance.展开更多
A Single-Buffered (SB) router is a router where only one stage of shared buffering is sandwiched between two interconnects in comparison of a Combined Input and Output Queued (CIOQ) router where a central switch f...A Single-Buffered (SB) router is a router where only one stage of shared buffering is sandwiched between two interconnects in comparison of a Combined Input and Output Queued (CIOQ) router where a central switch fabric is sandwiched between two stages of buffering. The notion of SB routers was firstly proposed by the High-Performance Networking Group (HPNG) of Stanford University, along with two promising designs of SB routers: one of which was Parallel Shared Memory (PSM) router and the other was Distributed Shared Memory (DSM) router. Admittedly, the work of HPNG deserved full credit, but all results presented by them appeared to relay on a Centralized Memory Management Algorithm (CMMA) which was essentially impractical because of the high processing and communication complexity. This paper attempts to make a scalable high-speed SB router completely practical by introducing a fully distributed architecture for managing the shared memory of SB routers. The resulting SB router is called as a Virtual Output and Input Queued (VOIQ) router. Furthermore, the scheme of VOIQ routers can not only eliminate the need for the CMMA scheduler, thus allowing a fully distributed implementation with low processing and commu- nication complexity, but also provide QoS guarantees and efficiently support variable-length packets in this paper. In particular, the results of performance testing and the hardware implementation of our VOIQ-based router (NDSC~ SR1880-TTM series) are illustrated at the end of this paper. The proposal of this paper is the first distributed scheme of how to design and implement SB routers publicized till now.展开更多
This paper studies a high-speed text-independent Automatic Speaker Recognition(ASR)algorithm based on a multicore system's Gaussian Mixture Model(GMM).The high speech is achieved using parallel implementation of t...This paper studies a high-speed text-independent Automatic Speaker Recognition(ASR)algorithm based on a multicore system's Gaussian Mixture Model(GMM).The high speech is achieved using parallel implementation of the feature's extraction and aggregation methods during training and testing procedures.Shared memory parallel programming techniques using both OpenMP and PThreads libraries are developed to accelerate the code and improve the performance of the ASR algorithm.The experimental results show speed-up improvements of around 3.2 on a personal laptop with Intel i5-6300HQ(2.3 GHz,four cores without hyper-threading,and 8 GB of RAM).In addition,a remarkable 100%speaker recognition accuracy is achieved.展开更多
文摘The design of parallel algorithms is studied in this paper. These algorithms are applicable to shared memory MIMD machines In this paper, the emphasis is put on the methods for design of the efficient parallel algorithms. The design of efficient parallel algorithms should be based on the following considerationst algorithm parallelism and the hardware-parallelism; granularity of the parallel algorithm, algorithm optimization according to the underling parallel machine. In this paper , these principles are applied to solve a model problem of the PDE. The speedup of the new method is high. The results were tested and evaluated on a shared memory MIMD machine. The practical results were agree with the predicted performance.
基金the National High-Tech Research and De-velopment Program of China (863 Program) (2003AA103510, 2004AA103130, 2005AA121210).
文摘A Single-Buffered (SB) router is a router where only one stage of shared buffering is sandwiched between two interconnects in comparison of a Combined Input and Output Queued (CIOQ) router where a central switch fabric is sandwiched between two stages of buffering. The notion of SB routers was firstly proposed by the High-Performance Networking Group (HPNG) of Stanford University, along with two promising designs of SB routers: one of which was Parallel Shared Memory (PSM) router and the other was Distributed Shared Memory (DSM) router. Admittedly, the work of HPNG deserved full credit, but all results presented by them appeared to relay on a Centralized Memory Management Algorithm (CMMA) which was essentially impractical because of the high processing and communication complexity. This paper attempts to make a scalable high-speed SB router completely practical by introducing a fully distributed architecture for managing the shared memory of SB routers. The resulting SB router is called as a Virtual Output and Input Queued (VOIQ) router. Furthermore, the scheme of VOIQ routers can not only eliminate the need for the CMMA scheduler, thus allowing a fully distributed implementation with low processing and commu- nication complexity, but also provide QoS guarantees and efficiently support variable-length packets in this paper. In particular, the results of performance testing and the hardware implementation of our VOIQ-based router (NDSC~ SR1880-TTM series) are illustrated at the end of this paper. The proposal of this paper is the first distributed scheme of how to design and implement SB routers publicized till now.
文摘This paper studies a high-speed text-independent Automatic Speaker Recognition(ASR)algorithm based on a multicore system's Gaussian Mixture Model(GMM).The high speech is achieved using parallel implementation of the feature's extraction and aggregation methods during training and testing procedures.Shared memory parallel programming techniques using both OpenMP and PThreads libraries are developed to accelerate the code and improve the performance of the ASR algorithm.The experimental results show speed-up improvements of around 3.2 on a personal laptop with Intel i5-6300HQ(2.3 GHz,four cores without hyper-threading,and 8 GB of RAM).In addition,a remarkable 100%speaker recognition accuracy is achieved.