In shared-memory bus-based multiprocessors, when the number of processors grows, the processors spend an increasing amount of time waiting for access to the bus (and shared memory). This contention reduces the perform...In shared-memory bus-based multiprocessors, when the number of processors grows, the processors spend an increasing amount of time waiting for access to the bus (and shared memory). This contention reduces the performance of processors and imposes a limitation of the number of processors that can be used efficiently in bus-based systems. Since the multi-processor’s performance depends upon many parameters which affect the performance in different ways, timed Petri nets are used to model shared-memory bus-based multiprocessors at the instruction execution level, and the developed models are used to study how the performance of processors changes with the number of processors in the system. The results illustrate very well the restriction on the number of processors imposed by the shared bus. All performance characteristics presented in this paper are obtained by discrete-event simulation of Petri net models.展开更多
In this paper, we propose a parallel computing technique for content-based image retrieval (CBIR) system. This technique is mainly used for single node with multi-core processor, which is different from those based ...In this paper, we propose a parallel computing technique for content-based image retrieval (CBIR) system. This technique is mainly used for single node with multi-core processor, which is different from those based on cluster or network computing architecture. Due to its specific applications (such as medical image processing) and the harsh terms of hardware resource requirement, the CBIR system has been prevented from being widely used. With the increasing volume of the image database, the widespread use of multi-core processors, and the requirement of the retrieval accuracy and speed, we need to achieve a retrieval strategy which is based on multi-core processor to make the retrieval faster and more convenient than before. Experimental results demonstrate that this parallel architecture can significantly improve the performance of retrieval system. In addition, we also propose an efficient parallel technique with the combinations of the cluster and the multi-core techniques, which is supposed to gear to the new trend of the cloud computing.展开更多
Due to current technology enhancement,molecular databases have exponentially grown requesting faster efficient methods that can handle these amounts of huge data.There-fore,Multi-processing CPUs technology can be used...Due to current technology enhancement,molecular databases have exponentially grown requesting faster efficient methods that can handle these amounts of huge data.There-fore,Multi-processing CPUs technology can be used including physical and logical processors(Hyper Threading)to significantly increase the performance of computations.Accordingly,sequence comparison and pairwise alignment were both found contributing significantly in calculating the resemblance between sequences for constructing optimal alignments.This research used the Hash Table-NGram-Hirschberg(HT-NGH)algo-rithm to represent this pairwise alignment utilizing hashing capabilities.The authors propose using parallel shared memory architecture via Hyper Threading to improve the performance of molecular dataset protein pairwise alignment.The proposed parallel hyper threading method targeted the transformation of the HT-NGH on the datasets decomposition for sequence level efficient utilization within the processing units,that is,reducing idle processing unit situations.The authors combined hyper threading within the multicore architecture processing on shared memory utilization remarking perfor-mance of 24.8%average speed up to 34.4%as the highest boosting rate.The benefit of this work improvement is shown preserving acceptable accuracy,that is,reaching 2.08,2.88,and 3.87 boost-up as well as the efficiency of 1.04,0.96,and 0.97,using 2,3,and 4 cores,respectively,as attractive remarkable results.展开更多
Off-chip replacement (capacity and conflict) and coherent read misses in a distributed shared memory system cause execution to stall for hundreds of cycles. These off-chip replacement and coherent read misses are re...Off-chip replacement (capacity and conflict) and coherent read misses in a distributed shared memory system cause execution to stall for hundreds of cycles. These off-chip replacement and coherent read misses are recurring and forming sequences of two or more misses called streams. Prior streaming techniques ignored reordering of misses and not-recently-accessed streams while streaming data. In this paper, we present stream prefetcher design that can deal with both problems. Our stream prefetcher design utilizes stream waiting rooms to store not-recently-accessed streams. Stream waiting rooms help remove more off-chip misses. Using trace based simulation% our stream prefetcher design can remove 8% to 66% (on average 40%) and 17% to 63% (on average 39%) replacement and coherent read misses, respectively. Using cycle-accurate full-system simulation, our design gives speedups from 1.00 to 1.17 of princeton application repository for shared-memory computers (PARSEC) workloads running on a distributed shared memory system with the exception of dedup and swaptions workloads.展开更多
文摘In shared-memory bus-based multiprocessors, when the number of processors grows, the processors spend an increasing amount of time waiting for access to the bus (and shared memory). This contention reduces the performance of processors and imposes a limitation of the number of processors that can be used efficiently in bus-based systems. Since the multi-processor’s performance depends upon many parameters which affect the performance in different ways, timed Petri nets are used to model shared-memory bus-based multiprocessors at the instruction execution level, and the developed models are used to study how the performance of processors changes with the number of processors in the system. The results illustrate very well the restriction on the number of processors imposed by the shared bus. All performance characteristics presented in this paper are obtained by discrete-event simulation of Petri net models.
基金supported by the Natural Science Foundation of Shanghai (Grant No.08ZR1408200)the Shanghai Leading Academic Discipline Project (Grant No.J50103)the Open Project Program of the National Laboratory of Pattern Recognition
文摘In this paper, we propose a parallel computing technique for content-based image retrieval (CBIR) system. This technique is mainly used for single node with multi-core processor, which is different from those based on cluster or network computing architecture. Due to its specific applications (such as medical image processing) and the harsh terms of hardware resource requirement, the CBIR system has been prevented from being widely used. With the increasing volume of the image database, the widespread use of multi-core processors, and the requirement of the retrieval accuracy and speed, we need to achieve a retrieval strategy which is based on multi-core processor to make the retrieval faster and more convenient than before. Experimental results demonstrate that this parallel architecture can significantly improve the performance of retrieval system. In addition, we also propose an efficient parallel technique with the combinations of the cluster and the multi-core techniques, which is supposed to gear to the new trend of the cloud computing.
基金Deanship of Scientific Research(DSR),King Abdulaziz University,Grant/Award Number:D-139-137-1441。
文摘Due to current technology enhancement,molecular databases have exponentially grown requesting faster efficient methods that can handle these amounts of huge data.There-fore,Multi-processing CPUs technology can be used including physical and logical processors(Hyper Threading)to significantly increase the performance of computations.Accordingly,sequence comparison and pairwise alignment were both found contributing significantly in calculating the resemblance between sequences for constructing optimal alignments.This research used the Hash Table-NGram-Hirschberg(HT-NGH)algo-rithm to represent this pairwise alignment utilizing hashing capabilities.The authors propose using parallel shared memory architecture via Hyper Threading to improve the performance of molecular dataset protein pairwise alignment.The proposed parallel hyper threading method targeted the transformation of the HT-NGH on the datasets decomposition for sequence level efficient utilization within the processing units,that is,reducing idle processing unit situations.The authors combined hyper threading within the multicore architecture processing on shared memory utilization remarking perfor-mance of 24.8%average speed up to 34.4%as the highest boosting rate.The benefit of this work improvement is shown preserving acceptable accuracy,that is,reaching 2.08,2.88,and 3.87 boost-up as well as the efficiency of 1.04,0.96,and 0.97,using 2,3,and 4 cores,respectively,as attractive remarkable results.
基金supported by Higher Education Commission(Pakistan)National High Technology Research and Development Program of China(863 Program)(No.2008AA01A201)+1 种基金Natural Science Foundation of China(Nos.60833004 and 60970002)TNList Cross-discipline Foundation
文摘Off-chip replacement (capacity and conflict) and coherent read misses in a distributed shared memory system cause execution to stall for hundreds of cycles. These off-chip replacement and coherent read misses are recurring and forming sequences of two or more misses called streams. Prior streaming techniques ignored reordering of misses and not-recently-accessed streams while streaming data. In this paper, we present stream prefetcher design that can deal with both problems. Our stream prefetcher design utilizes stream waiting rooms to store not-recently-accessed streams. Stream waiting rooms help remove more off-chip misses. Using trace based simulation% our stream prefetcher design can remove 8% to 66% (on average 40%) and 17% to 63% (on average 39%) replacement and coherent read misses, respectively. Using cycle-accurate full-system simulation, our design gives speedups from 1.00 to 1.17 of princeton application repository for shared-memory computers (PARSEC) workloads running on a distributed shared memory system with the exception of dedup and swaptions workloads.