Ray tracing is a computer graphics method that renders images realistically. As the name suggests, this technique primarily traces the path of light rays interacting with objects in a scene [1], permitting the calcula...Ray tracing is a computer graphics method that renders images realistically. As the name suggests, this technique primarily traces the path of light rays interacting with objects in a scene [1], permitting the calculation of lighting and reflecting impact [2]. As ray tracing is a time-consuming process, the need for parallelization to solve this problem arises. One downside of this solution is the existence of race conditions. In this work, we explore and experiment with a different, well-known solution for this race condition. Starting with the introduction and the background section, a brief overview of the topic is followed by a detailed part of how the race conditions may occur in the case of the ray tracing algorithm. Continuing with the methods and results section, we have used OpenMP to parallelize the Ray tracing algorithm with the different compiler directives critical, atomic, and first-private. Hence, it concluded that both critical and atomic are not efficient solutions to produce a good-quality picture, but first-private succeeded in producing a high-quality picture.展开更多
In this paper, we propose two kinds of modifications in speaker recognition. First, the correlations between frequency channels are of prime importance for speaker recognition. Some of these correlations are lost when...In this paper, we propose two kinds of modifications in speaker recognition. First, the correlations between frequency channels are of prime importance for speaker recognition. Some of these correlations are lost when the frequency domain is divided into sub-bands. Consequently we propose a particularly redundant parallel architecture for which most of the correlations are kept. Second, generally a log transformation used to modify the power spectrum is done after the filter-bank in the classical spectrum calculation. We will see that performing this transformation before the filter bank is more interesting in our case. In the processing of recognition, the Gaussian mixture model (GMM) recognition arithmetic is adopted. Experiments on speech corrupted by noise show a better adaptability of this approach in noisy environments, comoared with a conventional device, esoeciallv when oruning of some recognizers is performed.展开更多
This paper describes the design and implementation of HITASE, a general-purpose architecture Simulation Environment for evaluating performance of parallel machine architecture. This simulation environment is designed ...This paper describes the design and implementation of HITASE, a general-purpose architecture Simulation Environment for evaluating performance of parallel machine architecture. This simulation environment is designed for SPMD parallel machines and implemented by using program-driven methodology. Compared to other similar simulation systems, our simulation environment has strong generalities, and can study new architecture, such as a reconfigurable multistage interconnection network.展开更多
In this paper, we present a general survey on parallel computing. The main contents include parallel computer system which is the hardware platform of parallel computing, parallel algorithm which is the theoretical ba...In this paper, we present a general survey on parallel computing. The main contents include parallel computer system which is the hardware platform of parallel computing, parallel algorithm which is the theoretical base of parallel computing, parallel programming which is the software support of parallel computing. After that, we also introduce some parallel applications and enabling technologies. We argue that parallel computing research should form an integrated methodology of "architecture algorithm programming application". Only in this way, parallel computing research becomes continuous development and more realistic.展开更多
Today, parallel programming is dominated by message passing libraries, such as message passing interface (MPI). This article intends to simplify parallel programming by generating parallel programs from parallelized...Today, parallel programming is dominated by message passing libraries, such as message passing interface (MPI). This article intends to simplify parallel programming by generating parallel programs from parallelized algorithm design strategies. It uses skeletons to abstract parallelized algorithm design strategies, as well as parallel architectures. Starting from problem specification, an abstract parallel abstract programming language+ (Apla+) program is generated from parallelized algorithm design strategies and problem-specific function definitions. By combining with parallel architectures, implicity of parallelism inside the parallelized algorithm design strategies is exploited. With implementation and transformation, C++ and parallel virtual machine (CPPVM) parallel program is finally generated. Parallelized branch and bound (B&B) algorithm design strategy and paraUelized divide and conquer (D & C) algorithm design strategy are studied in this article as examples. And it also illustrates the approach with a case study.展开更多
Evolutionary neural network(ENN)shows high performance in function optimization and in finding approximately global optima from searching large and complex spaces.It is one of the most efficient and adaptive optimizat...Evolutionary neural network(ENN)shows high performance in function optimization and in finding approximately global optima from searching large and complex spaces.It is one of the most efficient and adaptive optimization techniques used widely to provide candidate solutions that lead to the fitness of the problem.ENN has the extraordinary ability to search the global and learning the approximate optimal solution regardless of the gradient information of the error functions.However,ENN requires high computation and processing which requires parallel processing platforms such as field programmable gate arrays(FPGAs)and graphic processing units(GPUs)to achieve a good performance.This work involves different new implementations of ENN by exploring and adopting different techniques and opportunities for parallel processing.Different versions of ENN algorithm have also been implemented and parallelized on FPGAs platform for low latency by exploiting the parallelism and pipelining approaches.Real data form mass spectrometry data(MSD)application was tested to examine and verify our implementations.This is a very important and extensive computation application which needs to search and find the optimal features(peaks)in MSD in order to distinguish cancer patients from control patients.ENN algorithm is also implemented and parallelized on single core and GPU platforms for comparison purposes.The computation time of our optimized algorithm on FPGA and GPU has been improved by a factor of 6.75 and 6,respectively.展开更多
With the emergence of high-bitrate applications, cross stratum optimization (CSO) attracts the interest of network operators because of its application in the joint optimization of optical networks and application s...With the emergence of high-bitrate applications, cross stratum optimization (CSO) attracts the interest of network operators because of its application in the joint optimization of optical networks and application stratum resources. Given the large-scale growth and high complexity of optical networks, achieving a more effective, accurate, and practical CSO becomes an important research focus. In this letter, we present a CSO-oriented, unified control architecture for OpenFlow-enabled triple-M optical networks. A novel dynamic global load balancing (DGLB) strategy with dynamic resource rating for CSO is presented based on the proposed architecture. The DGLB strategy is then compared with four other strategies by conducting experiments on a SOFT-based testbed with 1000 virtual nodes.展开更多
文摘Ray tracing is a computer graphics method that renders images realistically. As the name suggests, this technique primarily traces the path of light rays interacting with objects in a scene [1], permitting the calculation of lighting and reflecting impact [2]. As ray tracing is a time-consuming process, the need for parallelization to solve this problem arises. One downside of this solution is the existence of race conditions. In this work, we explore and experiment with a different, well-known solution for this race condition. Starting with the introduction and the background section, a brief overview of the topic is followed by a detailed part of how the race conditions may occur in the case of the ray tracing algorithm. Continuing with the methods and results section, we have used OpenMP to parallelize the Ray tracing algorithm with the different compiler directives critical, atomic, and first-private. Hence, it concluded that both critical and atomic are not efficient solutions to produce a good-quality picture, but first-private succeeded in producing a high-quality picture.
基金the National Natural Science Foundation of China (No. 60171043, 60371046)
文摘In this paper, we propose two kinds of modifications in speaker recognition. First, the correlations between frequency channels are of prime importance for speaker recognition. Some of these correlations are lost when the frequency domain is divided into sub-bands. Consequently we propose a particularly redundant parallel architecture for which most of the correlations are kept. Second, generally a log transformation used to modify the power spectrum is done after the filter-bank in the classical spectrum calculation. We will see that performing this transformation before the filter bank is more interesting in our case. In the processing of recognition, the Gaussian mixture model (GMM) recognition arithmetic is adopted. Experiments on speech corrupted by noise show a better adaptability of this approach in noisy environments, comoared with a conventional device, esoeciallv when oruning of some recognizers is performed.
文摘This paper describes the design and implementation of HITASE, a general-purpose architecture Simulation Environment for evaluating performance of parallel machine architecture. This simulation environment is designed for SPMD parallel machines and implemented by using program-driven methodology. Compared to other similar simulation systems, our simulation environment has strong generalities, and can study new architecture, such as a reconfigurable multistage interconnection network.
基金Supported by the National Natural Science Foundation of China under Grant No.60533020. Acknowledgments We would like to thank the anonymous referees for their valuable suggestions and comments to improve the presentation of this paper.
文摘In this paper, we present a general survey on parallel computing. The main contents include parallel computer system which is the hardware platform of parallel computing, parallel algorithm which is the theoretical base of parallel computing, parallel programming which is the software support of parallel computing. After that, we also introduce some parallel applications and enabling technologies. We argue that parallel computing research should form an integrated methodology of "architecture algorithm programming application". Only in this way, parallel computing research becomes continuous development and more realistic.
基金National Natural Science Foundation of China (60773054)National Basic Research Program of China (2003CCA02800)
文摘Today, parallel programming is dominated by message passing libraries, such as message passing interface (MPI). This article intends to simplify parallel programming by generating parallel programs from parallelized algorithm design strategies. It uses skeletons to abstract parallelized algorithm design strategies, as well as parallel architectures. Starting from problem specification, an abstract parallel abstract programming language+ (Apla+) program is generated from parallelized algorithm design strategies and problem-specific function definitions. By combining with parallel architectures, implicity of parallelism inside the parallelized algorithm design strategies is exploited. With implementation and transformation, C++ and parallel virtual machine (CPPVM) parallel program is finally generated. Parallelized branch and bound (B&B) algorithm design strategy and paraUelized divide and conquer (D & C) algorithm design strategy are studied in this article as examples. And it also illustrates the approach with a case study.
文摘Evolutionary neural network(ENN)shows high performance in function optimization and in finding approximately global optima from searching large and complex spaces.It is one of the most efficient and adaptive optimization techniques used widely to provide candidate solutions that lead to the fitness of the problem.ENN has the extraordinary ability to search the global and learning the approximate optimal solution regardless of the gradient information of the error functions.However,ENN requires high computation and processing which requires parallel processing platforms such as field programmable gate arrays(FPGAs)and graphic processing units(GPUs)to achieve a good performance.This work involves different new implementations of ENN by exploring and adopting different techniques and opportunities for parallel processing.Different versions of ENN algorithm have also been implemented and parallelized on FPGAs platform for low latency by exploiting the parallelism and pipelining approaches.Real data form mass spectrometry data(MSD)application was tested to examine and verify our implementations.This is a very important and extensive computation application which needs to search and find the optimal features(peaks)in MSD in order to distinguish cancer patients from control patients.ENN algorithm is also implemented and parallelized on single core and GPU platforms for comparison purposes.The computation time of our optimized algorithm on FPGA and GPU has been improved by a factor of 6.75 and 6,respectively.
基金supported by the National"863"Program of China(No.2012AA011301)the National"973"Program of China(No.2010CB328204)+3 种基金the National Natural Science Foundation of China(Nos.61271189,61201154,and 60932004)the Research Fund for the Doctoral Program of Higher Education of China(No.20120005120019)the Fundamental Research Funds for the Central Universities(No.2013RC1201)the Fund of State Key Laborotory of Information Photonics and Optical Communications(BUPT)
文摘With the emergence of high-bitrate applications, cross stratum optimization (CSO) attracts the interest of network operators because of its application in the joint optimization of optical networks and application stratum resources. Given the large-scale growth and high complexity of optical networks, achieving a more effective, accurate, and practical CSO becomes an important research focus. In this letter, we present a CSO-oriented, unified control architecture for OpenFlow-enabled triple-M optical networks. A novel dynamic global load balancing (DGLB) strategy with dynamic resource rating for CSO is presented based on the proposed architecture. The DGLB strategy is then compared with four other strategies by conducting experiments on a SOFT-based testbed with 1000 virtual nodes.