Along with the increasing Big Data challenges, the MapReduce based systems are extensively welcomed, because of their remarkable simplicity and scalability. However, from the first day MapReduce is proposed, its a...Along with the increasing Big Data challenges, the MapReduce based systems are extensively welcomed, because of their remarkable simplicity and scalability. However, from the first day MapReduce is proposed, its argument with parallel Dt3MSs never stops, as it over-focuses on the scalability but overlooks the efficiency. Accordingly, extended systems are proposed in order to improve the peDbrmance on the limited scale clusters. In the meantime, traditional RDBMS technologies like structured data model, transaction, SQL, etc. are also getting more attention. This paper reviews such systems, from Google and also the third parties, trying to indicate the directions for the future research.展开更多
Large range cell migration is a severe challenge to imaging algorithm for spaceborne SAR. Based on design of Finite Impulse Response (FIR) filter and Range Doppler (RD) algorithm, a realization of quick-look imaging f...Large range cell migration is a severe challenge to imaging algorithm for spaceborne SAR. Based on design of Finite Impulse Response (FIR) filter and Range Doppler (RD) algorithm, a realization of quick-look imaging for large range cell migration is proposed. It realized quick-look imaging of 8 times reduced resolution with parallel processing on memory shared 8 CPU SGI server. According to simulation experiment, this quick-look imaging algorithm with parallel processing can image 16384x16384 SAR raw data within 6 seconds. It reaches the requirement of real-time imaging.展开更多
This paper deals with a parallel processing uninterruptible power supply (UPS) for sudden voltage fluctuation in power management to integrate power quality improvement, load voltage stabilization and UPS. To reduce t...This paper deals with a parallel processing uninterruptible power supply (UPS) for sudden voltage fluctuation in power management to integrate power quality improvement, load voltage stabilization and UPS. To reduce the complexity, cost and number of power conversions, which results in higher efficiency, only one voltage-controlled voltage source inverter (VCVSI) is used. The VCVSI is connected in series on the DC battery side and in parallel on the AC grid side with a decoupling inductor. The system provides sinusoidal voltage at the fundamental value of 220V/60Hz for the load during abnormal utility power conditions or grid failure. Also, the system can be operated to mitigate the harmonic current and voltage demand from nonlinear loads and provide voltage stabilization for loads when sudden voltage fluctuation occur, such as sag and swell. The experimental results confirm the system protects against outages caused by abnormal utility power conditions and sudden voltage fluctuations and change.展开更多
ADSP-TS101 is a high performance DSP with good properties of parallel processing and high speed.According to the real-time processing requirements of underwater acoustic communication algorithms,a real-time parallel p...ADSP-TS101 is a high performance DSP with good properties of parallel processing and high speed.According to the real-time processing requirements of underwater acoustic communication algorithms,a real-time parallel processing system with multi-channel synchronous sample,which is composed of multiple ADSP-TS101s,is designed and carried out.For the hardware design,field programmable gate array(FPGA)logical control is adopted for the design of multi-channel synchronous sample module and cluster/data flow associated pin connection mode is adopted for multiprocessing parallel processing configuration respectively.And the software is optimized by two kinds of communication ways:broadcast writing way through shared bus and point-to-point way through link ports.Through the whole system installation,connective debugging,and experiments in a lake,the results show that the real-time parallel processing system has good stability and real-time processing capability and meets the technical design requirements of real-time processing.展开更多
The k-Nearest Neighbor method is one of the most popular techniques for both classification and regression purposes.Because of its operation,the application of this classification may be limited to problems with a cer...The k-Nearest Neighbor method is one of the most popular techniques for both classification and regression purposes.Because of its operation,the application of this classification may be limited to problems with a certain number of instances,particularly,when run time is a consideration.However,the classification of large amounts of data has become a fundamental task in many real-world applications.It is logical to scale the k-Nearest Neighbor method to large scale datasets.This paper proposes a new k-Nearest Neighbor classification method(KNN-CCL)which uses a parallel centroid-based and hierarchical clustering algorithm to separate the sample of training dataset into multiple parts.The introduced clustering algorithm uses four stages of successive refinements and generates high quality clusters.The k-Nearest Neighbor approach subsequently makes use of them to predict the test datasets.Finally,sets of experiments are conducted on the UCI datasets.The experimental results confirm that the proposed k-Nearest Neighbor classification method performs well with regard to classification accuracy and performance.展开更多
In recent years, the widespread adoption of parallel computing, especially in multi-core processors and high-performance computing environments, ushered in a new era of efficiency and speed. This trend was particularl...In recent years, the widespread adoption of parallel computing, especially in multi-core processors and high-performance computing environments, ushered in a new era of efficiency and speed. This trend was particularly noteworthy in the field of image processing, which witnessed significant advancements. This parallel computing project explored the field of parallel image processing, with a focus on the grayscale conversion of colorful images. Our approach involved integrating OpenMP into our framework for parallelization to execute a critical image processing task: grayscale conversion. By using OpenMP, we strategically enhanced the overall performance of the conversion process by distributing the workload across multiple threads. The primary objectives of our project revolved around optimizing computation time and improving overall efficiency, particularly in the task of grayscale conversion of colorful images. Utilizing OpenMP for concurrent processing across multiple cores significantly reduced execution times through the effective distribution of tasks among these cores. The speedup values for various image sizes highlighted the efficacy of parallel processing, especially for large images. However, a detailed examination revealed a potential decline in parallelization efficiency with an increasing number of cores. This underscored the importance of a carefully optimized parallelization strategy, considering factors like load balancing and minimizing communication overhead. Despite challenges, the overall scalability and efficiency achieved with parallel image processing underscored OpenMP’s effectiveness in accelerating image manipulation tasks.展开更多
Real-time capabilities and computational efficiency are provided by parallel image processing utilizing OpenMP. However, race conditions can affect the accuracy and reliability of the outcomes. This paper highlights t...Real-time capabilities and computational efficiency are provided by parallel image processing utilizing OpenMP. However, race conditions can affect the accuracy and reliability of the outcomes. This paper highlights the importance of addressing race conditions in parallel image processing, specifically focusing on color inverse filtering using OpenMP. We considered three solutions to solve race conditions, each with distinct characteristics: #pragma omp atomic: Protects individual memory operations for fine-grained control. #pragma omp critical: Protects entire code blocks for exclusive access. #pragma omp parallel sections reduction: Employs a reduction clause for safe aggregation of values across threads. Our findings show that the produced images were unaffected by race condition. However, it becomes evident that solving the race conditions in the code makes it significantly faster, especially when it is executed on multiple cores.展开更多
Using the method of mathematical morphology,this paper fulfills filtration,segmentation and extraction of morphological features of the satellite cloud image.It also gives out the relative algorithms,which is realized...Using the method of mathematical morphology,this paper fulfills filtration,segmentation and extraction of morphological features of the satellite cloud image.It also gives out the relative algorithms,which is realized by parallel C programming based on Transputer networks.It has been successfully used to process the typhoon and the low tornado cloud image.And it will be used in weather forecast.展开更多
The flexibility of traditional image processing system is limited because those system are designed for specific applications. In this paper, a new TMS320C64x-based multi-DSP parallel computing architecture is present...The flexibility of traditional image processing system is limited because those system are designed for specific applications. In this paper, a new TMS320C64x-based multi-DSP parallel computing architecture is presented. It has many promising characteristics such as powerful computing capability, broad I/O bandwidth, topology flexibility, and expansibility. The parallel system performance is evaluated by practical experiment.展开更多
Withthe rapiddevelopment of deep learning,the size of data sets anddeepneuralnetworks(DNNs)models are also booming.As a result,the intolerable long time for models’training or inference with conventional strategies c...Withthe rapiddevelopment of deep learning,the size of data sets anddeepneuralnetworks(DNNs)models are also booming.As a result,the intolerable long time for models’training or inference with conventional strategies can not meet the satisfaction of modern tasks gradually.Moreover,devices stay idle in the scenario of edge computing(EC),which presents a waste of resources since they can share the pressure of the busy devices but they do not.To address the problem,the strategy leveraging distributed processing has been applied to load computation tasks from a single processor to a group of devices,which results in the acceleration of training or inference of DNN models and promotes the high utilization of devices in edge computing.Compared with existing papers,this paper presents an enlightening and novel review of applying distributed processing with data and model parallelism to improve deep learning tasks in edge computing.Considering the practicalities,commonly used lightweight models in a distributed system are introduced as well.As the key technique,the parallel strategy will be described in detail.Then some typical applications of distributed processing will be analyzed.Finally,the challenges of distributed processing with edge computing will be described.展开更多
The parallel processing based on the free running model test was adopted to predict the interaction force coefficients (flow straightening coefficient and wake fraction) of ship maneuvering. And the multipopulation ...The parallel processing based on the free running model test was adopted to predict the interaction force coefficients (flow straightening coefficient and wake fraction) of ship maneuvering. And the multipopulation genetic algorithm (MPGA) based on real coding that can contemporarily process the data of free running model and simulation of ship maneuvering was applied to solve the problem. Accordingly the optimal individual was obtained using the method of genetic algorithm. The parallel processing of multiopulation solved the prematurity in the identification for single population, meanwhile, the parallel processing of the data of ship maneuvering (turning motion and zigzag motion) is an attempt to solve the coefficient drift problem. In order to validate the method, the interaction force coefficients were verified by the procedure and these coefficients measured were compared with those ones identified. The maximum error is less than 5%, and the identification is an effective method.展开更多
To improve image processing speed and detection precision of a surface detection system on a strip surface,based on the analysis of the characteristics of image data and image processing in detection system on the str...To improve image processing speed and detection precision of a surface detection system on a strip surface,based on the analysis of the characteristics of image data and image processing in detection system on the strip surface,the design of parallel image processing system and the methods of algorithm implementation have been studied. By using field programmable gate array(FPGA) as hardware platform of implementation and considering the characteristic of detection system on the strip surface,a parallel image processing system implemented by using multi IP kernel is designed. According to different computing tasks and the load balancing capability of parallel processing system,the system could set different calculating numbers of nodes to meet the system's demand and save the hardware cost.展开更多
An improvement detecting method was proposed according to the disadvantages of testing method of optical axes parallelism of shipboard photoelectrical theodolite (short for theodolite) based on image processing. Point...An improvement detecting method was proposed according to the disadvantages of testing method of optical axes parallelism of shipboard photoelectrical theodolite (short for theodolite) based on image processing. Pointolite replaced 0.2'' collimator to reduce the errors of crosshair images processing and improve the quality of image. What’s more, the high quality images could help to optimize the image processing method and the testing accuracy. The errors between the trial results interpreted by software and the results tested in dock were less than 10'', which indicated the improve method had some actual application values.展开更多
On-line transient stability analysis of a power grid is crucial in determining whether the power grid will traverse to a steady state stable operating point after a disturbance. The transient stability analysis involv...On-line transient stability analysis of a power grid is crucial in determining whether the power grid will traverse to a steady state stable operating point after a disturbance. The transient stability analysis involves computing the solutions of the algebraic equations modeling the grid network and the ordinary differential equations modeling the dynamics of the electrical components like synchronous generators, exciters, governors, etc., of the grid in near real-time. In this research, we investigate the use of time-parallel approach in particular the Parareal algorithm implementation on Graphical Processing Unit using Compute Unified Device Architecture to compute solutions of ordinary differential equations. The numerical solution accuracy and computation time of the Parareal algorithm executing on the GPU are demonstrated on the single machine infinite bus test system. Two types of dynamic model of the single synchronous generator namely the classical and detailed models are studied. The numerical solutions of the ordinary differential equations computed by the Parareal algorithm are compared to that computed using the modified Euler’s method demonstrating the accuracy of the Parareal algorithm executing on GPU. Simulations are performed with varying numerical integration time steps, and the suitability of Parareal algorithm in computing near real-time solutions of ordinary different equations is presented. A speedup of 25× and 31× is achieved with the Parareal algorithm for classical and detailed dynamic models of the synchronous generator respectively compared to the sequential modified Euler’s method. The weak scaling efficiency of the Parareal algorithm when required to solve a large number of ordinary differential equations at each time step due to the increase in sequential computations and associated memory transfer latency between the CPU and GPU is discussed.展开更多
In order to improve femtosecond laser throughput,a parallel processing system consisting of liquid crystal on silicon(LCOS)device as spatial light modulator is put forward.A method is described for displaying Fourier ...In order to improve femtosecond laser throughput,a parallel processing system consisting of liquid crystal on silicon(LCOS)device as spatial light modulator is put forward.A method is described for displaying Fourier hologram on LCOS,and a high uniformity of several diffraction peaks in the computer reconstruction is achieved.Application of this method to the parallel femtosecond laser processing is also demonstrated,and two intersecting rings and three tangent rings are fabricated respectively by one time in the photoresist.展开更多
The paper presents the implementation of a parallel version of FDK (Felkamp, David e Kress) algorithm using graphics processing units. Discussion was briefly some elements the computed tomographic scan and FDK algor...The paper presents the implementation of a parallel version of FDK (Felkamp, David e Kress) algorithm using graphics processing units. Discussion was briefly some elements the computed tomographic scan and FDK algorithm; and some ideas about GPUs (Graphics Processing Units) and its use in general purpose computing were presented. The paper shows a computational implementation of FDK algorithm and the process of parallelization of this implementation. Compare the parallel version of the algorithm with the sequential version, used speedup as a performance metric. To evaluate the performance of parallel version, two GPUs, GeForce 9400GT (16 cores) a low capacity GPU and Quadro 2000 (192 cores) a medium capacity GPU was reached speedup of 3.37.展开更多
The Long Term Evolution (LTE) system imposes high requirements for dispatching delay.Moreover,very large air interface rate of LTE requires good processing capability for the devices processing the baseband signals.Co...The Long Term Evolution (LTE) system imposes high requirements for dispatching delay.Moreover,very large air interface rate of LTE requires good processing capability for the devices processing the baseband signals.Consequently,the single-core processor cannot meet the requirements of LTE system.This paper analyzes how to use multi-core processors to achieve parallel processing of uplink demodulation and decoding in LTE systems and designs an approach to parallel processing.The test results prove that this approach works quite well.展开更多
Simulating charged particle motion through the elements is necessary to understand modern particle accelerators. The particle numbers and the circling turns in a synchrotron are huge, and a simulation can be timeconsu...Simulating charged particle motion through the elements is necessary to understand modern particle accelerators. The particle numbers and the circling turns in a synchrotron are huge, and a simulation can be timeconsuming. Open multi-processing(Open MP) is a convenient method to speed up the computing of multi-cores for computers based on share memory model. Using message passing interface(MPI) which is based on nonuniform memory access architecture, a coarse grain parallel algorithm is set up for the Accelerator Toolbox(AT)for dynamic tracking processes. The computing speedup of the tracking process is 3.77 times with a quad-core CPU computer and the speed almost grows linearly with the number of CPU.展开更多
In order to meet the demands of high efficient and real-time computer assisted diagnosis as well as screening in medical area, to improve the efficacy of parallel medical image processing is of great importance. This ...In order to meet the demands of high efficient and real-time computer assisted diagnosis as well as screening in medical area, to improve the efficacy of parallel medical image processing is of great importance. This article proposes improved strategies for parallel medical image processing applications,which is categorized into two genera. For each genus individual strategy is devised, including the theoretic algorithm for minimizing the exertion time. Experiment using mammograms not only justifies the validity of the theoretic analysis, with reasonable difference between the theoretic and measured value, but also shows that when adopting the improved strategies, efficacy of medical image parallel processing is improved greatly.展开更多
基金the National Natural Science Foundation of China under Grant No.61370091 and No.61170200,Jiangsu Province Science and Technology Support Program (industry) Project under Grant No.BE2012179
文摘Along with the increasing Big Data challenges, the MapReduce based systems are extensively welcomed, because of their remarkable simplicity and scalability. However, from the first day MapReduce is proposed, its argument with parallel Dt3MSs never stops, as it over-focuses on the scalability but overlooks the efficiency. Accordingly, extended systems are proposed in order to improve the peDbrmance on the limited scale clusters. In the meantime, traditional RDBMS technologies like structured data model, transaction, SQL, etc. are also getting more attention. This paper reviews such systems, from Google and also the third parties, trying to indicate the directions for the future research.
文摘Large range cell migration is a severe challenge to imaging algorithm for spaceborne SAR. Based on design of Finite Impulse Response (FIR) filter and Range Doppler (RD) algorithm, a realization of quick-look imaging for large range cell migration is proposed. It realized quick-look imaging of 8 times reduced resolution with parallel processing on memory shared 8 CPU SGI server. According to simulation experiment, this quick-look imaging algorithm with parallel processing can image 16384x16384 SAR raw data within 6 seconds. It reaches the requirement of real-time imaging.
文摘This paper deals with a parallel processing uninterruptible power supply (UPS) for sudden voltage fluctuation in power management to integrate power quality improvement, load voltage stabilization and UPS. To reduce the complexity, cost and number of power conversions, which results in higher efficiency, only one voltage-controlled voltage source inverter (VCVSI) is used. The VCVSI is connected in series on the DC battery side and in parallel on the AC grid side with a decoupling inductor. The system provides sinusoidal voltage at the fundamental value of 220V/60Hz for the load during abnormal utility power conditions or grid failure. Also, the system can be operated to mitigate the harmonic current and voltage demand from nonlinear loads and provide voltage stabilization for loads when sudden voltage fluctuation occur, such as sag and swell. The experimental results confirm the system protects against outages caused by abnormal utility power conditions and sudden voltage fluctuations and change.
基金Sponsored by National Natural Science Foundation of China(60572098)
文摘ADSP-TS101 is a high performance DSP with good properties of parallel processing and high speed.According to the real-time processing requirements of underwater acoustic communication algorithms,a real-time parallel processing system with multi-channel synchronous sample,which is composed of multiple ADSP-TS101s,is designed and carried out.For the hardware design,field programmable gate array(FPGA)logical control is adopted for the design of multi-channel synchronous sample module and cluster/data flow associated pin connection mode is adopted for multiprocessing parallel processing configuration respectively.And the software is optimized by two kinds of communication ways:broadcast writing way through shared bus and point-to-point way through link ports.Through the whole system installation,connective debugging,and experiments in a lake,the results show that the real-time parallel processing system has good stability and real-time processing capability and meets the technical design requirements of real-time processing.
基金The authors received no specific funding for this work.
文摘The k-Nearest Neighbor method is one of the most popular techniques for both classification and regression purposes.Because of its operation,the application of this classification may be limited to problems with a certain number of instances,particularly,when run time is a consideration.However,the classification of large amounts of data has become a fundamental task in many real-world applications.It is logical to scale the k-Nearest Neighbor method to large scale datasets.This paper proposes a new k-Nearest Neighbor classification method(KNN-CCL)which uses a parallel centroid-based and hierarchical clustering algorithm to separate the sample of training dataset into multiple parts.The introduced clustering algorithm uses four stages of successive refinements and generates high quality clusters.The k-Nearest Neighbor approach subsequently makes use of them to predict the test datasets.Finally,sets of experiments are conducted on the UCI datasets.The experimental results confirm that the proposed k-Nearest Neighbor classification method performs well with regard to classification accuracy and performance.
文摘In recent years, the widespread adoption of parallel computing, especially in multi-core processors and high-performance computing environments, ushered in a new era of efficiency and speed. This trend was particularly noteworthy in the field of image processing, which witnessed significant advancements. This parallel computing project explored the field of parallel image processing, with a focus on the grayscale conversion of colorful images. Our approach involved integrating OpenMP into our framework for parallelization to execute a critical image processing task: grayscale conversion. By using OpenMP, we strategically enhanced the overall performance of the conversion process by distributing the workload across multiple threads. The primary objectives of our project revolved around optimizing computation time and improving overall efficiency, particularly in the task of grayscale conversion of colorful images. Utilizing OpenMP for concurrent processing across multiple cores significantly reduced execution times through the effective distribution of tasks among these cores. The speedup values for various image sizes highlighted the efficacy of parallel processing, especially for large images. However, a detailed examination revealed a potential decline in parallelization efficiency with an increasing number of cores. This underscored the importance of a carefully optimized parallelization strategy, considering factors like load balancing and minimizing communication overhead. Despite challenges, the overall scalability and efficiency achieved with parallel image processing underscored OpenMP’s effectiveness in accelerating image manipulation tasks.
文摘Real-time capabilities and computational efficiency are provided by parallel image processing utilizing OpenMP. However, race conditions can affect the accuracy and reliability of the outcomes. This paper highlights the importance of addressing race conditions in parallel image processing, specifically focusing on color inverse filtering using OpenMP. We considered three solutions to solve race conditions, each with distinct characteristics: #pragma omp atomic: Protects individual memory operations for fine-grained control. #pragma omp critical: Protects entire code blocks for exclusive access. #pragma omp parallel sections reduction: Employs a reduction clause for safe aggregation of values across threads. Our findings show that the produced images were unaffected by race condition. However, it becomes evident that solving the race conditions in the code makes it significantly faster, especially when it is executed on multiple cores.
文摘Using the method of mathematical morphology,this paper fulfills filtration,segmentation and extraction of morphological features of the satellite cloud image.It also gives out the relative algorithms,which is realized by parallel C programming based on Transputer networks.It has been successfully used to process the typhoon and the low tornado cloud image.And it will be used in weather forecast.
基金This project was supported by the National Natural Science Foundation of China (60135020).
文摘The flexibility of traditional image processing system is limited because those system are designed for specific applications. In this paper, a new TMS320C64x-based multi-DSP parallel computing architecture is presented. It has many promising characteristics such as powerful computing capability, broad I/O bandwidth, topology flexibility, and expansibility. The parallel system performance is evaluated by practical experiment.
基金supported by the Natural Science Foundation of Jiangsu Province of China under Grant No.BK20211284the Financial and Science Technology Plan Project of Xinjiang Production,Construction Corps under Grant No.2020DB005the National Natural Science Foundation of China under Grant Nos.61872219,62002276 and 62177014。
文摘Withthe rapiddevelopment of deep learning,the size of data sets anddeepneuralnetworks(DNNs)models are also booming.As a result,the intolerable long time for models’training or inference with conventional strategies can not meet the satisfaction of modern tasks gradually.Moreover,devices stay idle in the scenario of edge computing(EC),which presents a waste of resources since they can share the pressure of the busy devices but they do not.To address the problem,the strategy leveraging distributed processing has been applied to load computation tasks from a single processor to a group of devices,which results in the acceleration of training or inference of DNN models and promotes the high utilization of devices in edge computing.Compared with existing papers,this paper presents an enlightening and novel review of applying distributed processing with data and model parallelism to improve deep learning tasks in edge computing.Considering the practicalities,commonly used lightweight models in a distributed system are introduced as well.As the key technique,the parallel strategy will be described in detail.Then some typical applications of distributed processing will be analyzed.Finally,the challenges of distributed processing with edge computing will be described.
基金the Knowledge-based Ship-designHyper-integrated Platform (KSHIP) of Ministry ofEducation, China
文摘The parallel processing based on the free running model test was adopted to predict the interaction force coefficients (flow straightening coefficient and wake fraction) of ship maneuvering. And the multipopulation genetic algorithm (MPGA) based on real coding that can contemporarily process the data of free running model and simulation of ship maneuvering was applied to solve the problem. Accordingly the optimal individual was obtained using the method of genetic algorithm. The parallel processing of multiopulation solved the prematurity in the identification for single population, meanwhile, the parallel processing of the data of ship maneuvering (turning motion and zigzag motion) is an attempt to solve the coefficient drift problem. In order to validate the method, the interaction force coefficients were verified by the procedure and these coefficients measured were compared with those ones identified. The maximum error is less than 5%, and the identification is an effective method.
基金The 111 project(B07018) Supported by Program for Changjiang Scholars and Innovative Research Teamin University(IRT0423)
文摘To improve image processing speed and detection precision of a surface detection system on a strip surface,based on the analysis of the characteristics of image data and image processing in detection system on the strip surface,the design of parallel image processing system and the methods of algorithm implementation have been studied. By using field programmable gate array(FPGA) as hardware platform of implementation and considering the characteristic of detection system on the strip surface,a parallel image processing system implemented by using multi IP kernel is designed. According to different computing tasks and the load balancing capability of parallel processing system,the system could set different calculating numbers of nodes to meet the system's demand and save the hardware cost.
文摘An improvement detecting method was proposed according to the disadvantages of testing method of optical axes parallelism of shipboard photoelectrical theodolite (short for theodolite) based on image processing. Pointolite replaced 0.2'' collimator to reduce the errors of crosshair images processing and improve the quality of image. What’s more, the high quality images could help to optimize the image processing method and the testing accuracy. The errors between the trial results interpreted by software and the results tested in dock were less than 10'', which indicated the improve method had some actual application values.
文摘On-line transient stability analysis of a power grid is crucial in determining whether the power grid will traverse to a steady state stable operating point after a disturbance. The transient stability analysis involves computing the solutions of the algebraic equations modeling the grid network and the ordinary differential equations modeling the dynamics of the electrical components like synchronous generators, exciters, governors, etc., of the grid in near real-time. In this research, we investigate the use of time-parallel approach in particular the Parareal algorithm implementation on Graphical Processing Unit using Compute Unified Device Architecture to compute solutions of ordinary differential equations. The numerical solution accuracy and computation time of the Parareal algorithm executing on the GPU are demonstrated on the single machine infinite bus test system. Two types of dynamic model of the single synchronous generator namely the classical and detailed models are studied. The numerical solutions of the ordinary differential equations computed by the Parareal algorithm are compared to that computed using the modified Euler’s method demonstrating the accuracy of the Parareal algorithm executing on GPU. Simulations are performed with varying numerical integration time steps, and the suitability of Parareal algorithm in computing near real-time solutions of ordinary different equations is presented. A speedup of 25× and 31× is achieved with the Parareal algorithm for classical and detailed dynamic models of the synchronous generator respectively compared to the sequential modified Euler’s method. The weak scaling efficiency of the Parareal algorithm when required to solve a large number of ordinary differential equations at each time step due to the increase in sequential computations and associated memory transfer latency between the CPU and GPU is discussed.
基金National Natural Science Foundation of China(No.51275502)Natural Science Key Project of Anhui Province(No.KJ2011A014)+1 种基金China Postdoctoral Science Foundation funded project(NO.2012M511416)The Innovation Foundationof Anhui University and the Personnel Construction Project of Anhui University
文摘In order to improve femtosecond laser throughput,a parallel processing system consisting of liquid crystal on silicon(LCOS)device as spatial light modulator is put forward.A method is described for displaying Fourier hologram on LCOS,and a high uniformity of several diffraction peaks in the computer reconstruction is achieved.Application of this method to the parallel femtosecond laser processing is also demonstrated,and two intersecting rings and three tangent rings are fabricated respectively by one time in the photoresist.
文摘The paper presents the implementation of a parallel version of FDK (Felkamp, David e Kress) algorithm using graphics processing units. Discussion was briefly some elements the computed tomographic scan and FDK algorithm; and some ideas about GPUs (Graphics Processing Units) and its use in general purpose computing were presented. The paper shows a computational implementation of FDK algorithm and the process of parallelization of this implementation. Compare the parallel version of the algorithm with the sequential version, used speedup as a performance metric. To evaluate the performance of parallel version, two GPUs, GeForce 9400GT (16 cores) a low capacity GPU and Quadro 2000 (192 cores) a medium capacity GPU was reached speedup of 3.37.
文摘The Long Term Evolution (LTE) system imposes high requirements for dispatching delay.Moreover,very large air interface rate of LTE requires good processing capability for the devices processing the baseband signals.Consequently,the single-core processor cannot meet the requirements of LTE system.This paper analyzes how to use multi-core processors to achieve parallel processing of uplink demodulation and decoding in LTE systems and designs an approach to parallel processing.The test results prove that this approach works quite well.
基金Supported by the National Natural Science Foundation of China(No11105214)
文摘Simulating charged particle motion through the elements is necessary to understand modern particle accelerators. The particle numbers and the circling turns in a synchrotron are huge, and a simulation can be timeconsuming. Open multi-processing(Open MP) is a convenient method to speed up the computing of multi-cores for computers based on share memory model. Using message passing interface(MPI) which is based on nonuniform memory access architecture, a coarse grain parallel algorithm is set up for the Accelerator Toolbox(AT)for dynamic tracking processes. The computing speedup of the tracking process is 3.77 times with a quad-core CPU computer and the speed almost grows linearly with the number of CPU.
基金SEC E-Institute:Shanghai High Institutions Grid Project, National Natural Science Foundation of ChinaGrant number: No.60503039,10778604 and 60773148+1 种基金China’s National Fundamenfal Research 973 ProgramGrant number:2004CB217903
文摘In order to meet the demands of high efficient and real-time computer assisted diagnosis as well as screening in medical area, to improve the efficacy of parallel medical image processing is of great importance. This article proposes improved strategies for parallel medical image processing applications,which is categorized into two genera. For each genus individual strategy is devised, including the theoretic algorithm for minimizing the exertion time. Experiment using mammograms not only justifies the validity of the theoretic analysis, with reasonable difference between the theoretic and measured value, but also shows that when adopting the improved strategies, efficacy of medical image parallel processing is improved greatly.