Funding: Sponsored by the National Natural Science Foundation of China (Grant Nos. 40774029 and 40374024), the National Hi-tech Research and Development Program of China (863 Program) (No. 2007AA09Z310), and the Program for New Century Excellent Talents in University (NCET).
Abstract: We implement a parallel algorithm using MPI (Message Passing Interface) to speed up the rapid relaxation inversion of 3D magnetotelluric data. We test the parallel rapid relaxation algorithm with synthetic and real data and compare its execution efficiency in several different situations. The results indicate that the parallel rapid relaxation algorithm for 3D magnetotelluric inversion is effective. Implemented on a common PC, this parallel algorithm promotes the practical application of 3D magnetotelluric inversion and may be suitable for other geophysical 3D modeling and inversion problems.
Abstract: An efficient parallel global router using random optimization that is independent of net ordering is proposed. Parallel approaches are described, and strategies that guarantee the routing quality are discussed. The wire length model is implemented on a multiprocessor, which makes the algorithm feasible for large-scale problems. A timing-driven model on a multiprocessor and a wire length model on distributed processors are also presented. The parallel algorithm greatly reduces the run time of routing. The experimental results show good speedups with no degradation of the routing quality.
Funding: Supported by the National Key S&T Special Projects of Marine Carbonate (No. 2008ZX05000-004) and CNPC Projects (No. 2008E-0610-10).
Abstract: With the development of parallel computing technology, the efficiency of non-linear inversion calculations has been improving. However, for single-point search-based non-linear inversion methods, implementing parallel algorithms is difficult. We introduce the idea of group search into the single-point search-based non-linear inversion algorithm, taking the quantum Monte Carlo method as an example, apply it to two-dimensional seismic wave velocity inversion and practical impedance inversion, and test the calculation efficiency with different numbers of nodes. The results show that the parallel algorithm is feasible and effective for both theoretical and practical data inversion and has good versatility. The algorithm efficiency increases with the number of nodes, but the rate of increase gradually declines as more nodes are added.
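
The diminishing return reported above is what a simple fixed-workload model would predict, if "efficiency" is read as overall speedup. The sketch below uses Amdahl's law as one such model; it is not the authors' analysis, and the parallel fraction of 0.9 is a made-up value chosen only for illustration.

```python
def amdahl_speedup(parallel_fraction: float, nodes: int) -> float:
    """Ideal speedup on `nodes` workers when only `parallel_fraction` of the run time scales."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must lie in [0, 1]")
    if nodes < 1:
        raise ValueError("nodes must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / nodes)


if __name__ == "__main__":
    p = 0.9  # hypothetical parallel fraction, not taken from the paper
    for n in (1, 2, 4, 8, 16, 32):
        s = amdahl_speedup(p, n)
        # Speedup keeps rising with n, while speedup per node (s / n) falls.
        print(f"nodes={n:2d}  speedup={s:5.2f}  speedup per node={s / n:4.2f}")
```

Under this model the speedup can never exceed 1 / (1 - p), so each additional node contributes less than the one before it.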
Abstract: In light of the high nonlinearity of the LuGre friction model, a novel method based on the ant colony algorithm (ACA) is proposed for identifying the friction parameters of a flight simulation servo system. ACA is a parallelized bionic optimization algorithm inspired by the behavior of real ants, and it adopts a positive feedback mechanism. Following a brief introduction to the LuGre friction model, a method for identifying its static and dynamic friction parameters using ACA is derived. Finally, the new friction parameter identification scheme is applied to a high-precision electrically driven flight simulation servo system. Simulation and application results verify the feasibility and effectiveness of the scheme, which provides a new way to identify the friction parameters of the LuGre model.
Abstract: A parallel computation of 3-D hypersonic flows with chemical non-equilibrium on hybrid meshes, based on the parallel virtual machine (PVM) protocol, is presented. The numerical simulation of hypersonic flows with chemical non-equilibrium reactions encounters the stiffness problem and therefore takes a huge amount of CPU time. Based on the domain decomposition method, a highly efficient automatic domain decomposer for three-dimensional hybrid meshes is developed and applied to the numerical simulation of hypersonic flows. The governing equations are the multicomponent N-S equations, spatially discretized by a cell-centered finite volume algorithm with five-stage Runge-Kutta time stepping. The chemical kinetic model is a seven-species model with weak ionization. A point-implicit method is used to solve the chemical source term. Numerical results obtained on a PC cluster for a bi-ellipse model are verified against references.
Abstract: A parallel embedding overlapped iterative (EOI) algorithm based on classic implicit equations with asymmetric Saul'yev schemes (CIS-EOI) for solving one-dimensional diffusion equations is discussed, with the aim of improving the properties of the segment classic implicit iterative (SCII) algorithm. The structure of the CIS-EOI method is given, and the stability of the scheme and the convergence of the iteration are proved by the matrix method. The property of gradual-approach convergence is also discussed. It is shown that, compared with the SCII algorithm, the convergence rate is faster and the gradual-approach convergence improves as the number of net points in the subsystems increases. The simulation examples show that the parallel iterative algorithm with the different insertion scheme, CIS-EOI, is more effective.
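
For readers unfamiliar with the asymmetric Saul'yev schemes mentioned above, the following is a minimal sketch of the two standard sweeps for u_t = a u_xx with fixed end values. It shows only this building block, not the CIS-EOI algorithm itself; the function names and the mesh ratio symbol r = a*dt/dx**2 are choices made here for illustration.

```python
def saulyev_lr_step(u, r):
    """One left-to-right Saul'yev sweep; the end values of u are held fixed.

    Uses (1 + r) * new[j] = r * new[j-1] + (1 - r) * u[j] + r * u[j+1],
    so each point needs only the already-updated left neighbour.
    """
    new = list(u)
    for j in range(1, len(u) - 1):
        new[j] = (r * new[j - 1] + (1.0 - r) * u[j] + r * u[j + 1]) / (1.0 + r)
    return new


def saulyev_rl_step(u, r):
    """Mirror-image right-to-left sweep, using the updated right neighbour."""
    new = list(u)
    for j in range(len(u) - 2, 0, -1):
        new[j] = (r * new[j + 1] + (1.0 - r) * u[j] + r * u[j - 1]) / (1.0 + r)
    return new


if __name__ == "__main__":
    u = [0.0, 0.0, 1.0, 0.0, 0.0]  # initial spike between zero boundaries
    for step in range(10):
        # Alternating the sweep direction offsets the directional bias of each sweep.
        u = saulyev_lr_step(u, 0.5) if step % 2 == 0 else saulyev_rl_step(u, 0.5)
    print(u)
```

Both sweeps are explicit to evaluate even though they involve values at the new time level, which is why they are attractive as components of segment-wise parallel methods.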
Abstract: A parallelized upwind flux splitting scheme for supersonic reacting flows on hybrid meshes is presented. The complexity of super/hypersonic combustion flows makes it necessary to establish solvers with higher resolution and efficiency for the multi-component Euler/N-S equations. Hence, a spatially second-order van Leer type flux vector splitting scheme is established by introducing auxiliary points in the interpolation, and a domain decomposition method is used on unstructured hybrid meshes to obtain high computational efficiency. The numerical scheme, with a five-stage Runge-Kutta time-stepping method, is applied to the simulation of combustion flows, including supersonic hydrogen/air combustion and the normal injection of hydrogen into reacting flows. Satisfactory results are obtained in comparison with the limited references available.
Abstract: In this paper, an attempt to employ network resources to solve a complex and time-consuming problem is presented. The global illumination problem is selected as the object of study. An improved density estimation algorithm is first developed, in which more of the inherent concurrency is exploited. Its parallel implementation using the PVM mechanism and an analysis of its running performance are then provided. The analysis shows that the expected speed-up is obtained and demonstrates that PVM has good application prospects for parallel computation in a distributed network.
Funding: Jointly sponsored by the National Natural Science Foundation of China (Grant No. 41374078), the Geological Survey Projects of the Ministry of Land and Resources of China (Grant Nos. 12120113086100 and 12120113101300), and the Beijing Higher Education Young Elite Teacher Project.
Abstract: Traditional two-dimensional (2D) complex resistivity forward modeling is based on Poisson's equation, but spectral induced polarization (SIP) data are the coproducts of the induced polarization (IP) and electromagnetic induction (EMI) effects. This is especially true at high frequencies, where the EMI effect can exceed the IP effect. 2D inversion that considers only the IP effect reduces the reliability of the inversion data. In this paper, we derive differential equations from Maxwell's equations. With the introduction of the Cole-Cole model, we use the finite-element method to conduct 2D SIP forward modeling that considers the EMI and IP effects simultaneously. The data-space Occam method, in which different constraints on the model smoothness and parametric boundaries are introduced, is then used to simultaneously obtain the four parameters of the Cole-Cole model from multi-array electric field data. This approach not only improves the stability of the inversion but also significantly reduces the solution ambiguity. To improve the computational efficiency, message passing interface programming was used to accelerate the 2D SIP forward modeling and inversion. Synthetic datasets were tested using both serial and parallel algorithms, and the tests suggest that the proposed parallel algorithm is robust and efficient.
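
The four Cole-Cole parameters referred to above are usually the zero-frequency resistivity, the chargeability, the time constant, and the frequency exponent. The sketch below evaluates the widely used Pelton form of the model; the abstract does not state which form the authors adopt, so the formula and the parameter names (`rho0`, `m`, `tau`, `c`) are assumptions made here.

```python
def cole_cole_resistivity(omega, rho0, m, tau, c):
    """Complex resistivity of the Pelton-type Cole-Cole model.

    omega : angular frequency in rad/s
    rho0  : zero-frequency resistivity in ohm-m
    m     : chargeability, between 0 and 1
    tau   : time constant in seconds
    c     : frequency exponent, between 0 and 1
    """
    iwt_c = (1j * omega * tau) ** c
    return rho0 * (1.0 - m * (1.0 - 1.0 / (1.0 + iwt_c)))


if __name__ == "__main__":
    # Illustrative values only; they are not taken from the paper.
    for omega in (0.0, 1.0, 10.0, 1.0e3, 1.0e6):
        rho = cole_cole_resistivity(omega, rho0=100.0, m=0.3, tau=0.1, c=0.5)
        print(f"omega={omega:9.1f}  |rho|={abs(rho):8.3f}  imag={rho.imag:8.3f}")
```

The magnitude falls from `rho0` at zero frequency toward `rho0 * (1 - m)` at high frequency, and the negative imaginary part reflects the phase lag associated with polarization.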
Funding: Supported by the 973 Program of China (2005CB321702) and China NSF (10531080).
Abstract: Local mesh refinement is one of the key steps in implementing adaptive finite element methods. This paper presents a parallel algorithm for distributed-memory parallel computers for adaptive local refinement of tetrahedral meshes using bisection. This algorithm is used in PHG, Parallel Hierarchical Grid (http://lsec.cc.ac.cn/phg/), a toolbox under active development for parallel adaptive finite element solutions of partial differential equations. The proposed algorithm is characterized by allowing simultaneous refinement of submeshes to arbitrary levels before synchronization between submeshes, without the need for a central coordinator process to manage new vertices. Using the concept of canonical refinement, a simple proof is given that the resulting mesh is independent of the mesh partitioning, which is useful for better understanding the behaviour of the bisectioning refinement procedure.
Funding: Project supported by the National Natural Science Foundation of China (No. 10271110) and the Teaching and Research Award Program for Outstanding Young Teachers in Higher Education Institutions of MOE, China.
Abstract: This work investigates the online scheduling problem on two parallel identical machines with a new feature: service requests from various customers are entitled to different grade of service (GoS) levels. Each job and each machine is labelled with a GoS level, and a job can be processed by a particular machine only when its GoS level is no less than that of the machine. The goal is to minimize the makespan. For the non-preemptive version, we propose an optimal online algorithm with competitive ratio 5/3. For the preemptive version, we propose an optimal online algorithm with competitive ratio 3/2.
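
The eligibility rule above can be made concrete with a small sketch. The code below is a naive greedy baseline that places each arriving job on the least-loaded machine it is allowed to use; it is not the paper's optimal 5/3-competitive algorithm, and the machine levels `(1, 2)` and the job list are hypothetical.

```python
def greedy_gos_schedule(jobs, machine_levels=(1, 2)):
    """Assign jobs online to the least-loaded eligible machine.

    jobs           : iterable of (size, gos_level) pairs in arrival order
    machine_levels : GoS level of each machine (hypothetical values)
    A job is eligible for a machine only if its GoS level is no less
    than the machine's level.
    Returns (assignment, makespan).
    """
    loads = [0.0] * len(machine_levels)
    assignment = []
    for size, gos in jobs:
        eligible = [i for i, level in enumerate(machine_levels) if gos >= level]
        if not eligible:
            raise ValueError(f"no machine can process a job with GoS level {gos}")
        target = min(eligible, key=lambda i: loads[i])  # ties go to the lower index
        loads[target] += size
        assignment.append(target)
    return assignment, max(loads)


if __name__ == "__main__":
    arriving = [(2, 2), (1, 1), (3, 2), (2, 1)]  # made-up instance
    print(greedy_gos_schedule(arriving))
```

In this instance the greedy rule fills machine 0 with a flexible job before the restricted jobs arrive, which is the kind of mistake a better online algorithm has to guard against.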
Funding: Project supported by the National Defense Laboratory Foundation (Grant No. 51444020103QT0601) and the Shanghai Leading Academic Discipline Project (Grant No. T0102).
Abstract: In this work, we treat scattering objects, water, surface, and bottom in a truly unified manner in a parallel finite-difference time-domain (FDTD) scheme, which is suitable for distributed parallel computing in a message passing interface (MPI) programming environment. The algorithm is implemented on a cluster-based high-performance computer system. Parallel computation is performed with different division methods in 2D and 3D situations. Based on an analysis of the main factors affecting the speedup rate and parallel efficiency, data communication is reduced by selecting a suitable task division scheme. A desirable scheme is recommended that gives a higher speedup rate and better efficiency. The results indicate that the unified parallel FDTD algorithm provides a solution to the numerical computation of acoustic scattering.
Funding: Supported by the National Natural Science Foundation of China (Grant No. 50921001), the National Key Basic Research Special Foundation of China (Grant No. 2010CB832704), the Scientific Project for High-tech Ships: Key Technical Research on the Semi-planning Hybrid Fore-body Trimaran, and the Doctoral Research Foundation of Liaoning Province (Grant No. 20091012).
Abstract: In addition to shock wave and bubble pulse loading, cavitation has a very significant influence on the dynamic response of surface ships and other near-surface marine structures to underwater explosive loading. In this paper, the acoustic-structure coupling method embedded in ABAQUS is adopted for the numerical analysis of underwater explosions with cavitation. The shapes of both the bulk cavitation region and the local cavitation region are obtained, and they are in good agreement with analytical results. The duration of reloading is several times longer than that of a shock wave. Finally, both single and parallel computations of the cavitation effect on the dynamic response of a full-scale ship are presented, which show that reloading caused by cavitation cannot be ignored. These results are helpful in understanding the effects of cavitation in underwater explosions.
Funding: Supported by the Key Natural Science Foundation (No. 41530320), the Natural Science Foundation (No. 41274121), the Natural Science Foundation for Young Scientists (No. 41404093), and the Projects on the Development of the Key Equipment of the Chinese Academy of Sciences (No. ZDYZ2012-1-03).
Abstract: To improve the inversion accuracy of time-domain airborne electromagnetic data, we propose a parallel 3D inversion algorithm for airborne EM data based on direct Gauss-Newton optimization. Forward modeling is performed in the frequency domain based on the scattered secondary electrical field. Then, the inverse Fourier transform and convolution with the transmitting waveform are used to calculate the EM responses and the sensitivity matrix in the time domain for arbitrary transmitting waves. To optimize the computational time and memory requirements, we use the EM "footprint" concept to reduce the model size and obtain a sparse sensitivity matrix. To improve the 3D inversion, we parallelize the computation using the OpenMP library. We test the proposed 3D parallel inversion code using two synthetic datasets and a field dataset. The time-domain airborne EM inversion results suggest that the proposed algorithm is effective, efficient, and practical.
Funding: Project (50371026) supported by the National Natural Science Foundation of China.
Abstract: An OpenMP approach was proposed to parallelize a sequential molecular dynamics (MD) code on shared-memory machines. When a code is converted from sequential to parallel form, data dependence is a main problem. A traditional sequential molecular dynamics code was analyzed to find its data-dependent segments, and two different methods, the recover method and the backward mapping method, were used to eliminate those data dependencies in order to parallelize the sequential MD code. The performance of the parallelized MD code was analyzed using performance analysis tools. The test results show that the computing size of the code increases sharply from 1 million atoms before parallelization to 20 million atoms after parallelization, and the wall-clock time of the computation is greatly reduced. Some hot spots in the code were found and optimized with an improved algorithm. The efficiency of parallel computing is 30% higher than before, calculation time is saved, and larger-scale calculation problems can be solved.
Funding: Project (No. 60173046) supported by the National Natural Science Foundation of China.
Abstract: This paper first introduces the structure and working principle of a DSP-based parallel system, the parallel accelerating board, and the SHARC DSP chip. It then investigates the system's programming characteristics, especially the mode of communication, discusses how to design parallel algorithms, and presents a domain-decomposition-based complete multi-grid parallel algorithm with virtual boundary forecast (VBF) for solving large-scale and complicated heat problems. Finally, the Mandelbrot set and a non-linear heat transfer equation for a ceramic/metal composite material are taken as examples to illustrate the implementation of the proposed algorithm. The results show that the solutions are highly efficient and achieve linear speedup.
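
The Mandelbrot set is a convenient test case for domain decomposition because every pixel can be computed independently. The sketch below splits the image into bands of rows and hands each band to a worker; it uses Python threads purely to illustrate the decomposition, not the DSP board or the VBF multi-grid algorithm of the paper, and all function names and the viewing window are choices made here.

```python
from concurrent.futures import ThreadPoolExecutor


def escape_time(c: complex, max_iter: int = 50) -> int:
    """Number of iterations of z -> z*z + c before |z| exceeds 2, capped at max_iter."""
    z = 0j
    for k in range(max_iter):
        if abs(z) > 2.0:
            return k
        z = z * z + c
    return max_iter


def render_rows(rows, width, height, max_iter=50):
    """Compute escape times for the given rows of a width x height image."""
    band = []
    for row in rows:
        im = -1.5 + 3.0 * row / (height - 1)
        band.append([escape_time(complex(-2.0 + 3.0 * col / (width - 1), im), max_iter)
                     for col in range(width)])
    return band


def render_parallel(width, height, workers=4, max_iter=50):
    """Split the image into contiguous row bands, one task per band."""
    chunk = -(-height // workers)  # ceiling division
    bands = [range(start, min(start + chunk, height)) for start in range(0, height, chunk)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() returns results in submission order, so the bands reassemble correctly.
        parts = pool.map(lambda rows: render_rows(rows, width, height, max_iter), bands)
        return [line for part in parts for line in part]


if __name__ == "__main__":
    image = render_parallel(48, 24)
    for line in image:
        print("".join("#" if v == 50 else "." for v in line))
```

In CPython the threads above do not run the arithmetic concurrently, so a real speedup would need separate processes or a message-passing layer; the row-band split itself carries over unchanged.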
Abstract: Map data display is the basic mode of information representation in embedded real-time navigation. After a navigation display data set (NDIS_SET) with several dimensions and a corresponding mathematical description formula are designed, a series of rules and algorithms are put forward to optimize embedded navigation data and improve data indexing and input efficiency. After a normal navigation data display algorithm named NDIS and its related problems are analyzed, a new parallel display algorithm for navigation data, named N_PDIS, is presented to adapt to the limited computation and memory resources of embedded systems. N_PDIS can create two preparatory bitmaps synchronously with two parallel threads and switch one of them to the screen automatically. The results show that N_PDIS is more effective than NDIS in improving display efficiency.