The implicit 2D3V particle-in-cell (PIC) code developed to study the interaction of ultrashort pulse lasers with matter [G. M. Petrov and J. Davis, Computer Phys. Comm. 179, 868 (2008); Phys. Plasmas 18, 073102 (2011)] has been parallelized using MPI (Message Passing Interface). The parallelization strategy is optimized for a small number of computer cores, up to about 64. Details on the algorithm implementation are given, with emphasis on code optimization by overlapping computations with communications. Performance evaluation for 1D domain decomposition has been made on a small Linux cluster with 64 computer cores for two typical regimes of PIC operation: "particle dominated", for which the bulk of the computation time is spent on pushing particles, and "field dominated", for which computing the fields is prevalent. For a small number of computer cores, fewer than 32, the MPI implementation offers a significant numerical speed-up. In the "particle dominated" regime it is close to the maximum theoretical one, while in the "field dominated" regime it is about 75-80% of the maximum speed-up. For a number of cores exceeding 32, performance degradation takes place as a result of the adopted 1D domain decomposition. The code parallelization will allow future implementation of atomic physics and extension to three dimensions.
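To make the notion of 1D domain decomposition concrete, the following minimal Python sketch splits the columns of a 2D grid into contiguous slabs, one per MPI rank, and estimates how the ratio of exchanged guard columns to owned columns grows with the number of ranks. It is an illustration under stated assumptions, not the authors' implementation: the function names `slab_bounds` and `halo_fraction`, the grid width of 1000 columns, and the single guard column per neighbour are all hypothetical. The growth of this ratio is only one plausible contributor to the degradation reported above 32 cores; the abstract does not identify the mechanism.

```python
def slab_bounds(nx, nranks, rank):
    """Return the half-open column range [start, stop) owned by `rank`.

    Hypothetical helper: splits `nx` grid columns into `nranks` contiguous
    slabs whose widths differ by at most one column.
    """
    if nranks < 1 or nranks > nx:
        raise ValueError("need 1 <= nranks <= nx")
    if not 0 <= rank < nranks:
        raise ValueError("rank out of range")
    base, extra = divmod(nx, nranks)
    # The first `extra` ranks each take one additional column.
    start = rank * base + min(rank, extra)
    stop = start + base + (1 if rank < extra else 0)
    return start, stop


def halo_fraction(nx, nranks, halo=1):
    """Guard columns exchanged per owned column, for the thinnest slab.

    Assumes an interior slab exchanging `halo` columns with each of its
    two neighbours (an assumption made for this sketch only).
    """
    widths = []
    for rank in range(nranks):
        start, stop = slab_bounds(nx, nranks, rank)
        widths.append(stop - start)
    return 2 * halo / min(widths)


if __name__ == "__main__":
    nx = 1000  # assumed grid width, chosen only for illustration
    for nranks in (4, 16, 32, 64):
        last = slab_bounds(nx, nranks, nranks - 1)
        print(nranks, last, round(halo_fraction(nx, nranks), 4))
```

In this toy model the per-rank workload shrinks in proportion to the slab width while the exchanged boundary data stays fixed, so the relative communication cost rises as ranks are added. A 2D decomposition would reduce the boundary size per rank, at the cost of more neighbours to communicate with.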