Funding: Project supported by the Science Fund for Creative Research Groups of the National Natural Science Foundation of China (Grant No. 60921062), the National Natural Science Foundation of China (Grant No. 60873014), and the Young Scientists Fund of the National Natural Science Foundation of China (Grant Nos. 61003082 and 60903059).
Abstract: The availability of computers and communication networks allows us to gather and analyse data on a far larger scale than before. Statistical methods are currently considered well suited to analysing networks with millions of vertices or more. The MATLAB language, with its large library of statistical functions, is a good choice for rapidly prototyping complex network algorithms. The performance of MATLAB code can be further improved by using graphics processing units (GPUs). This paper presents the strategies and performance of a GPU implementation of a complex networks package, built with the Jacket toolbox for MATLAB. Compared with some commercially available CPU implementations, the GPU implementation achieves an average speedup of 11.3x. The experimental results indicate that the GPU platform combined with the MATLAB language is well suited to complex network research.
Funding: Agilent Technology Foundation (No. 912-CHN09); National Natural Science Foundation of China (No. 61175110); National Key Basic Research and Development (973) Program of China (No. 2012CB316305); National Key Projects of Science and Technology of China (No. 2011ZX02101-004).
Abstract: Low-Density Parity-Check (LDPC) codes are powerful error-correcting codes. In recent years, LDPC decoders have been implemented efficiently on dedicated VLSI hardware architectures. This paper describes two strategies to parallelize min-sum decoding of irregular LDPC codes. The first implements min-sum LDPC decoders on multicore platforms using OpenMP, while the second uses the Compute Unified Device Architecture (CUDA) to parallelize LDPC decoding on Graphics Processing Units (GPUs). Empirical studies on data of various scales show that both parallel strategies improve decoding performance and that the GPU implementation is the faster and more efficient decoder.
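For readers unfamiliar with min-sum decoding, the following is a minimal sequential sketch of the check-node update at its core. It is not the paper's OpenMP or CUDA code, which the abstract does not show. The function name `check_node_update` and the message layout (one list of log-likelihood ratios per check node) are assumptions made for illustration. Each outgoing message takes the product of the signs and the minimum of the magnitudes of all the other incoming messages.

```python
def check_node_update(incoming):
    """Min-sum update for one parity check (hypothetical helper).

    incoming: variable-to-check LLR messages, one float per edge.
    Returns the check-to-variable messages in the same edge order.
    """
    if len(incoming) < 2:
        raise ValueError("a check node needs at least two edges")

    mags = [abs(x) for x in incoming]

    # The two smallest magnitudes are enough: every edge receives the
    # smallest one, except the edge that supplied it, which receives
    # the second smallest.
    min1_idx = min(range(len(mags)), key=mags.__getitem__)
    min1 = mags[min1_idx]
    min2 = min(m for k, m in enumerate(mags) if k != min1_idx)

    # Product of all signs; multiplying by an edge's own sign below
    # removes that edge's contribution again.
    negatives = sum(1 for x in incoming if x < 0)
    total_sign = -1.0 if negatives % 2 else 1.0

    out = []
    for k, x in enumerate(incoming):
        own_sign = -1.0 if x < 0 else 1.0
        mag = min2 if k == min1_idx else min1
        out.append(total_sign * own_sign * mag)
    return out


if __name__ == "__main__":
    print(check_node_update([1.5, -0.5, 2.0]))
```

Within one iteration, the update for a check node reads only that node's own incoming messages, so different check nodes can be processed independently. That independence is what makes it natural to spread the work across CPU threads or GPU threads; how the paper actually maps nodes to threads is not stated in the abstract.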
Funding: Project supported by the National Natural Science Foundation of China (Grant Nos. 51279028 and 51479175) and the Public Science and Technology Research Funds Projects of Ocean (Grant No. 201405025).
Abstract: The unstructured-grid Finite Volume Coastal Ocean Model (FVCOM) is converted from its original FORTRAN code to Compute Unified Device Architecture (CUDA) C code and optimized for the graphics processing unit (GPU). The proposed GPU-FVCOM is tested against analytical solutions for two standard cases in a rectangular basin: a tide-induced flow and a wind-induced circulation. It is then applied to the coastal waters of Ningbo to simulate the tidal motion and to analyze the flow field and the vertical structure of the tidal velocity. The simulation results agree well with the measured data. The proposed 3-D model runs 30 times faster than a single-threaded program, and GPU-FVCOM on a Tesla K20 device is faster than on a workstation with 20 CPU cores, which shows that GPU-FVCOM is efficient for large-scale, high-resolution engineering problems in sea areas.