7 articles found
Parallel Implementation of the Non-Overlapping Template Matching Test Using CUDA
1
Authors: Kaikai Li, Jianguo Zhang, Pu Li, Anbang Wang, Yuncai Wang. China Communications (SCIE, CSCD), 2020, No. 8, pp. 234-241 (8 pages)
The NIST (National Institute of Standards and Technology) statistical test suite, recognized as the most authoritative, is widely used to verify the randomness of binary sequences. The Non-overlapping Template Matching Test, the 7th test of the NIST Test Suite, is remarkably time-consuming, and its slow performance is one of the major hurdles in the testing process. In this paper, we present an efficient bit-parallel matching algorithm and a segmented scan-based strategy for execution on a Graphics Processing Unit (GPU) using NVIDIA Compute Unified Device Architecture (CUDA). Experimental results show a significant performance improvement for the parallelized Non-overlapping Template Matching Test: the running speed is 483 times faster than the original NIST implementation, without reducing the accuracy of the test results.
Keywords: random numbers; CUDA; non-overlapping template matching test; parallel implementation; NIST test
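For orientation, the test that the entry above parallelizes can be sketched sequentially. This is a minimal Python sketch of the textbook statistic as described in NIST SP 800-22, not the paper's bit-parallel CUDA algorithm; the function names and the bit-string input format are assumptions made for illustration.

```python
def count_non_overlapping(block, template):
    """Count non-overlapping occurrences of `template` in `block` (bit strings)."""
    m = len(template)
    count = 0
    i = 0
    while i <= len(block) - m:
        if block[i:i + m] == template:
            count += 1
            i += m  # jump past the match so occurrences never overlap
        else:
            i += 1
    return count


def non_overlapping_template_statistic(bits, template, num_blocks):
    """Return per-block match counts and the chi-squared statistic.

    The P-value (not computed here) is the upper regularized incomplete
    gamma function evaluated at (num_blocks / 2, chi_squared / 2).
    """
    m = len(template)
    block_len = len(bits) // num_blocks  # trailing bits that do not fill a block are ignored
    counts = [
        count_non_overlapping(bits[j * block_len:(j + 1) * block_len], template)
        for j in range(num_blocks)
    ]
    # Theoretical mean and variance of the count under the randomness hypothesis.
    mean = (block_len - m + 1) / 2 ** m
    variance = block_len * (1 / 2 ** m - (2 * m - 1) / 2 ** (2 * m))
    chi_squared = sum((w - mean) ** 2 for w in counts) / variance
    return counts, chi_squared


counts, chi_squared = non_overlapping_template_statistic(
    "10100100101110010110", "001", num_blocks=2
)
```

Each block is scanned independently, which is the property a GPU implementation can exploit by assigning blocks (or templates) to separate threads.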
Parallel Implementation of Linear Algebra Problems on Dawning-1000
2
Author: Chi Xuebin (迟学斌), par25t.ict.ac.cn. Journal of Computer Science & Technology (SCIE, EI, CSCD), 1998, No. 2, pp. 141-146 (6 pages)
In this paper, some parallel algorithms are described for solving numerical linear algebra problems on Dawning-1000. They include matrix multiplication, LU factorization of a dense matrix, Cholesky factorization of a symmetric matrix, and eigendecomposition of a symmetric matrix for real and complex data types. These programs are built on the fast BLAS library of Dawning-1000 under the NX environment. Some comparison results under different parallel environments and implementation methods are also given for Cholesky factorization. The execution time, measured performance, and speedup for each problem on Dawning-1000 are shown. For matrix multiplication and LU factorization, 1.86 GFLOPS and 1.53 GFLOPS are reached, respectively.
Keywords: parallel algorithm; parallel environment; numerical linear algebra; parallel implementation; Dawning-1000
Linear scaling Coulomb interaction in the multiwavelet basis, a parallel implementation
3
Authors: Stig Rune Jensen, Jonas Jusélius, Antoine Durdek, Tor Flå, Peter Wind, Luca Frediani. International Journal of Modeling, Simulation, and Scientific Computing (EI), 2014, No. S01, pp. 28-50 (23 pages)
We present a parallel and linear scaling implementation of the calculation of the electrostatic potential arising from an arbitrary charge distribution. Our approach makes use of the multi-resolution basis of multiwavelets. The potential is obtained as the direct solution of the Poisson equation in its Green's function integral form. In the multiwavelet basis, the formally non-local integral operator decays rapidly to negligible values away from the main diagonal, yielding an effectively banded structure where the bandwidth is dictated only by the requested accuracy. This sparse operator structure has been exploited to achieve linear scaling and parallel algorithms. Parallelization has been achieved through both the shared memory (OpenMP) and the message passing interface (MPI) paradigms. Our implementation has been tested by computing the electrostatic potential of the electronic density of long-chain alkanes and diamond fragments, showing (sub)linear scaling with the system size and efficient parallelization.
Keywords: multiwavelets; electrostatic potentials; Poisson equation; integral operators; linear scaling; parallel implementation
Parallel Spectral Clustering Based on MapReduce (cited by: 3)
4
Authors: Qiwei Zhong, Yunlong Lin, Junyang Zou, Kuangyan Zhu, Qiao Wang, Lei Hu. ZTE Communications, 2013, No. 2, pp. 45-50 (6 pages)
Clustering is one of the most widely used techniques for exploratory data analysis. The spectral clustering algorithm, a popular modern clustering algorithm, has been shown to be more effective in detecting clusters than many traditional algorithms. It has applications ranging from computer vision and information retrieval to social science and biology. With the size of databases soaring, clustering algorithms have scaling computational time and memory use. In this paper, we propose a parallel spectral clustering implementation based on MapReduce. Both the computation and the data storage are distributed, which solves the scalability problems of most existing algorithms. We empirically analyze the proposed implementation on both benchmark networks and a real social network dataset of about two million vertices and two billion edges crawled from Sina Weibo. It is shown that the proposed implementation scales well, speeds up the clustering without sacrificing quality, and processes massive datasets efficiently on commodity machine clusters.
Keywords: spectral clustering; parallel implementation; massive dataset; Hadoop; MapReduce; data mining
Improved Software Implementation for Montgomery Elliptic Curve Cryptosystem
5
Authors: Mohammad Al-Khatib, Wafaa Saif. Computers, Materials & Continua (SCIE, EI), 2022, No. 3, pp. 4847-4865 (19 pages)
The last decade witnessed a rapid increase in multimedia and other applications that require transmitting and protecting huge amounts of data streams simultaneously. For such applications, a high-performance cryptosystem is compulsory to provide the necessary security services. The elliptic curve cryptosystem (ECC) has been introduced as a considerable option. However, the usual sequential implementation of ECC and the standard elliptic curve (EC) form cannot achieve the required performance level. Moreover, the widely used hardware implementation of ECC is a costly option and may not be affordable. This research aims to develop a high-performance parallel software implementation for ECC. To achieve this, many experiments were performed to examine several factors affecting ECC performance, including the projective coordinates, the scalar multiplication algorithm, the elliptic curve (EC) form, and the parallel implementation. The ECC performance was analyzed across these factors to tune them and select the best choices for increasing the speed of the cryptosystem. Experimental results illustrated that the parallel Montgomery ECC implementation using homogeneous projection achieves the highest performance level, since it scored the shortest time delay for ECC computations. In addition, results showed that the NAF algorithm consumes less time to perform encryption and scalar multiplication operations in comparison with the Montgomery ladder and binary methods. The Java multi-threading technique was adopted to implement ECC computations in parallel. The proposed multithreaded Montgomery ECC implementation significantly improves the performance level compared to previously presented parallel and sequential implementations.
Keywords: elliptic curve cryptosystem; parallel software implementation; multi-threading; scalar multiplication algorithms; modular arithmetic
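The NAF (non-adjacent form) recoding mentioned in the entry above can be illustrated with a short Python sketch. It shows only the textbook signed-digit recoding of a scalar, not the authors' Java multi-threaded implementation; the function names `naf` and `from_naf` are hypothetical.

```python
def naf(k):
    """Return the non-adjacent form of a positive integer k.

    Digits are in {-1, 0, 1}, least significant first, and no two
    neighbouring digits are both nonzero.
    """
    digits = []
    while k > 0:
        if k & 1:
            d = 2 - (k % 4)  # +1 when k = 1 (mod 4), -1 when k = 3 (mod 4)
            k -= d           # k is now divisible by 4, so the next digit is 0
        else:
            d = 0
        digits.append(d)
        k //= 2
    return digits


def from_naf(digits):
    """Rebuild the integer from its NAF digits (least significant first)."""
    return sum(d * 2 ** i for i, d in enumerate(digits))


digits = naf(7)  # 7 = 8 - 1
```

In scalar multiplication, each nonzero digit costs one point addition or subtraction, so the sparser signed representation is the usual reason NAF needs fewer group operations than the plain binary method.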
Hybrid URANS/LES Method of Flow Fields in Axial-flow Compressor Rotor
6
Authors: Jia-hao Xiao, Ya-ping Ju, Chu-hua Zhang. 《风机技术》, 2023, No. 6, pp. 17-23, 85 (8 pages)
Accurate and efficient prediction of the aerodynamic performance and flow details of axial-flow compressors is of great engineering application value for the aerodynamic design and flow control of axial-flow compressors. In this work, a delayed detached eddy simulation method is developed and applied to numerically simulate the turbulent channel flow and the aerodynamic performance of NASA Rotor 35. Several acceleration techniques, including parallel implementation, are also used to speed up the iteration convergence. The mean velocity distribution and Reynolds stress distribution in the boundary layer of turbulent channel flow and the aerodynamic performance curve of NASA Rotor 35 are predicted. The good agreement between the present delayed detached eddy simulation results and the available direct numerical simulation results or experimental data confirms the effectiveness of the developed method in the accurate and efficient prediction of complex flow in turbomachinery.
Keywords: delayed detached eddy simulation; turbulent channel flow; axial-flow compressor rotor; parallel implementation
A Parallel Computational Model for Three-Dimensional, Thermo-Mechanical Stokes Flow Simulations of Glaciers and Ice Sheets (cited by: 1)
7
Authors: Wei Leng, Lili Ju, Max Gunzburger, Stephen Price. Communications in Computational Physics (SCIE), 2014, No. 9, pp. 1056-1080 (25 pages)
This paper focuses on the development of an efficient, three-dimensional, thermo-mechanical, nonlinear-Stokes flow computational model for ice sheet simulation. The model is based on the parallel finite element model developed in [14], which features high-order accurate finite element discretizations on variable resolution grids. Here, we add an improved iterative solution method for treating the nonlinearity of the Stokes problem, a new high-order accurate finite element solver for the temperature equation, and a new conservative finite volume solver for handling mass conservation. The result is an accurate and efficient numerical model for thermo-mechanical glacier and ice-sheet simulations. We demonstrate the improved efficiency of the Stokes solver using the ISMIP-HOM benchmark experiments and a realistic test case for the Greenland ice sheet. We also apply our model to the EISMINT-II benchmark experiments and demonstrate stable thermo-mechanical ice sheet evolution on both structured and unstructured meshes. Notably, we find no evidence for the "cold spoke" instabilities observed for these same experiments when using finite difference, shallow-ice approximation models on structured grids.
Keywords: Stokes-flow modeling; ice-sheet modeling; finite element approximation; finite volume approximation; parallel implementation