Funding: Supported in part by the Shanxi Scholarship Council of China (Grant No. 2017-key-2), the Natural Science Foundation of Shanxi Province (Grant No. 201801D121145), the Natural Science Foundation of China (NSFC) (Grant Nos. 61731014, 61705157, 61927811), and the Program for Guangdong Introducing Innovative and Entrepreneurial Teams.
Abstract: The NIST (National Institute of Standards and Technology) statistical test suite, recognized as the most authoritative, is widely used to verify the randomness of binary sequences. The Non-overlapping Template Matching Test, the 7th test of the NIST Test Suite, is remarkably time-consuming, and its slow performance is one of the major hurdles in the testing process. In this paper, we present an efficient bit-parallel matching algorithm and a segmented scan-based strategy for execution on a Graphics Processing Unit (GPU) using the NVIDIA Compute Unified Device Architecture (CUDA). Experimental results show a significant performance improvement for the parallelized Non-overlapping Template Matching Test: it runs 483 times faster than the original NIST implementation without degrading the accuracy of the test results.
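For orientation, the statistic computed by the Non-overlapping Template Matching Test can be sketched sequentially as follows. This is a minimal reference sketch based on the definition in NIST SP 800-22, not the bit-parallel GPU algorithm of the paper; the function names `count_nonoverlapping` and `template_chi_square` are hypothetical, and bits are held as a Python string purely for readability.

```python
from typing import List, Tuple


def count_nonoverlapping(block: str, template: str) -> int:
    """Count non-overlapping occurrences of template in block.

    On a match the window jumps past the whole template;
    otherwise it slides forward by one bit.
    """
    m = len(template)
    count = 0
    i = 0
    while i <= len(block) - m:
        if block[i:i + m] == template:
            count += 1
            i += m
        else:
            i += 1
    return count


def template_chi_square(bits: str, template: str,
                        num_blocks: int) -> Tuple[List[int], float]:
    """Return per-block match counts and the chi-square statistic."""
    m = len(template)
    block_len = len(bits) // num_blocks  # trailing bits are discarded
    counts = [
        count_nonoverlapping(bits[j * block_len:(j + 1) * block_len], template)
        for j in range(num_blocks)
    ]
    # Theoretical mean and variance of the match count under randomness.
    mean = (block_len - m + 1) / 2 ** m
    variance = block_len * (1 / 2 ** m - (2 * m - 1) / 2 ** (2 * m))
    chi_square = sum((w - mean) ** 2 for w in counts) / variance
    return counts, chi_square


if __name__ == "__main__":
    # 20-bit input split into 2 blocks of 10 bits, template 001.
    counts, chi_square = template_chi_square("10100100101110010110", "001", 2)
    print(counts, round(chi_square, 6))
```

The inner scan is inherently sequential within a block because a match changes where the next comparison starts, which is presumably why a GPU version needs a dedicated matching and scan strategy rather than a naive per-position parallel loop.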
Abstract: This paper analyzes the implementation of data mining algorithms and their application in a parallel cloud system based on C++. As the number of cloud computing platform developers increases, and as cloud computing platforms support a growing number of Internet users, the volume of system log data grows in proportion. At present, the model most often applied in cluster environments is the message-passing model. In the message-passing model, the concurrently executing parts exchange information, coordinate their steps, and control execution by passing messages. As for C++ in data mining applications, it should firstly hold the following features. Parallel communication and serial communication are the two basic modes of general communication. On this basis, this paper proposes a novel perspective on the implementation of data mining algorithms and their application in a parallel cloud system based on C++. Later research will focus on the code-based implementation.
Funding: The authors extend their appreciation to the Deanship of Scientific Research at Imam Mohammad Ibn Saud Islamic University for funding and supporting this work through the Graduate Student Research Support Program.
Abstract: The last decade witnessed a rapid increase in multimedia and other applications that require transmitting and protecting huge amounts of data streams simultaneously. For such applications, a high-performance cryptosystem is compulsory to provide the necessary security services. The elliptic curve cryptosystem (ECC) has been introduced as a considerable option. However, the usual sequential implementation of ECC and the standard elliptic curve (EC) form cannot achieve the required performance level. Moreover, the widely used hardware implementation of ECC is a costly option and may not be affordable. This research aims to develop a high-performance parallel software implementation of ECC. To achieve this, many experiments were performed to examine several factors affecting ECC performance, including the projective coordinates, the scalar multiplication algorithm, the elliptic curve (EC) form, and the parallel implementation. ECC performance was analyzed across these factors in order to tune them and select the best choices for increasing the speed of the cryptosystem. Experimental results showed that the parallel Montgomery ECC implementation using homogeneous projection achieves the highest performance level, since it scored the shortest time delay for ECC computations. In addition, results showed that the NAF algorithm consumes less time to perform encryption and scalar multiplication operations than the Montgomery ladder and binary methods. The Java multi-threading technique was adopted to implement ECC computations in parallel. The proposed multithreaded Montgomery ECC implementation significantly improves the performance level compared with previously presented parallel and sequential implementations.
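To make the NAF (non-adjacent form) scalar multiplication mentioned in this abstract concrete, here is a minimal sketch. It is not the authors' Java implementation: the names `naf` and `scalar_mult_naf` are hypothetical, and plain integers under addition stand in for elliptic curve points so that the example stays self-contained. A real implementation would pass curve point addition and negation instead.

```python
from typing import Callable, List, TypeVar

P = TypeVar("P")


def naf(k: int) -> List[int]:
    """Return the non-adjacent form of k >= 0, least significant digit first.

    Digits are in {-1, 0, 1} and no two adjacent digits are both nonzero.
    """
    digits = []
    while k > 0:
        if k & 1:
            d = 2 - (k % 4)  # +1 if k = 1 (mod 4), -1 if k = 3 (mod 4)
            k -= d
        else:
            d = 0
        digits.append(d)
        k >>= 1
    return digits


def scalar_mult_naf(k: int, point: P, add: Callable[[P, P], P],
                    negate: Callable[[P], P], identity: P) -> P:
    """Compute k * point with a left-to-right double-and-add over NAF digits."""
    result = identity
    for d in reversed(naf(k)):
        result = add(result, result)  # doubling step
        if d == 1:
            result = add(result, point)
        elif d == -1:
            result = add(result, negate(point))
    return result


if __name__ == "__main__":
    # 7 is 111 in binary (three additions) but 8 - 1 in NAF (two additions).
    print(naf(7))
    # Stand-in group: integers under addition, so 7 * 5 should be 35.
    print(scalar_mult_naf(7, 5, lambda a, b: a + b, lambda a: -a, 0))
```

Because a NAF has fewer nonzero digits on average than the plain binary expansion, it needs fewer point additions, and point negation is cheap on elliptic curves. That is the usual explanation for the kind of timing advantage over the binary method reported above.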
Abstract: One of the many issues in utilizing ERP systems in organizations is, in fact, the implementation stage. By investigating the common and available methods of implementation, as well as their inefficiencies, this study provides a new, more efficient method. The new method first assesses the required implementation time in each unit of the organization, and then uses a spherical model with a central core instead of a linear model. The units surround this core in layers according to their required implementation time. The layers are ordered so that the further one moves from the core towards the external layers, the shorter the required implementation time becomes. In this way, the priority of implementing ERP is assigned in a direction from the external layers to the internal layers. Eventually, all the experience from the previous stages is transferred to the central core, which has the most complexity. Through this method, it is expected that the fully parallel issue, which was a dominant and apparent issue in previous models, can be avoided, so that the required implementation time decreases.
Abstract: Clustering is one of the most widely used techniques for exploratory data analysis. The spectral clustering algorithm, a popular modern clustering algorithm, has been shown to be more effective in detecting clusters than many traditional algorithms. It has applications ranging from computer vision and information retrieval to social science and biology. With the size of databases soaring, clustering algorithms face scaling problems in computational time and memory use. In this paper, we propose a parallel spectral clustering implementation based on MapReduce. Both the computation and the data storage are distributed, which solves the scalability problems of most existing algorithms. We empirically analyze the proposed implementation on both benchmark networks and a real social network dataset of about two million vertices and two billion edges crawled from Sina Weibo. It is shown that the proposed implementation scales well, speeds up the clustering without sacrificing quality, and processes massive datasets efficiently on commodity machine clusters.
Abstract: In this paper, some parallel algorithms are described for solving numerical linear algebra problems on the Dawning-1000. They include matrix multiplication, LU factorization of a dense matrix, Cholesky factorization of a symmetric matrix, and eigendecomposition of a symmetric matrix, for real and complex data types. These programs are built on the fast BLAS library of the Dawning-1000 under the NX environment. Some comparison results under different parallel environments and implementation methods are also given for Cholesky factorization. The execution time, measured performance, and speedup for each problem on the Dawning-1000 are shown. For matrix multiplication and LU factorization, 1.86 GFLOPS and 1.53 GFLOPS are reached.
Funding: Supported by the Research Council of Norway through a Centre of Excellence Grant (Grant No. 179568/V30) and from the Norwegian Supercomputing Program (NOTUR) through a grant of computer time (Grant No. NN4654K).
Abstract: We present a parallel and linear scaling implementation of the calculation of the electrostatic potential arising from an arbitrary charge distribution. Our approach makes use of the multi-resolution basis of multiwavelets. The potential is obtained as the direct solution of the Poisson equation in its Green's function integral form. In the multiwavelet basis, the formally non-local integral operator decays rapidly to negligible values away from the main diagonal, yielding an effectively banded structure where the bandwidth is dictated only by the requested accuracy. This sparse operator structure has been exploited to achieve linear scaling and parallel algorithms. Parallelization has been achieved through both the shared memory (OpenMP) and the message passing interface (MPI) paradigms. Our implementation has been tested by computing the electrostatic potential of the electronic density of long-chain alkanes and diamond fragments, showing (sub)linear scaling with the system size and efficient parallelization.
Funding: Supported by the National Research Foundation for the Doctoral Program of the Ministry of Education of China (国家教育部博士点基金) and the Natural Science Foundation of Jiangsu Province of China under Grant No. BK2003030 (江苏省自然科学基金).