Further improving railway innovation capacity and technological strength is an important goal of the 14th Five-Year Plan for railway scientific and technological innovation. This includes promoting the deep integration of cutting-edge technologies with railway systems, strengthening the research and application of intelligent railway technologies, applying green computing technologies, and advancing the collaborative sharing of transportation big data. High-speed rail system tasks must process huge volumes of data under heavy workloads while meeting ultra-fast response requirements, so applying High Performance Computing (HPC) to high-speed rail systems is a natural way to improve computational efficiency as well as the performance, efficiency, and safety of these systems. In this review, we introduce and analyze research on the application of HPC in the field of high-speed railways. These HPC applications are grouped into four broad categories: fault diagnosis, network and communication, management systems, and simulation. Moreover, open challenges and issues are discussed and further research directions are suggested.
Funding: supported in part by the Talent Fund of Beijing Jiaotong University (2023XKRC017) and in part by the Research and Development Project of China State Railway Group Co., Ltd. (P2022Z003).
A new direct method for solving unsymmetrical sparse linear systems (USLS) arising from meshless methods is introduced. Certain meshless methods, such as the meshless local Petrov-Galerkin (MLPG) method, require the solution of large USLS. The proposed method performs the factorization symmetrically on the upper and lower triangular portions of the matrix, which differs from previous work based on a general unsymmetrical process and attains higher performance. It is shown that the solution algorithm for USLS can be derived simply from existing approaches for the symmetrical case; the new matrix factorization algorithm can be implemented by modifying a standard JKI symmetrical matrix factorization code. Multi-blocked out-of-core strategies were also developed to expand the solution scale. The approach convincingly increases the speed of the solution process, as demonstrated by numerical tests.
Funding: supported by the National Natural Science Foundation of China (Nos. 10232040, 10572002, and 10572003).
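A minimal dense sketch of the idea, under simplifying assumptions (dense storage, no pivoting, invented function names; the paper works on sparse matrices with out-of-core blocking): an LDU factorization in which the loop computing column k of L mirrors the loop computing row k of U, so an unsymmetric matrix is handled by a symmetric pair of triangular processes.

    #include <stdio.h>

    #define N 3

    /* Dense LDU factorization A = L*D*U with unit lower L and unit upper U.
     * The two inner loops (column k of L, row k of U) mirror each other,
     * the symmetric treatment exploited above for sparse unsymmetric systems. */
    void ldu(double A[N][N], double L[N][N], double D[N], double U[N][N]) {
        for (int i = 0; i < N; i++)
            for (int j = 0; j < N; j++) { L[i][j] = (i == j); U[i][j] = (i == j); }
        for (int k = 0; k < N; k++) {
            D[k] = A[k][k];
            for (int p = 0; p < k; p++) D[k] -= L[k][p] * D[p] * U[p][k];
            for (int i = k + 1; i < N; i++) {     /* column k of L ...        */
                double s = A[i][k];
                for (int p = 0; p < k; p++) s -= L[i][p] * D[p] * U[p][k];
                L[i][k] = s / D[k];
            }
            for (int j = k + 1; j < N; j++) {     /* ... mirrored by row k of U */
                double s = A[k][j];
                for (int p = 0; p < k; p++) s -= L[k][p] * D[p] * U[p][j];
                U[k][j] = s / D[k];
            }
        }
    }

    int main(void) {
        double A[N][N] = {{4, 1, 2}, {2, 5, 1}, {1, 2, 6}};
        double L[N][N], D[N], U[N][N];
        ldu(A, L, D, U);
        printf("D = %.3f %.3f %.3f\n", D[0], D[1], D[2]);
        return 0;
    }

Setting U = L^T recovers the symmetric LDL^T case, which is why a symmetric JKI code can be adapted with little change.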
In previous papers, a high-performance sparse static solver with two-level unrolling based on a cell-sparse storage scheme was reported. Although the solver reaches quite a high efficiency for a large fraction of finite element analysis benchmark tests, the MFLOPS (millions of floating-point operations per second) of the LDL^T factorization in these tests vary on a Dell Pentium IV 850 MHz machine from 100 to 456, depending on the average size of the super-equations, i.e., on the average depth of unrolling. In this paper, a new sparse static solver with two-level unrolling is proposed that employs the concept of master-equations and searches for an appropriate depth of unrolling. The new solver delivers higher MFLOPS for the LDL^T factorization of the benchmark tests and therefore speeds up the solution process.
Funding: supported by the Research Fund for the Doctoral Program of Higher Education (No. 20030001112).
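For reference, the LDL^T factorization computed by the solver's unrolled kernels follows the standard recurrences (textbook formulas, not specific to this paper):

\[
d_k = a_{kk} - \sum_{p<k} l_{kp}^2\, d_p,
\qquad
l_{ik} = \frac{1}{d_k}\Bigl(a_{ik} - \sum_{p<k} l_{ip}\, d_p\, l_{kp}\Bigr),
\quad i > k .
\]

A super-equation groups consecutive columns that share one sparsity pattern; the depth of unrolling is the number of such columns processed per pass of the inner loop, which amortizes index arithmetic and improves the MFLOPS rate.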
In recent years, the widespread adoption of parallel computing, especially in multi-core processors and high-performance computing environments, has ushered in a new era of efficiency and speed, and image processing in particular has seen significant advances. This project explored parallel image processing, focusing on the grayscale conversion of color images. Our approach integrated OpenMP into our framework to parallelize this critical image processing task, distributing the workload across multiple threads. The primary objectives were to optimize computation time and improve overall efficiency. Using OpenMP for concurrent processing across multiple cores significantly reduced execution times through the effective distribution of tasks among the cores. The speedup values for various image sizes highlighted the efficacy of parallel processing, especially for large images. However, a detailed examination revealed a potential decline in parallelization efficiency as the number of cores increases, underscoring the importance of a carefully optimized parallelization strategy that considers factors such as load balancing and communication overhead. Despite these challenges, the overall scalability and efficiency achieved demonstrate OpenMP's effectiveness in accelerating image manipulation tasks.
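A minimal sketch of the parallelized kernel (assuming interleaved 8-bit RGB input and the common BT.601 luma weights, which the text does not specify):

    #include <stdint.h>
    #include <stdlib.h>
    #include <omp.h>

    /* Convert an interleaved 8-bit RGB buffer to grayscale with OpenMP.
     * Each thread handles a static chunk of pixels; the loop is
     * embarrassingly parallel, so no synchronization is needed. */
    void rgb_to_gray(const uint8_t *rgb, uint8_t *gray, size_t npixels) {
        #pragma omp parallel for schedule(static)
        for (long i = 0; i < (long)npixels; i++) {
            const uint8_t *p = rgb + 3 * i;   /* p[0]=R, p[1]=G, p[2]=B */
            gray[i] = (uint8_t)(0.299f * p[0] + 0.587f * p[1] + 0.114f * p[2]);
        }
    }

Timing this kernel with omp_get_wtime() over a range of thread counts reproduces the speedup-versus-cores study described above.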
The high-performance computing paradigm needs high-speed switching fabrics to handle the heavy traffic generated by its applications, and these fabrics are only as efficient as the scheduling algorithms that drive them. In this paper, we propose two scheduling algorithms for input-queued switches whose operation is based on ranking procedures. First, we propose a Simple 2-Bit (S2B) scheme, which uses a binary ranking procedure and queue size for scheduling packets: the Virtual Output Queue (VOQ) set with the maximum number of empty queues receives a higher rank than other VOQ sets. Through simulation, we show that S2B has better throughput performance than Highest Ranking First (HRF) arbitration under uniform and non-uniform traffic patterns. To further improve throughput-delay performance, an Enhanced 2-Bit (E2B) approach is proposed, which adopts an integer rank, namely the number of empty queues in a VOQ set. Simulation results show that E2B outperforms the S2B and HRF scheduling algorithms, achieving the best throughput-delay performance. Furthermore, under hotspot traffic, E2B proves to be more efficient still.
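A minimal sketch of the E2B ranking step, with hypothetical names and a fixed port count (the surrounding iterative request-grant matching of input-queued scheduling is omitted):

    #define PORTS 8   /* hypothetical switch size; the paper does not fix one */

    /* E2B-style rank of input i: the number of empty queues in its VOQ set.
     * qlen[i][j] is the occupancy of the VOQ at input i for output j. */
    int e2b_rank(const int qlen[PORTS][PORTS], int i) {
        int empty = 0;
        for (int j = 0; j < PORTS; j++)
            if (qlen[i][j] == 0) empty++;
        return empty;
    }

    /* Grant one output to the requesting input with the highest rank,
     * i.e., the VOQ set with the most empty queues, as described above. */
    int grant(const int qlen[PORTS][PORTS], const int requests[PORTS]) {
        int best = -1, best_rank = -1;
        for (int i = 0; i < PORTS; i++) {
            int r = e2b_rank(qlen, i);
            if (requests[i] && r > best_rank) { best = i; best_rank = r; }
        }
        return best;   /* -1 if no input requested this output */
    }

S2B corresponds to quantizing this rank to a single bit; E2B keeps the full integer, which is what sharpens the throughput-delay trade-off.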
This paper proposes an algorithm for strengthening virtual machine security in cloud computing. An imbalance between load and energy has been one of the drawbacks of older approaches to server provisioning and hosting: if two virtual servers are active on one host and the load on that host grows too large, the host appropriates the capacity of other (virtual) hosts to stay stable, which usually leads to hardware overload errors and user dissatisfaction. Cloud-based methods mitigate but do not completely remove this problem. We therefore propose an algorithm that not only establishes a suitable security baseline but also distributes energy consumption and load appropriately among virtual servers. The proposed algorithm is compared with several previously proposed security strategies, including SC-PSSF, PSSF, and DEEAC. The comparisons show that the proposed method offers higher computing performance and efficiency while consuming less energy in the network.
Today, PC-class machines are quite popular in the HPC area, especially for problems that require a good cost/performance ratio. One drawback of these machines is their poor memory throughput, and one reason for this is the limited mapping capability of the TLB, a buffer that accelerates virtual memory access. In this report, I show that the mapping capability, and with it performance, can be improved using the multi-granularity TLB feature that some processors provide, and that the new TLB handling routine can be incorporated into the demand paging system of Linux.
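The mechanism is easy to quantify: TLB reach is the number of entries times the page size, so mixing granularities stretches coverage dramatically. For a hypothetical 64-entry TLB:

\[
\text{TLB reach} = \text{entries} \times \text{page size}:\qquad
64 \times 4\,\mathrm{KiB} = 256\,\mathrm{KiB},
\qquad
64 \times 4\,\mathrm{MiB} = 256\,\mathrm{MiB}.
\]

Mapping even a few hot regions with large pages can therefore remove most TLB misses for memory-bound HPC kernels.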
The integration of clusters, grids, clouds, edge devices, and other computing platforms results in the contemporary technology of jungle computing. This novel technique has the aptitude to tackle high-performance computation systems and manages the use of all computing platforms at once. Federated learning is a collaborative machine learning approach that works without centralized training data. The proposed system detects intrusion attacks without human intervention, flags anomalous deviations in device communication behavior that may be caused by malicious adversaries, and can cope with new and unknown attacks. The main objective is to learn the overall behavior of an intruder while attacks are performed on the assumed target service. The updated model from each device is sent to the centralized server in the jungle computing environment, where attack patterns are detected. Federated learning lets the system learn the type of attack seen at each device, paving the way to broad coverage of malicious behaviors. In our proposed work, we implemented an intrusion detection system that is highly accurate, has a low False Positive Rate (FPR), and is scalable and versatile for the jungle computing environment. The execution time for one round is less than two seconds, with an accuracy rate of 96%.
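A minimal sketch of the server-side aggregation, assuming the standard FedAvg rule (the abstract does not name its aggregation; function and parameter names are hypothetical):

    #include <stddef.h>

    /* Weighted federated averaging of local model updates (FedAvg-style).
     * w[d] holds device d's model weights, n[d] its local sample count;
     * the global model is the sample-weighted mean of the device models. */
    void fed_avg(double *global, double **w, const size_t *n,
                 size_t devices, size_t dim) {
        size_t total = 0;
        for (size_t d = 0; d < devices; d++) total += n[d];
        for (size_t k = 0; k < dim; k++) {
            double acc = 0.0;
            for (size_t d = 0; d < devices; d++)
                acc += (double)n[d] / (double)total * w[d][k];
            global[k] = acc;
        }
    }

Only model weights cross the network, never raw traffic traces, which is what keeps the training data decentralized.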
In this paper, a brief survey of smart city projects in Europe is presented. The survey shows the extent of transport and logistics in smart cities. We concentrate on a smart city project we have been working on, related to A Logistic Mobile Application (ALMA). The application is based on the Internet of Things and combines a communication infrastructure with a High Performance Computing infrastructure in order to deliver mobile logistic services with high quality of service and adaptation to the dynamic nature of logistic operations.
A multifrontal code is introduced for the efficient solution of the linear systems of equations arising from the analysis of structures. The factorization phase is reduced to a series of interleaved element assembly and dense matrix operations, for which the BLAS3 kernels are used. A similar approach is generalized to the forward and back substitution phases for the efficient solution of structures with multiple load conditions. The program performs all assembly and solution steps in parallel. Examples are presented that demonstrate the code's performance on single- and dual-core processors.
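The heart of the interleaved assembly/factorization phase is the dense Schur-complement update of a frontal matrix, which maps onto a single BLAS3 call. A sketch under assumed row-major dense blocks (block names and layout are illustrative, not the paper's):

    #include <cblas.h>

    /* Schur-complement update of a frontal matrix: F22 <- F22 - L21 * U12.
     * L21 is m x k, U12 is k x m, F22 is m x m, all row-major; one dgemm
     * performs the whole rank-k update at BLAS3 speed. */
    void schur_update(int k, int m, const double *L21, const double *U12,
                      double *F22) {
        cblas_dgemm(CblasRowMajor, CblasNoTrans, CblasNoTrans,
                    m, m, k, -1.0, L21, k, U12, m, 1.0, F22, m);
    }

Because structural stiffness matrices are symmetric, an actual implementation could use the symmetric rank-k kernel instead; dgemm is shown here for generality.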
Traditional high-performance computing (HPC) systems provide a standard preset environment to support scientific computation. However, HPC systems increasingly need to support more diverse applications, such as artificial intelligence and big data, and the standard preset environment can no longer meet these requirements. If users run such emerging applications on HPC systems, they must manually maintain the applications' specific dependencies (libraries, environment variables, and so on), which increases the development and deployment burden; moreover, the multi-user mode raises privacy problems among users. Containers such as Docker and Singularity can encapsulate a job's execution environment, but in a highly customized HPC system, cross-environment application deployment with Docker and Singularity is limited, and the introduction of container images imposes a maintenance burden on system administrators. To address these problems, in this paper we propose a self-deployed execution environment (SDEE) for HPC. SDEE combines the advantages of traditional virtualization and modern containers: it provides the user with an isolated and customizable environment, similar to a virtual machine, in which the user is root. The user develops and debugs the application and deploys its special dependencies in this environment, then loads the job onto compute nodes directly through the traditional HPC job management system. The job and its dependencies are analyzed, packaged, deployed, and executed automatically. This enables transparent and rapid job deployment, which not only reduces the burden on users but also protects user privacy. Experiments show that the overhead introduced by SDEE is negligible and lower than that of both Docker and Singularity.
Funding: supported by the Tianhe Supercomputer Project (No. 2018YFB0204301), the National Natural Science Foundation of China (No. 61902405), the PDL Research Fund (No. 6142110190404), and the National High-Level Personnel for Defense Technology Program (No. 2017-JCJQ-ZQ-013).
The aim of this paper is to test a red-black SOR (SOR R&B) method with Chebyshev acceleration for solving the Laplace equation in a cubic 3D configuration. Comparisons are made in terms of precision and computing time with other elliptic equation solvers from the open-source LIS library. The first results, obtained using a single core on an HPC system, show that the SOR R&B method is efficient when the spectral radius needed for the Chebyshev acceleration is carefully pre-estimated. Preliminary results obtained with a code parallelized using the MPI library are also discussed for calculations distributed over one hundred cores.
Funding: performed using HPC resources from CALMIP (Grant 2011-[P1053]); supported by the French Agence Nationale de la Recherche under Project REMOVAL ANR-12-BS09-0019-1.
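A minimal sketch of one Chebyshev-accelerated red-black pass for the 3D Laplace equation (assuming a unit-spacing n^3 grid and the classical omega recurrence driven by the pre-estimated Jacobi spectral radius rho; the caller initializes *omega = 1.0 and passes first = 1 on the first call):

    #include <stddef.h>

    /* One red-black SOR sweep with Chebyshev acceleration: omega is updated
     * between half-sweeps via omega_(1/2) = 1/(1 - rho^2/2) and then
     * omega_(n+1/2) = 1/(1 - rho^2 * omega_n / 4), the classical recurrence. */
    void sor_rb_sweep(double *u, int n, double *omega, double rho, int first) {
        for (int color = 0; color < 2; color++) {
            for (int i = 1; i < n - 1; i++)
                for (int j = 1; j < n - 1; j++)
                    for (int k = 1; k < n - 1; k++) {
                        if ((i + j + k) % 2 != color) continue;
                        size_t c = ((size_t)i * n + j) * n + k;
                        double gs = (u[c - (size_t)n * n] + u[c + (size_t)n * n]
                                   + u[c - n] + u[c + n]
                                   + u[c - 1] + u[c + 1]) / 6.0;
                        u[c] += *omega * (gs - u[c]);
                    }
            if (first && color == 0)
                *omega = 1.0 / (1.0 - 0.5 * rho * rho);
            else
                *omega = 1.0 / (1.0 - 0.25 * rho * rho * (*omega));
        }
    }

The recurrence converges to the optimal over-relaxation factor, which is why a careful pre-estimate of rho matters so much for the efficiency reported above. The red-black ordering also decouples the two colors, which is what makes the MPI parallelization straightforward.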
A new spectral three-term conjugate gradient algorithm based on the quasi-Newton equation is developed for solving large-scale unconstrained optimization problems. It is proved that the search directions in this algorithm always satisfy a sufficient descent condition independent of any line search, and global convergence is established for general objective functions when the strong Wolfe line search is used. Numerical experiments demonstrate its performance on large-scale problems: the algorithm is run on 100 benchmark test problems from CUTE with dimensions from 1000 to 10,000 and compared with similar methods from the literature. The results show that our algorithm outperforms the state of the art in terms of CPU time, number of iterations, and number of function evaluations.
Funding: supported by the National Natural Science Foundation of China (Grant No. 71671190).
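The abstract does not give the algorithm's coefficient formulas; for orientation, a spectral three-term method generates directions of the generic form

\[
d_{k+1} = -\theta_k\, g_{k+1} + \beta_k\, d_k + \gamma_k\, y_k,
\qquad
y_k = g_{k+1} - g_k,
\]

where \(\theta_k\) is the spectral scaling parameter and \(\beta_k,\gamma_k\) are the conjugacy coefficients (here tied to the quasi-Newton equation). The sufficient descent property referred to above reads

\[
g_k^{\mathsf T} d_k \le -c\,\|g_k\|^2 \quad \text{for some constant } c > 0,
\]

holding for all k regardless of the line search used.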
Convolutional Neural Networks (CNNs) are widely used in many fields. Because of their high-throughput, compute-intensive characteristics, an increasing number of researchers are focusing on how to improve the computational efficiency, hardware utilization, and flexibility of CNN hardware accelerators. Accordingly, this paper proposes a dynamically reconfigurable accelerator architecture that implements a Sparse-Winograd F(2×2, 3×3)-based high-parallelism hardware design. This approach not only eliminates the pre-calculation complexity associated with the Winograd algorithm, thereby reducing the difficulty of hardware implementation, but also greatly improves the flexibility of the hardware; as a result, the accelerator can perform conventional convolution, Grouped Convolution (GCONV), or Depthwise Separable Convolution (DSC) on the same hardware. Our experimental results show that the accelerator achieves a 3x-4.14x speedup on VGG-16 and MobileNet V1 compared with designs that do not use the acceleration algorithm. Moreover, compared with previous designs using the traditional Winograd algorithm, the accelerator achieves a 1.4x-1.8x speedup, while multiplier efficiency improves by up to 142%.
Funding: supported by the Hunan Provincial Science and Technology Plan Project (Grant No. 2018XK2102).
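For reference, the Winograd minimal filtering algorithm underlying such accelerators computes a 2×2 output tile Y from a 4×4 input tile d and a 3×3 filter g as (standard F(2×2, 3×3) transforms from the literature, not taken from this paper):

\[
Y = A^{\mathsf T}\bigl[(G\,g\,G^{\mathsf T}) \odot (B^{\mathsf T} d\, B)\bigr] A,
\]

\[
B^{\mathsf T} =
\begin{pmatrix}
1 & 0 & -1 & 0\\
0 & 1 & 1 & 0\\
0 & -1 & 1 & 0\\
0 & 1 & 0 & -1
\end{pmatrix},
\qquad
G =
\begin{pmatrix}
1 & 0 & 0\\
\tfrac12 & \tfrac12 & \tfrac12\\
\tfrac12 & -\tfrac12 & \tfrac12\\
0 & 0 & 1
\end{pmatrix},
\qquad
A^{\mathsf T} =
\begin{pmatrix}
1 & 1 & 1 & 0\\
0 & 1 & -1 & -1
\end{pmatrix}.
\]

This replaces the 36 multiplications of direct convolution for a 2×2 output with 16 element-wise products; the sparse variant additionally skips zeros in the transformed domain.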
Objective: As a discipline with high computation costs, nuclear science and engineering still relies heavily on traditional high-performance computing (HPC) clusters. However, the use of traditional HPC in nuclear science and engineering has been limited by poor flexibility, software compatibility issues, and poor user interfaces. Virtualized/virtual HPC (vHPC) can mimic an HPC cluster on a cloud computing platform. In this work, we designed and developed a vHPC system for use in nuclear engineering. Methods: The system is tested by computing the number π with a Monte Carlo method and by simulating an X-ray digital imaging system; the performance of the vHPC system is compared with that of traditional HPC. Results: As the number of simulated particles increases, the virtual cluster's computing time grows proportionally. The X-ray imaging simulation took about 21.1 h on a 12-core virtual server. Experimental results show that the performance of the virtual cluster and that of the physical machine are almost the same. Conclusions: These tests indicate that vHPC is a good alternative for nuclear engineering workloads; the vHPC approach proposed in this paper makes HPC flexible and easy to deploy.
Funding: supported by the National Key Research and Development Program (2016YFC0105406) and the National Natural Science Foundation of China (11575095, 61571262).
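The first benchmark is the classic embarrassingly parallel estimate of π: sample points uniformly in the unit square and count the fraction falling inside the quarter circle, which approaches π/4. A minimal serial sketch (the distributed version run on the virtual cluster is not shown):

    #include <stdio.h>
    #include <stdlib.h>

    /* Monte Carlo estimate of pi: hits/trials -> pi/4 as trials grows. */
    int main(void) {
        const long trials = 10000000L;
        long hits = 0;
        srand(42);                              /* fixed seed for repeatability */
        for (long i = 0; i < trials; i++) {
            double x = (double)rand() / RAND_MAX;
            double y = (double)rand() / RAND_MAX;
            if (x * x + y * y <= 1.0) hits++;
        }
        printf("pi ~ %.6f\n", 4.0 * (double)hits / (double)trials);
        return 0;
    }

Because each sample is independent, the workload splits evenly across virtual nodes, which is why the computing time grows proportionally with the particle count.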
Parallel vector buffer analysis approaches can be classified into two types: algorithm-oriented parallel strategies and data-oriented parallel strategies. These methods do not take their applicability to existing geographic information system (GIS) platforms into consideration. To address this problem, a spatial decomposition approach for accelerating buffer analysis of vector data is proposed. The relationship between the number of vertices of each feature and the buffer analysis computing time is analyzed to generate computational intensity transformation functions (CITFs). Computational intensity grids (CIGs) for polylines and polygons are then constructed from the corresponding CITFs, and a spatial decomposition method for parallel buffer analysis is developed on top of them. Based on the computational intensity of the features and the sub-domains generated in the decomposition, features are evenly assigned to parallel buffer analysis tasks within the sub-domains for load balance. Compared with typical regular domain decomposition methods, the new approach achieves a more balanced decomposition of computational intensity for parallel buffer analysis and attains near-linear speedups.
Funding: supported by the National Natural Science Foundation of China (Nos. 41971356, 41701446) and the National Key Research and Development Program of China (Nos. 2017YFB0503600, 2018YFB0505500, 2017YFC0602204).
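A simplified sketch of the load-balancing idea (the power-law CITF form and the greedy least-loaded assignment are assumptions made for illustration; the paper itself builds grid-based spatial decompositions rather than a flat feature schedule):

    #include <stdlib.h>
    #include <math.h>

    /* Computational intensity of a feature from its vertex count, via a
     * fitted transformation function t = a * v^b (hypothetical form). */
    double citf(long vertices, double a, double b) {
        return a * pow((double)vertices, b);
    }

    /* Greedily assign features to workers so accumulated intensity stays
     * balanced: each feature goes to the currently least-loaded worker. */
    void assign(const long *vertices, int nfeat, double a, double b,
                int nworkers, int *owner) {
        double *load = calloc((size_t)nworkers, sizeof *load);
        for (int f = 0; f < nfeat; f++) {
            int best = 0;
            for (int w = 1; w < nworkers; w++)
                if (load[w] < load[best]) best = w;
            owner[f] = best;
            load[best] += citf(vertices[f], a, b);
        }
        free(load);
    }

Balancing estimated intensity rather than raw feature counts is what lets the decomposition approach near-linear speedup when vertex counts are highly skewed.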
We detail some of the understudied aspects of the flow inside and around the hexactinellid sponge Euplectella aspergillum. By leveraging the flexibility of the Lattice Boltzmann Method, High Performance Computing simulations are performed to dissect the complex conditions corresponding to the actual environment at the bottom of the ocean, at depths between 100 and 1,000 m. These large-scale simulations unveil potential clues about the evolutionary adaptations of these deep-sea sponges in response to the surrounding fluid flow, and they open the path to future investigations at the interface between physics, engineering, and biology.
Funding: G.F. acknowledges CINECA computational grant ISCRA-B IsB17-SPONGES (no. HP10B9ZOKQ), partial support from PRIN projects CUP E82F16003010006 (principal investigator, G.F. for the Tor Vergata Research Unit) and CUP E84I19001020006 (principal investigator, G. Bella), and support from the European Research Council under the Horizon 2020 Programme, advanced grant agreement no. 739964 ('COPMAT'). M.P. acknowledges the support of the National Science Foundation under grant no. CMMI 1901697.
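For orientation, the core update of the Lattice Boltzmann Method in its single-relaxation-time (BGK) form is (the abstract does not specify the collision operator, so the common baseline is shown):

\[
f_i(\mathbf{x} + \mathbf{c}_i \Delta t,\; t + \Delta t) - f_i(\mathbf{x}, t)
= -\frac{\Delta t}{\tau}\Bigl[f_i(\mathbf{x}, t) - f_i^{\mathrm{eq}}(\mathbf{x}, t)\Bigr],
\]

where the \(f_i\) are particle distribution functions along the discrete velocities \(\mathbf{c}_i\), \(\tau\) is the relaxation time setting the viscosity, and the macroscopic fields are recovered as \(\rho = \sum_i f_i\) and \(\rho\,\mathbf{u} = \sum_i \mathbf{c}_i f_i\). The purely local collision plus nearest-neighbor streaming is what makes the method so amenable to large-scale HPC runs around intricate geometries like the sponge skeleton.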
Every year, transmission congestion costs billions of dollars for electricity customers. This clearly identifies the critical need for more transmission capacity and also poses big challenges for power grid reliability in stressed conditions due to heavy loading and in uncertain situations due to variable renewable resources and responsive smart loads. However, it is increasingly difficult to build new transmission lines, which typically involve both economic and environmental constraints. In this paper, advanced computing techniques are developed to enable a non-wire solution that realizes unused transfer capabilities of existing transmission facilities. An integrated software prototype powered by high-performance computing (HPC) is developed to calculate ratings of key transmission paths in real time for relieving transmission congestion and facilitating renewable integration, while complying with the North American Electric Reliability Corporation (NERC) standards on assessing total transfer capabilities. The innovative algorithms include: (1) massive contingency analysis enabled by dynamic load balancing, (2) parallel transient simulation to speed up single dynamic simulations, (3) a non-iterative method for calculating the voltage security boundary, and (4) an integrated package considering all NERC-required limits. This tool has been tested on realistic power system models of the Western Interconnection of North America and demonstrates satisfactory computational speed on parallel computers. The benefits of real-time path rating are investigated at the Bonneville Power Administration using real-time EMS snapshots, demonstrating a significant increase in path limits. These technologies would change the traditional goals of path rating studies, fundamentally transforming how the grid is operated, maximizing the utilization of national transmission assets, and facilitating the integration of renewable energy and smart loads.
Funding: supported by the U.S. Department of Energy, Advanced Research Projects Agency-Energy (ARPA-E) and Office of Electricity Delivery and Energy Reliability through its Advanced Grid Modeling Program. Pacific Northwest National Laboratory (PNNL) is operated by Battelle for the DOE under Contract DE-AC05-76RL01830.
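A sketch of the dynamic load balancing pattern behind algorithm (1), in MPI (a generic master-worker dispatcher, not PNNL's code; the case count and names are hypothetical):

    #include <mpi.h>

    /* Rank 0 hands out one contingency case at a time; workers request a new
     * case whenever they finish, so fast workers naturally take more cases. */
    int main(int argc, char **argv) {
        int rank, size, ncases = 1000;          /* hypothetical case count */
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);
        if (rank == 0) {                        /* master: dispatch loop */
            int next = 0, done = 0, buf;
            MPI_Status st;
            while (done < size - 1) {
                MPI_Recv(&buf, 1, MPI_INT, MPI_ANY_SOURCE, 0, MPI_COMM_WORLD, &st);
                int task = (next < ncases) ? next++ : -1;   /* -1 = stop */
                if (task < 0) done++;
                MPI_Send(&task, 1, MPI_INT, st.MPI_SOURCE, 0, MPI_COMM_WORLD);
            }
        } else {                                /* worker: request-run loop */
            int task, req = 0;
            for (;;) {
                MPI_Send(&req, 1, MPI_INT, 0, 0, MPI_COMM_WORLD);
                MPI_Recv(&task, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
                if (task < 0) break;
                /* solve_contingency(task): placeholder for the power flow run */
            }
        }
        MPI_Finalize();
        return 0;
    }

Because individual contingency solves vary widely in cost, pull-based dispatch like this keeps all cores busy where a static split would leave stragglers.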
In this paper we present a 2D/3D high-order accurate finite volume scheme in the context of direct Arbitrary-Lagrangian-Eulerian algorithms for general hyperbolic systems of partial differential equations with non-conservative products and stiff source terms. The scheme is constructed from a single-stencil polynomial reconstruction operator, a one-step space-time ADER integration that is suitably designed to cope even with stiff sources, a nodal solver with relaxation to determine the mesh motion, a path-conservative integration technique for the treatment of non-conservative products, and an a posteriori stabilization procedure derived from the Multidimensional Optimal Order Detection (MOOD) paradigm. We consider the seven-equation Baer-Nunziato model of compressible multi-phase flows as a representative model involving non-conservative products as well as relaxation source terms that are allowed to become stiff. The new scheme is validated against a set of test cases on 2D/3D unstructured moving meshes on parallel machines, and the high order of accuracy achieved by the method is demonstrated through a numerical convergence study. Classical Riemann problems and explosion problems with exact solutions are simulated in 2D and 3D. The overall numerical code is also profiled to provide an estimate of the computational cost required by each component of the whole algorithm.
Funding: W.B. was financed by the European Research Council (ERC) under the European Union's Seventh Framework Programme (FP7/2007-2013) through the research project STiMulUs, ERC Grant agreement no. 278267. R.L. was partially funded by the ANR under the JCJC project 'ALE INC(ubator) 3D' JS01-012-01 and by the 'International Centre for Mathematics and Computer Science in Toulouse' (CIMI), partially supported by ANR-11-LABX-0040-CIMI within the program ANR-11-IDEX-0002-02. The authors acknowledge PRACE for awarding access to the SuperMUC supercomputer at the Leibniz Rechenzentrum (LRZ) in Munich, Germany. Parts of the material in this work were elaborated, gathered, and tested while W.B. visited the Mathematical Institute of Toulouse for three months and R.L. visited the Dipartimento di Ingegneria Civile Ambientale e Meccanica in Trento for three months.
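The class of systems addressed above is commonly written in the generic form (standard in the path-conservative literature, given here for orientation):

\[
\frac{\partial \mathbf{Q}}{\partial t}
+ \nabla \cdot \mathbf{F}(\mathbf{Q})
+ \mathbf{B}(\mathbf{Q}) \cdot \nabla \mathbf{Q}
= \mathbf{S}(\mathbf{Q}),
\]

where \(\mathbf{Q}\) is the vector of conserved variables, \(\mathbf{F}(\mathbf{Q})\) the conservative flux, \(\mathbf{B}(\mathbf{Q}) \cdot \nabla \mathbf{Q}\) collects the non-conservative products (in Baer-Nunziato, the terms proportional to the gradient of the volume fraction), and \(\mathbf{S}(\mathbf{Q})\) contains the possibly stiff relaxation source terms handled by the space-time ADER integration.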