46 articles found
1. Evaluation of the computational performance of the finite-volume atmospheric model of the IAP/LASG (FAMIL) on a high-performance computer (cited by: 9)
Authors: LI Jin-Xiao, BAO Qing, LIU Yi-Min, WU Guo-Xiong. Atmospheric and Oceanic Science Letters (CSCD), 2017, Issue 4, pp. 329-336 (8 pages).
High computational performance is extremely important for climate system models, especially in ultra-high-resolution model development. In this study, the computational performance of the Finite-volume Atmospheric Model of the IAP/LASG (FAMIL) was comprehensively evaluated on Tianhe-2, which was the world's top-ranked supercomputer from June 2013 to May 2016. A standardized Atmospheric Model Inter-comparison Project (AMIP) type of experiment was carried out that focused on the computational performance of each node as well as the simulation years per day (SYPD), the running cost speedup, and the scalability of FAMIL. The results indicated that (1) based on five indexes (CPU usage, percentage of CPU kernel mode that occupies CPU time and of message passing waiting time (CPU SW), code vectorization (VEC), average of Gflops (Gflops_AVE), and peak of Gflops (Gflops_PK)), FAMIL shows excellent computational performance on every Tianhe-2 computing node; (2) considering SYPD and the cost speedup of FAMIL systematically, the optimal choice of Message Passing Interface (MPI) numbers of processors (MNPs) is 384 MNPs for C96 (100 km) and 1536 MNPs for C384 (25 km); and (3) FAMIL shows positive scalability as more threads are used to drive the model. Considering the fast network speed and the acceleration card in the MIC architecture on Tianhe-2, there is still significant room to improve the computational performance of FAMIL.
Keywords: FAMIL; scalability; computational performance; Tianhe-2
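The throughput metrics named in this abstract can be illustrated with a minimal sketch. It uses the common definitions of simulation years per day (SYPD), speedup, and parallel efficiency; the abstract does not define its "cost speedup", so that index is not reproduced here. The function names and all timings are hypothetical and are not taken from the paper.

```python
SECONDS_PER_DAY = 86400.0


def sypd(simulated_years, wallclock_seconds):
    """Simulation years per day: model years completed per 24 h of wall-clock time."""
    return simulated_years * SECONDS_PER_DAY / wallclock_seconds


def speedup(t_base, t_n):
    """Ratio of the baseline run time to the run time on more processors."""
    return t_base / t_n


def parallel_efficiency(t_base, n_base, t_n, n):
    """Speedup divided by the factor by which the processor count grew."""
    return speedup(t_base, t_n) / (n / n_base)


if __name__ == "__main__":
    # Hypothetical timings, for illustration only.
    print(sypd(simulated_years=2, wallclock_seconds=43200))
    print(parallel_efficiency(t_base=100.0, n_base=96, t_n=25.0, n=384))
```

An efficiency close to 1 means that adding processors still pays off almost linearly; a falling efficiency is the usual signal that the chosen processor count is past its sweet spot.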
2. A review of high performance computing applications in high-speed rail systems
Authors: Shenyuan Ren, Yidong Li. High-Speed Railway, 2023, Issue 2, pp. 92-96 (5 pages).
Further improving railway innovation capacity and technological strength is an important goal of the 14th Five-Year Plan for railway scientific and technological innovation. It includes promoting the deep integration of cutting-edge technologies with railway systems, strengthening the research and application of intelligent railway technologies, applying green computing technologies, and advancing the collaborative sharing of transportation big data. High-speed rail system tasks need to process huge amounts of data and heavy workloads with the requirement of ultra-fast response. Therefore, it is of great necessity to improve computation efficiency by applying High Performance Computing (HPC) to high-speed rail systems. The HPC technique is a great solution for improving the performance, efficiency, and safety of high-speed rail systems. In this review, we introduce and analyze the application research of high performance computing technology in the field of high-speed railways. These HPC applications are cataloged into four broad categories, namely: fault diagnosis, network and communication, management systems, and simulations. Moreover, challenges and issues to be addressed are discussed and further directions are suggested.
Keywords: High performance computing; High-speed rail
3. HIGH PERFORMANCE SPARSE SOLVER FOR UNSYMMETRICAL LINEAR EQUATIONS WITH OUT-OF-CORE STRATEGIES AND ITS APPLICATION ON MESHLESS METHODS (cited by: 1)
Authors: 苑维然, 陈璞, 刘凯欣. Applied Mathematics and Mechanics (English Edition) (SCIE, EI), 2006, Issue 10, pp. 1339-1348 (10 pages).
A new direct method for solving unsymmetrical sparse linear systems (USLS) arising from meshless methods was introduced. Computation with certain meshless methods, such as the meshless local Petrov-Galerkin (MLPG) method, needs to solve large USLS. The proposed solution method for the unsymmetrical case performs the factorization processes symmetrically on the upper and lower triangular portions of the matrix, which differs from previous work based on a general unsymmetrical process, and attains higher performance. It is shown that the solution algorithm for USLS can be simply derived from the existing approaches for the symmetrical case. The new matrix factorization algorithm in our method can be implemented easily by modifying a standard JKI symmetrical matrix factorization code. Multi-blocked out-of-core strategies were also developed to expand the solution scale. The approach convincingly increases the speed of the solution process, which is demonstrated with numerical tests.
Keywords: sparse matrices; linear equations; meshless methods; high performance computation
4. Parallel Image Processing: Taking Grayscale Conversion Using OpenMP as an Example
Authors: Bayan AlHumaidan, Shahad Alghofaily, Maitha Al Qhahtani, Sara Oudah, Naya Nagy. Journal of Computer and Communications, 2024, Issue 2, pp. 1-10 (10 pages).
In recent years, the widespread adoption of parallel computing, especially in multi-core processors and high-performance computing environments, ushered in a new era of efficiency and speed. This trend was particularly noteworthy in the field of image processing, which witnessed significant advancements. This parallel computing project explored the field of parallel image processing, with a focus on the grayscale conversion of color images. Our approach involved integrating OpenMP into our framework for parallelization to execute a critical image processing task: grayscale conversion. By using OpenMP, we strategically enhanced the overall performance of the conversion process by distributing the workload across multiple threads. The primary objectives of our project revolved around optimizing computation time and improving overall efficiency, particularly in the task of grayscale conversion of color images. Utilizing OpenMP for concurrent processing across multiple cores significantly reduced execution times through the effective distribution of tasks among these cores. The speedup values for various image sizes highlighted the efficacy of parallel processing, especially for large images. However, a detailed examination revealed a potential decline in parallelization efficiency with an increasing number of cores. This underscored the importance of a carefully optimized parallelization strategy, considering factors like load balancing and minimizing communication overhead. Despite challenges, the overall scalability and efficiency achieved with parallel image processing underscored OpenMP's effectiveness in accelerating image manipulation tasks.
Keywords: Parallel Computing; Image Processing; OpenMP; Parallel Programming; High Performance Computing; GPU (Graphics Processing Unit)
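The work distribution described in this abstract can be sketched as follows. This is not the authors' OpenMP code: it is a Python analogue that splits the image into contiguous row blocks, much as a statically scheduled parallel loop would, and converts each block in a worker thread. The luminance weights (0.299, 0.587, 0.114) are an assumption, since the abstract does not state which formula was used, and `gray_rows` and `to_grayscale` are hypothetical names. In standard CPython, threads do not speed up this kind of pure-Python arithmetic, so the sketch shows only how the work is partitioned.

```python
from concurrent.futures import ThreadPoolExecutor


def gray_rows(rows):
    """Convert a block of rows of (r, g, b) pixels to grayscale intensities."""
    return [[round(0.299 * r + 0.587 * g + 0.114 * b) for (r, g, b) in row]
            for row in rows]


def to_grayscale(image, workers=4):
    """Split the image into contiguous row blocks and convert them concurrently."""
    chunk = max(1, -(-len(image) // workers))  # ceiling division
    blocks = [image[i:i + chunk] for i in range(0, len(image), chunk)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        parts = list(pool.map(gray_rows, blocks))  # map keeps the block order
    return [row for part in parts for row in part]


if __name__ == "__main__":
    image = [[(255, 0, 0), (0, 255, 0)],
             [(0, 0, 255), (255, 255, 255)]]
    print(to_grayscale(image, workers=2))
```

Each pixel is independent of every other pixel, which is why the conversion parallelizes without any synchronization beyond collecting the blocks in order.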
5. A NEW HIGH PERFORMANCE SPARSE STATIC SOLVER IN FINITE ELEMENT ANALYSIS WITH LOOP-UNROLLING (cited by: 1)
Authors: Chen Pu, Sun Shuli. Acta Mechanica Solida Sinica (SCIE, EI), 2005, Issue 3, pp. 248-255 (8 pages).
In previous papers, a high performance sparse static solver with two-level unrolling based on a cell-sparse storage scheme was reported. Although the solver reaches quite a high efficiency for a large percentage of finite element analysis benchmark tests, the MFLOPS (million floating-point operations per second) of the LDL^T factorization of the benchmark tests vary on a Dell Pentium IV 850 MHz machine from 100 to 456, depending on the average size of the super-equations, i.e., on the average depth of unrolling. In this paper, a new sparse static solver with two-level unrolling that employs the concept of master-equations and searches for an appropriate depth of unrolling is proposed. The new solver provides higher MFLOPS for the LDL^T factorization of the benchmark tests, and therefore speeds up the solution process.
Keywords: high performance computing; sparse matrix; finite element analysis
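For readers unfamiliar with the LDL^T factorization that this abstract benchmarks, a minimal dense sketch follows. It shows only the textbook factorization A = L D L^T of a symmetric matrix without pivoting; the cell-sparse storage, super-equations, master-equations and loop unrolling described in the paper are not reproduced, and the function name `ldlt` is hypothetical.

```python
def ldlt(a):
    """Dense LDL^T factorization of a symmetric matrix (no pivoting).

    Returns (L, d) with L unit lower triangular and d the diagonal of D,
    so that a[i][j] == sum(L[i][k] * d[k] * L[j][k] for k in range(n)).
    """
    n = len(a)
    L = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    d = [0.0] * n
    for j in range(n):
        # Diagonal entry: subtract the contribution of the earlier columns.
        d[j] = a[j][j] - sum(L[j][k] ** 2 * d[k] for k in range(j))
        for i in range(j + 1, n):
            # This inner product is the kind of loop an optimized solver unrolls.
            s = sum(L[i][k] * L[j][k] * d[k] for k in range(j))
            L[i][j] = (a[i][j] - s) / d[j]
    return L, d


if __name__ == "__main__":
    L, d = ldlt([[4.0, 2.0, 2.0], [2.0, 5.0, 3.0], [2.0, 3.0, 6.0]])
    print(L, d)
```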
6. The Changing Face of High Performance Computing in the United States (cited by: 2)
Authors: Ann Haves (Advanced Computing Laboratory, Los Alamos National Laboratory, Los Alamos, NM 87545, USA). Wuhan University Journal of Natural Sciences (CAS), 1996, Issue Z1, pp. 309-311 (3 pages).
The Changing Face of High Performance Computing in the United States. Ann Haves (Advanced Computing Laboratory, Los Alamos National Laboratory, Los A...
Keywords: The Changing Face of High Performance Computing in the United States
7. Improving Performance of Cloud Computing and Big Data Technologies and Applications (cited by: 1)
Authors: Zhenjiang Dong. ZTE Communications, 2014, Issue 4, pp. 1-2 (2 pages).
Cloud computing technology is changing the development and usage patterns of IT infrastructure and applications. Virtualized and distributed systems as well as unified management and scheduling have greatly improved computing and storage. Management has become easier, and OAM costs have been significantly reduced. Cloud desktop technology is developing rapidly. With this technology, users can flexibly and dynamically use virtual machine resources, companies' efficiency in using and allocating resources is greatly improved, and information security is ensured. In most existing virtual cloud desktop solutions, computing and storage are bound together, and data is stored as image files. This limits the flexibility and expandability of systems and is insufficient for meeting customers' requirements in different scenarios.
Keywords: Improving Performance of Cloud Computing and Big Data Technologies and Applications; HBASE
8. Representing Increasing Virtual Machine Security Strategy in Cloud Computing Computations (cited by: 1)
Authors: Mohammad Shirzadi. Electrical Science & Engineering, 2021, Issue 2, pp. 7-16 (10 pages).
This paper proposes an algorithm for increasing virtual machine security strategy in cloud computing computations. Imbalance between load and energy has been one of the disadvantages of older methods of providing servers and hosting: if two virtual servers are active on a host and the energy load on that host is higher, it allocates the energy of other hosts (virtual hosts) to itself to stay steady, which usually leads to hardware overflow errors and user dissatisfaction. This problem has been reduced, but not removed entirely, in methods based on cloud processing. Therefore, the proposed algorithm not only implements a suitable security background but also suitably divides energy consumption and load balancing among virtual servers. The proposed algorithm is compared with several previously proposed security strategies, including SC-PSSF, PSSF and DEEAC. Comparisons show that the proposed method offers high performance computing and efficiency and consumes less energy in the network.
Keywords: Cloud computing; High performance computing; Automation; Security; Server
9. Multi Granularity Page Size Support for Linux and the Performance Evaluation
Authors: Naohiko Shimizu (School of Engineering, Tokai University, 1117 Kitakaname, Hiratsuka-shi, Kanagawa 259-1292, Japan). Wuhan University Journal of Natural Sciences (CAS), 2001, Issue Z1, pp. 347-350 (4 pages).
Today, PC-class machines are quite popular in the HPC area, especially for problems that require a good cost/performance ratio. One of the drawbacks of these machines is their poor memory throughput. One of the reasons for the poor performance is the limited mapping capability of the TLB, a buffer that accelerates virtual memory access. In this report, I show that the mapping capability and the performance can be improved with the multi-granularity TLB feature that some processors have. I also show that the new TLB handling routine can be incorporated into the demand paging system of Linux.
Keywords: translation lookaside buffer; Linux; high performance computing; performance evaluation
10. Parallel Optical Interconnect Technology: Combination of Higher Performance and Lower Energy Consumption
Authors: Qiao Yaojun, Gu Rentao, Ji Yuefeng. China Communications (SCIE, CSCD), 2010, Issue 3, pp. 99-106 (8 pages).
This paper analyzes the physical potential, computing performance benefit and power consumption of optical interconnects. Compared with electrical interconnections, optical ones show undoubted advantages based on physical factor analysis. At the same time, since recent developments drive us to consider whether these optical interconnect technologies, with higher bandwidth but higher cost, are worth deploying, a computing performance comparison is performed. To meet the increasing demand of large-scale parallel or multi-processor computing tasks, an analytic method to evaluate the parallel computing performance of interconnect systems is proposed in this paper. Both a bandwidth-limit model and a full-bandwidth model are investigated. Speedup and efficiency are selected to represent the parallel performance of an interconnect system. Deploying the proposed models, we depict the performance gap between optically and electrically interconnected systems. Another investigation, on the power consumption of commercial products, showed that if parallel interconnections are deployed, the unit power consumption is reduced. Therefore, from the analysis of computing influence and power dissipation, we found that the parallel optical interconnect is a valuable combination of high performance and low energy consumption. Considering the data centers that may be under construction, huge amounts of power could be saved if parallel optical interconnect technologies are used.
Keywords: optical interconnects; high performance computing; power dissipation
11. Numerical Study of Aeroacoustic Sound on Performance of Bladeless Fan (cited by: 1)
Authors: Mohammad Jafari, Atta Sojoudi, Parinaz Hafezisefat. Chinese Journal of Mechanical Engineering (SCIE, EI, CAS, CSCD), 2017, Issue 2, pp. 483-494 (12 pages).
The aeroacoustic performance of fans is essential due to their widespread application. Therefore, the original aim of this paper is to evaluate the noise generated owing to different geometric parameters. In the current study, the effect of five geometric parameters on the performance of a bladeless fan was investigated. Airflow through this fan was analyzed by simulating a bladeless fan within a 2 m×2 m×4 m room. Analysis of the flow field inside the fan and evaluation of its performance were obtained by solving the conservation of mass and momentum equations for the aerodynamic investigation and the FW-H noise equations for the aeroacoustic analysis. To design the bladeless fan, the Eppler 473 airfoil profile was used as the cross section of the fan. Five distinct parameters were considered, namely the height of the cross section of the fan, the outlet angle of the flow relative to the fan axis, the thickness of the airflow outlet slit, the hydraulic diameter, and the aspect ratio for circular and quadratic cross sections. To validate the acoustic code results, we compared the numerical solution of the FW-H noise equations for NACA0012 with experimental results. The FW-H model was selected to predict the noise generated by the bladeless fan, as the numerical results showed good agreement with the experimental ones for NACA0012. To validate the 3-D numerical results, the experimental results for a round jet were compared with the simulation data and showed good agreement. To indicate the effect of each parameter on the fan performance, SPL and OASPL diagrams were presented.
Keywords: Bladeless fan; computational fluid dynamics (CFD); Aeroacoustic performance; Ffowcs Williams and Hawkings (FW-H) formulation
12. Intrusion Detection Using Federated Learning for Computing
Authors: R.S. Aashmi, T. Jaya. Computer Systems Science & Engineering (SCIE, EI), 2023, Issue 5, pp. 1295-1308 (14 pages).
The integration of clusters, grids, clouds, edges and other computing platforms results in the contemporary technology of jungle computing. This novel technique has the aptitude to tackle high performance computation systems, and it manages the usage of all computing platforms at a time. Federated learning is a collaborative machine learning approach without centralized training data. The proposed system effectively detects intrusion attacks without human intervention and subsequently detects anomalous deviations in device communication behavior, potentially caused by malicious adversaries, and it can cope with new and unknown attacks. The main objective is to learn the overall behavior of an intruder while performing attacks on the assumed target service. Moreover, the updated system model is sent to the centralized server in jungle computing to detect their pattern. Federated learning greatly helps the machine to study the type of attack from each device, and this technique paves the way to complete dominion over all malicious behaviors. In our proposed work, we have implemented an intrusion detection system that has high accuracy and a low False Positive Rate (FPR), and that is scalable and versatile for the jungle computing environment. The execution time taken to complete a round is less than two seconds, with an accuracy rate of 96%.
Keywords: Jungle computing; high performance computation; federated learning; false positive rate; intrusion detection system (IDS)
13. The future is frozen: cryogenic CMOS for high-performance computing
Authors: R. Saligram, A. Raychowdhury, Suman Datta. Chip (EI), 2024, Issue 1, pp. 43-54 (12 pages).
Low temperature complementary metal oxide semiconductor (CMOS), or cryogenic CMOS, is a promising avenue for the continuation of Moore's law while serving the needs of high performance computing. With temperature as a control "knob" to steepen the subthreshold slope behavior of CMOS devices, the supply voltage of operation can be reduced with no impact on operating speed. With optimal threshold voltage engineering, the device ON current can be further enhanced, translating to higher performance. In this article, experimentally calibrated data were adopted to tune the threshold voltage and to investigate the power, performance and area of cryogenic CMOS at the device, circuit and system levels. We also present results from the measurement and analysis of functional memory chips fabricated in 28 nm bulk CMOS and 22 nm fully depleted silicon on insulator (FDSOI) operating at cryogenic temperature. Finally, the challenges and opportunities in the further development and deployment of such systems are discussed.
Keywords: Cryogenic CMOS; Design technology co-optimization; High performance computing; Parameter variation; Threshold voltage engineering; Cryogenic memories; Interconnects
14. High Throughput Scheduling Algorithms for Input Queued Packet Switches (cited by: 2)
Authors: R. Chithra Devi, D. Jemi Florinabel, Narayanan Prasanth. Computers, Materials & Continua (SCIE, EI), 2022, Issue 1, pp. 1527-1540 (14 pages).
The high-performance computing paradigm needs high-speed switching fabrics to meet the heavy traffic generated by its applications. These switching fabrics are efficiently driven by the deployed scheduling algorithms. In this paper, we propose two scheduling algorithms for input queued switches whose operations are based on ranking procedures. First, we propose a Simple 2-Bit (S2B) scheme, which uses a binary ranking procedure and queue size for scheduling the packets. Here, the Virtual Output Queue (VOQ) set with the maximum number of empty queues receives a higher rank than the other VOQ sets. Through simulation, we show that S2B has better throughput performance than Highest Ranking First (HRF) arbitration under uniform and non-uniform traffic patterns. To further improve the throughput-delay performance, an Enhanced 2-Bit (E2B) approach is proposed. This approach adopts an integer representation for the rank, which is the number of empty queues in a VOQ set. The simulation results show that E2B outperforms the S2B and HRF scheduling algorithms with maximum throughput-delay performance. Furthermore, the algorithms are simulated under hotspot traffic, and E2B proves to be more efficient.
Keywords: Crossbar switch; input queued switch; virtual output queue; scheduling algorithm; high performance computing
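The integer rank that this abstract attributes to E2B (the number of empty queues in a VOQ set) can be sketched as below. Only the rank definition comes from the abstract. The matching step, in which higher-ranked inputs choose first and take their longest queue among the outputs still free, is an assumption made for illustration, and `e2b_rank` and `schedule` are hypothetical names rather than the authors' code.

```python
def e2b_rank(voq_set):
    """Integer rank of one input port: the number of empty queues in its VOQ set."""
    return sum(1 for queue_len in voq_set if queue_len == 0)


def schedule(voq):
    """One scheduling round for a crossbar.

    voq[i][j] is the number of packets queued at input i for output j.
    Returns a dict {input: output} in which no output is used twice.
    Assumed policy: inputs with more empty queues (fewer choices) pick first.
    """
    order = sorted(range(len(voq)), key=lambda i: e2b_rank(voq[i]), reverse=True)
    free_outputs = set(range(len(voq[0])))
    match = {}
    for i in order:
        candidates = [j for j in free_outputs if voq[i][j] > 0]
        if candidates:
            # Longest queue wins; ties go to the lowest output index.
            j = max(candidates, key=lambda out: (voq[i][out], -out))
            match[i] = j
            free_outputs.remove(j)
    return match


if __name__ == "__main__":
    print(schedule([[3, 1, 2], [0, 0, 5], [4, 0, 0]]))
```

In the usage example, inputs 1 and 2 each have a single non-empty queue, so serving them first leaves input 0 free to take the remaining output and all three inputs are matched. Serving input 0 first would claim output 0 and leave input 2 with nothing to send.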
15. Smart Cities in Europe and the ALMA Logistics Project (cited by: 2)
Authors: Didier El Baz, Julien Bourgeois. ZTE Communications, 2015, Issue 4, pp. 10-15 (6 pages).
In this paper, a brief survey of smart city projects in Europe is presented. This survey shows the extent of transport and logistics in smart cities. We concentrate on a smart city project we have been working on that is related to A Logistic Mobile Application (ALMA). The application is based on the Internet of Things and combines a communication infrastructure and a High Performance Computing infrastructure in order to deliver mobile logistic services with high quality of service and adaptation to the dynamic nature of logistic operations.
Keywords: smart cities; Internet of Things; logistics; combinatorial optimization; high performance computing
16. A spatial decomposition approach for accelerating buffer analysis of vector data (cited by: 1)
Authors: Li Xiaohua, Guo Mingqiang, Qi Xinhong. High Technology Letters (EI, CAS), 2020, Issue 4, pp. 455-459 (5 pages).
Parallel vector buffer analysis approaches can be classified into two types: the algorithm-oriented parallel strategy and the data-oriented parallel strategy. These methods do not take their applicability on existing geographic information system (GIS) platforms into consideration. To address this problem, a spatial decomposition approach for accelerating buffer analysis of vector data is proposed. The relationship between the number of vertices of each feature and the buffer analysis computing time is analyzed to generate computational intensity transformation functions (CITFs). Then, computational intensity grids (CIGs) of polylines and polygons are constructed based on the respective CITFs. Using the corresponding CIGs, a spatial decomposition method for parallel buffer analysis is developed. Based on the computational intensity of the features and the sub-domains generated in the decomposition, the features within the sub-domains are evenly assigned to parallel buffer analysis tasks for load balance. Compared with typical regular domain decomposition methods, the new approach accomplishes a more balanced decomposition of computational intensity for parallel buffer analysis and achieves near-linear speedups.
Keywords: high performance spatial computing; buffer analysis; parallel computing; load balancing; vector data
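The load-balancing idea in this abstract, assigning features to tasks by estimated cost rather than by count, can be sketched as follows. The sketch makes two assumptions that do not come from the paper: the computational intensity transformation function is taken to be linear in the vertex count (`citf`, with hypothetical coefficients), and features are assigned with a generic greedy "largest cost first" heuristic instead of the grid-based spatial decomposition the authors describe.

```python
import heapq


def citf(vertex_count, a=1.0, b=0.0):
    """Hypothetical linear cost model: estimated buffer time from vertex count."""
    return a * vertex_count + b


def balance(features, n_tasks):
    """Assign (feature_id, vertex_count) pairs to n_tasks with similar total cost.

    Greedy heuristic: take features in order of decreasing cost and give each
    one to the task that currently has the smallest accumulated load.
    """
    heap = [(0.0, task) for task in range(n_tasks)]  # (load, task index)
    heapq.heapify(heap)
    tasks = [[] for _ in range(n_tasks)]
    for fid, verts in sorted(features, key=lambda f: citf(f[1]), reverse=True):
        load, task = heapq.heappop(heap)
        tasks[task].append(fid)
        heapq.heappush(heap, (load + citf(verts), task))
    return tasks


if __name__ == "__main__":
    features = [("a", 100), ("b", 60), ("c", 50), ("d", 40), ("e", 10)]
    print(balance(features, n_tasks=2))
```

Splitting the same five features by count alone (three and two) could put the 100-, 60- and 50-vertex features in one task; weighting by estimated cost keeps the two task loads at 140 and 120 in this example.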
17. Advances in Ordinary Kriging Using the Stampede and Bridges Supercomputers (cited by: 1)
Authors: Erin M. Hodgess, Kendra Mhoon. Journal of Geological Resource and Engineering, 2018, Issue 1, pp. 14-18 (5 pages).
A new approach for the implementation of variogram models and ordinary kriging using the R statistical language, in conjunction with Fortran, MPI (Message Passing Interface), and the "pbdDMAT" package within R on the Bridges and Stampede supercomputers is described. This new technique has led to great improvements in timing compared with R alone, or R with C and MPI. These improvements include processing and forecasting vectors of size 25,000 in an average time of 6 minutes on the Stampede supercomputer and 2.5 minutes on the Bridges supercomputer, compared with previous processing times of 3.5 hours.
Keywords: kriging; geostatistics; high performance computing
18. Sparse Approximations of the Schur Complement for Parallel Algebraic Hybrid Solvers in 3D
Authors: L. Giraud, A. Haidar, Y. Saad. Numerical Mathematics (Theory, Methods and Applications) (SCIE), 2010, Issue 3, pp. 276-294 (19 pages).
In this paper we study the computational performance of variants of an algebraic additive Schwarz preconditioner for the Schur complement for the solution of large sparse linear systems. In earlier works, the local Schur complements were computed exactly using a sparse direct solver. The robustness of the preconditioner comes at the price of this memory- and time-intensive computation, which is the main bottleneck of the approach for tackling huge problems. In this work we investigate the use of sparse approximations of the dense local Schur complements. These approximations are computed using a partial incomplete LU factorization. Such a numerical calculation is the core of multi-level incomplete factorizations such as the one implemented in pARMS. The numerical and computing performance of the new numerical scheme is illustrated on a set of large 3D convection-diffusion problems; preliminary experiments on linear systems arising from structural mechanics are also reported.
Keywords: Hybrid direct/iterative solver; domain decomposition; incomplete/partial factorization; Schur approximation; scalable preconditioner; convection-diffusion; large 3D problems; parallel scientific computing; High Performance Computing
19. Benchmarks of 3D Laplace Equation Solvers in a Cubic Configuration for Streamer Simulation
Authors: Joseph-Marie PLEWA, Olivier DUCASSE, Philippe DESSANTE, Carolyn JACOBS, Olivier EICHWALD, Nicolas RENON, Mohammed YOUSFI. Plasma Science and Technology (SCIE, EI, CAS, CSCD), 2016, Issue 5, pp. 538-543 (6 pages).
The aim of this paper is to test a developed SOR R&B method using the Chebyshev accelerator algorithm to solve the Laplace equation in a cubic 3D configuration. Comparisons are made in terms of precision and computing time with other elliptic equation solvers proposed in the open source LIS library. The first results, obtained by using a single core on an HPC, show that the developed SOR R&B method is efficient when the spectral radius needed for the Chebyshev acceleration is carefully pre-estimated. Preliminary results obtained with a parallelized code using the MPI library are also discussed for the case in which the calculation is distributed over one hundred cores.
Keywords: numerical methods for elliptic equations; high performance computing; 3D streamer simulation; SOR; IDR; BiCGSTAB
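A minimal sketch of red-black successive over-relaxation (SOR) with Chebyshev acceleration for the 3D Laplace equation is given below. It follows the standard textbook formulation, in which the relaxation factor starts at 1 and is updated after every half-sweep from the Jacobi spectral radius, and it is not the authors' code. The sketch assumes a cube with n intervals per side and Dirichlet boundaries, for which the Jacobi spectral radius is cos(pi/n); `make_grid` and `sor_red_black` are hypothetical names.

```python
import math


def make_grid(n, boundary):
    """(n+1)^3 grid holding boundary(i, j, k) on the faces and 0.0 inside."""
    def on_face(i, j, k):
        return 0 in (i, j, k) or n in (i, j, k)
    return [[[boundary(i, j, k) if on_face(i, j, k) else 0.0
              for k in range(n + 1)]
             for j in range(n + 1)]
            for i in range(n + 1)]


def sor_red_black(u, n, sweeps):
    """Relax the interior of u in place; boundary values are left untouched."""
    rho = math.cos(math.pi / n)  # Jacobi spectral radius for this cubic grid
    omega = 1.0
    for sweep in range(sweeps):
        for colour in (0, 1):  # 0 = "red" points, 1 = "black" points
            for i in range(1, n):
                for j in range(1, n):
                    for k in range(1, n):
                        if (i + j + k) % 2 != colour:
                            continue
                        avg = (u[i - 1][j][k] + u[i + 1][j][k]
                               + u[i][j - 1][k] + u[i][j + 1][k]
                               + u[i][j][k - 1] + u[i][j][k + 1]) / 6.0
                        u[i][j][k] += omega * (avg - u[i][j][k])
            # Chebyshev update of the relaxation factor after each half-sweep.
            if sweep == 0 and colour == 0:
                omega = 1.0 / (1.0 - 0.5 * rho * rho)
            else:
                omega = 1.0 / (1.0 - 0.25 * rho * rho * omega)
    return u


if __name__ == "__main__":
    n = 8
    u = make_grid(n, lambda i, j, k: float(i + j + k))
    sor_red_black(u, n, sweeps=60)
    print(u[4][4][4])
```

Points of one colour depend only on points of the other colour, so each half-sweep can be updated in parallel, which is what makes the red-black ordering attractive for distributed codes. The dependence of the update on `rho` also shows why the abstract stresses a careful pre-estimate of the spectral radius.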
20. A Dynamically Reconfigurable Accelerator Design Using a Sparse-Winograd Decomposition Algorithm for CNNs
Authors: Yunping Zhao, Jianzhuang Lu, Xiaowen Chen. Computers, Materials & Continua (SCIE, EI), 2021, Issue 1, pp. 517-535 (19 pages).
Convolutional Neural Networks (CNNs) are widely used in many fields. Due to their high throughput and high level of computing characteristics, however, an increasing number of researchers are focusing on how to improve the computational efficiency, hardware utilization, or flexibility of CNN hardware accelerators. Accordingly, this paper proposes a dynamically reconfigurable accelerator architecture that implements a Sparse-Winograd F(2×2, 3×3)-based high-parallelism hardware architecture. This approach not only eliminates the pre-calculation complexity associated with the Winograd algorithm, thereby reducing the difficulty of hardware implementation, but also greatly improves the flexibility of the hardware; as a result, the accelerator can realize the calculation of Conventional Convolution, Grouped Convolution (GCONV) or Depthwise Separable Convolution (DSC) using the same hardware architecture. Our experimental results show that the accelerator achieves a 3x–4.14x speedup compared with designs that do not use the acceleration algorithm on VGG-16 and MobileNet V1. Moreover, compared with previous designs using the traditional Winograd algorithm, the accelerator design achieves a 1.4x–1.8x speedup. At the same time, the efficiency of the multiplier improves by up to 142%.
Keywords: High performance computing; accelerator architecture; hardware
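The F(2×2, 3×3) Winograd transform named in this abstract can be illustrated in software. The sketch below uses the standard transform matrices for this tile size and computes one 2×2 output tile from a 4×4 input tile and a 3×3 filter with 16 element-wise multiplications, where the direct method needs 36. It does not model the paper's sparse variant or its hardware architecture; the helper names are hypothetical.

```python
# Standard F(2x2, 3x3) Winograd transform matrices.
BT = [[1, 0, -1, 0], [0, 1, 1, 0], [0, -1, 1, 0], [0, 1, 0, -1]]
G = [[1, 0, 0], [0.5, 0.5, 0.5], [0.5, -0.5, 0.5], [0, 0, 1]]
AT = [[1, 1, 1, 0], [0, 1, -1, -1]]


def matmul(a, b):
    """Plain nested-list matrix product."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(m):
    return [list(row) for row in zip(*m)]


def winograd_f2x2_3x3(d, g):
    """2x2 output tile from a 4x4 input tile d and a 3x3 filter g."""
    U = matmul(matmul(G, g), transpose(G))    # transformed filter, 4x4
    V = matmul(matmul(BT, d), transpose(BT))  # transformed input tile, 4x4
    M = [[U[i][j] * V[i][j] for j in range(4)] for i in range(4)]  # 16 multiplies
    return matmul(matmul(AT, M), transpose(AT))


def direct_conv(d, g):
    """Reference: sliding-window correlation, 36 multiplies for the same tile."""
    return [[sum(d[i + p][j + q] * g[p][q] for p in range(3) for q in range(3))
             for j in range(2)] for i in range(2)]


if __name__ == "__main__":
    d = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]
    g = [[1, 0, -1], [2, 0, -2], [1, 0, -1]]
    print(winograd_f2x2_3x3(d, g), direct_conv(d, g))
```

The filter transform `U` depends only on the weights, so it can be computed once per filter and reused for every tile. Zero entries in `U` mark multiplications that a sparse design could skip.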