Funding: Supported by the China Postdoctoral Science Foundation (No. 2014M552115), the Fundamental Research Funds for the Central Universities, China University of Geosciences (Wuhan) (No. CUGL140833), and the National Key Technology Support Program of China (No. 2011BAH06B04).
Abstract: To improve the concurrent access performance of a web-based spatial computing system in a cluster, a parallel scheduling strategy based on the multi-core environment is proposed. It comprises two levels of parallel processing: one allocates tasks evenly to the server nodes in the cluster, and the other balances the load inside each server node. Based on this strategy, a new web-based spatial computing model is designed, focusing on a task response ratio calculation method, a request queue buffer mechanism, and a thread scheduling strategy. Experimental results show that the new model can fully use the multi-core computing advantage of each server node under concurrent access and improves the average hits per second, average I/O hits, CPU utilization, and throughput. A speed-up ratio comparison of the traditional model and the new one shows that the new model performs better. The performance of the multi-core server nodes in the cluster is optimized, and resource utilization and parallel processing capability are enhanced. The more CPU cores are available, the higher the parallel processing capability obtained.
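The abstract names a task response ratio and a speed-up ratio without defining either. As a minimal sketch, the block below assumes the classic "highest response ratio next" definition, (waiting time + service time) / service time, and the usual speed-up definition, baseline time divided by new time. The `Task` class and the function names are hypothetical and are not taken from the paper, whose actual calculation method may differ.

```python
from dataclasses import dataclass


@dataclass
class Task:
    name: str
    waiting_time: float  # seconds already spent in the request queue
    service_time: float  # estimated seconds of processing required


def response_ratio(task: Task) -> float:
    """Assumed definition: (waiting + service) / service.

    The ratio grows while a task waits, so long-waiting tasks are
    eventually favoured and short tasks are not starved.
    """
    return (task.waiting_time + task.service_time) / task.service_time


def pick_next(request_queue: list[Task]) -> Task:
    """Select the queued task with the highest response ratio."""
    return max(request_queue, key=response_ratio)


def speedup(t_baseline: float, t_new: float) -> float:
    """Speed-up ratio of a new model relative to a baseline model."""
    return t_baseline / t_new
```

With this definition, a short task that has waited a long time outranks a long task that has just arrived, which is one common way to keep a request queue responsive under concurrent access.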
Abstract: The present research attempted a Large-Eddy Simulation (LES) of airflow over a steep, three-dimensional isolated hill using the latest multi-core, multi-CPU systems. It was found that 1) turbulence simulations using approximately 50 million grid points are feasible, and 2) the system achieved a high computation speed, exceeding that of parallel computation on a single CPU of one of the latest supercomputers. Furthermore, LES was conducted on multi-GPU systems. These simulations revealed the following: 1) a multi-GPU environment using the NVIDIA Tesla M2090 or M2075 could simulate turbulence in a model with as many as approximately 50 million grid points, and 2) the computation speed achieved by the multi-GPU environments exceeded that of parallel computation using four to six CPUs of one of the latest supercomputers.
Abstract: This paper describes parallel simulation techniques for the discrete element method (DEM) on multi-core processors. Recently, multi-core CPU and GPU processors have attracted much attention for accelerating computer simulations in various fields. We propose a new algorithm for multi-thread parallel computation of DEM, which makes effective use of the available memory and accelerates the computation. This study shows that memory usage is drastically reduced by using this algorithm. To show the practical use of DEM in industry, a large-scale powder system is simulated with a complicated drive unit. We compared the simulation performance of the latest GPU and CPU processors, with programs optimized for each processor. The results show that the difference in performance between GPUs and CPUs is not substantial when a multi-thread parallel algorithm is used. In addition, the DEM algorithm is shown to have high scalability in multi-thread parallel computation on a CPU.
Abstract: This paper presents a method to reduce the energy consumption of multi-core systems characterized by processor cores and buses with discrete frequency levels under timing constraints. The proposed method takes as inputs the transformations of the original task graphs, which include dependent tasks located in different iterations. It uses mapping selection together with joint processor and communication frequency scaling to reduce energy. We conduct experiments on several random task graphs. Experimental results show that the proposed method can achieve substantial energy reduction compared with previous work under the same hard timing constraints.
Abstract: Built on the Intel multi-core embedded platform, using 802.11 wireless network protocols as the communication medium and combining radio-frequency communication with ultrasonic ranging, a mobile terminal system for an intelligent building is implemented. It provides its holder with the following functions: 1) accurate positioning, 2) intelligent navigation, 3) video monitoring, and 4) wireless communication. The innovation of this paper is to apply multi-core computing to the embedded system to increase its computing speed and deliver real-time performance, and to apply the system in indoor environments for emergency response and rescue.
Abstract: This research involved an exploratory evaluation of the dynamics of vehicular traffic on a road network across two traffic light-controlled junctions. The study uses the case of a one-kilometer road system modelled in AnyLogic version 8.8.4. AnyLogic is a multi-paradigm simulation tool that supports three main simulation methodologies: discrete event simulation, agent-based modeling, and system dynamics modeling. The system is used to evaluate the effect of stochastic time-based vehicle variables on the general efficiency of road use. Road use efficiency in this model is the percentage of entering vehicles that exit the model within a one-hour simulation period. The study found that, for the model under review, an increase in entry point time delay has a dominant influence on the efficiency of road use, far beyond any other consideration. This study therefore presents a novel approach that leverages discrete event simulation to support efficient road management with a focus on optimum road use efficiency. The study also determined that including appropriate random parameters to reflect road use activities at critical event points in a simulation can help represent authentic traffic models effectively. The AnyLogic simulation software leverages the Classic DEVS and Parallel DEVS formalisms to achieve these objectives.
Funding: Project supported by the National Natural Science Foundation of China (Nos. 60973035 and 60976027) and the Natural Science Foundation of Hubei Province, China (No. 2010CDB02705).
Abstract: The fast Fourier transform (FFT) is a fundamental kernel of many computation-intensive scientific applications. This paper deals with an implementation of the FFT on the accelerator system, a heterogeneous multi-core architecture to accelerate computation-intensive parallel computing in scientific and engineering applications. The Engineering and Scientific Computation Accelerator (ESCA) consists of a control unit and a single instruction multiple data (SIMD) processing element (PE) array, in which PEs communicate with each other via a hierarchical two-level network-on-chip (NoC) with high bandwidth and low latency. We exploit the architecture features of ESCA to implement a parallel FFT algorithm efficiently. Experimental results show that both the proposed parallel FFT algorithm and the ESCA architecture are scalable. The 16-bit fixed-point parallel FFT performance of ESCA is compared with a published work to prove the superiority of the mapping algorithm and the hardware architecture. The floating-point parallel FFT performances of ESCA are evaluated and compared with those of the IBM Cell processor and GPU to demonstrate the computing power of the ESCA system for high performance applications.
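For readers unfamiliar with the kernel itself, the following is a minimal sketch of a textbook radix-2 decimation-in-time FFT. It is not the ESCA mapping algorithm, which the abstract does not describe; the function names are illustrative. The point it shows is structural: the two half-size sub-transforms are independent of each other, which is the property that parallel FFT implementations typically exploit.

```python
import cmath


def fft(x):
    """Recursive radix-2 Cooley-Tukey FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a non-zero power of two")
    if n == 1:
        return [complex(x[0])]
    # The two half-size transforms do not depend on each other,
    # so they could be computed by separate processing elements.
    even = fft(x[0::2])
    odd = fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        # Butterfly: combine one even and one odd term with a twiddle factor.
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def dft(x):
    """Direct O(n^2) discrete Fourier transform, used only as a reference."""
    n = len(x)
    return [
        sum(x[j] * cmath.exp(-2j * cmath.pi * j * k / n) for j in range(n))
        for k in range(n)
    ]
```

Comparing `fft(x)` against `dft(x)` on a small input is a quick way to check the butterfly indexing before attempting any parallel or fixed-point variant.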
Abstract: A multifrontal code is introduced for the efficient solution of the linear system of equations arising from the analysis of structures. The factorization phase is reduced to a series of interleaved element assembly and dense matrix operations for which the BLAS3 kernels are used. A similar approach is generalized to the forward and back substitution phases for the efficient solution of structures having multiple load conditions. The program performs all assembly and solution steps in parallel. Examples are presented which demonstrate the code’s performance on single- and dual-core processor computers.
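The substitution phases with multiple load conditions can be illustrated with a small dense example. This is a sketch under stated assumptions, not the multifrontal code: it assumes the matrix has already been factorized into a lower-triangular `L` and an upper-triangular `U`, and it stores each load condition as one column of the right-hand-side matrix `B`. All names are hypothetical.

```python
def forward_substitution(L, B):
    """Solve L Y = B for lower-triangular L; each column of B is one load case."""
    n, m = len(L), len(B[0])
    Y = [[0.0] * m for _ in range(n)]
    for i in range(n):
        for c in range(m):
            # Subtract the contribution of the unknowns already solved.
            s = sum(L[i][k] * Y[k][c] for k in range(i))
            Y[i][c] = (B[i][c] - s) / L[i][i]
    return Y


def back_substitution(U, Y):
    """Solve U X = Y for upper-triangular U, again column by column."""
    n, m = len(U), len(Y[0])
    X = [[0.0] * m for _ in range(n)]
    for i in reversed(range(n)):
        for c in range(m):
            s = sum(U[i][k] * X[k][c] for k in range(i + 1, n))
            X[i][c] = (Y[i][c] - s) / U[i][i]
    return X


def solve_factored(L, U, B):
    """Reuse one factorization for every load case in B."""
    return back_substitution(U, forward_substitution(L, B))
```

Treating all load cases as columns of one matrix turns many vector operations into matrix-matrix operations, which is the kind of work that BLAS3 kernels are designed for.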
Funding: This work was supported in part by the National Natural Science Foundation of China (NSFC) under Grant 41271371 and Grant 41471306, the Major International Cooperation and Exchange Project of NSFC under Grant 41120114001, the Institute of Remote Sensing and Digital Earth, Chinese Academy of Sciences (CAS-RADI) Innovation project under Grant Y3SG0300CX, the graduate foundation of CAS-RADI under Grant Y4ZZ06101B, and the Joint Doctoral Promotion Program hosted by the Fraunhofer Institute and the Chinese Academy of Sciences. Many thanks are due to the Fraunhofer Institute for Algorithms and Scientific Computing SCAI for the multi-core and GPU platform used in this paper.
Abstract: Quantitative remote sensing retrieval algorithms help in understanding the dynamic aspects of Digital Earth. However, the Big Data and complex models in Digital Earth pose grand challenges for computation infrastructures. In this article, taking aerosol optical depth (AOD) retrieval as a case study, we exploit parallel computing methods for highly efficient geophysical parameter retrieval. We present an efficient geocomputation workflow for the AOD calculation from Moderate Resolution Imaging Spectroradiometer (MODIS) satellite data. According to their individual potential for parallelization, several procedures were adapted and implemented for parallel execution on multicore processors and Graphics Processing Units (GPUs). The benchmarks in this paper validate the high parallel performance of the retrieval workflow, with speedups of up to 5.x on a multi-core processor with 8 threads and 43.x on a GPU. To specifically address the time-consuming model retrieval part, hybrid parallel patterns that combine the compute power of the multicore processor and the GPU were implemented with static and dynamic workload distributions and evaluated on two systems with different CPU–GPU configurations. It is shown that only the dynamic hybrid implementation leads to a greatly enhanced overall exploitation of the heterogeneous hardware environment in varying circumstances.
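The difference between static and dynamic workload distribution can be sketched in a few lines. In a static split each device receives a fixed share up front; in the dynamic scheme below, workers pull the next chunk from a shared queue whenever they become free, so a faster device naturally ends up handling more chunks. This is an illustrative sketch only: the worker functions stand in for a CPU path and a GPU path, and `run_dynamic`, `chunk_size`, and the worker names are hypothetical, not the paper's implementation.

```python
import queue
import threading


def run_dynamic(items, workers, chunk_size=4):
    """Distribute chunks of `items` dynamically over `workers`.

    `workers` maps a worker name to a function that takes a list of
    items and returns a list of results of the same length.
    """
    chunks = queue.Queue()
    for start in range(0, len(items), chunk_size):
        chunks.put((start, items[start:start + chunk_size]))

    results = [None] * len(items)
    handled = {name: 0 for name in workers}  # items processed per worker

    def loop(name, process):
        while True:
            try:
                # Take the next chunk as soon as this worker is idle.
                start, chunk = chunks.get_nowait()
            except queue.Empty:
                return
            out = process(chunk)
            results[start:start + len(out)] = out  # chunks never overlap
            handled[name] += len(chunk)

    threads = [
        threading.Thread(target=loop, args=(name, fn))
        for name, fn in workers.items()
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results, handled
```

A call such as `run_dynamic(list(range(20)), {"cpu": f, "gpu": g})`, with `f` and `g` as placeholder processing functions, returns the ordered results together with a count of how many items each worker processed; the split between workers depends on their relative speed rather than on a ratio fixed in advance.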