7 articles found
Performance characterization of illumination algorithms for reconfigurable graphics processor (cited by: 2)
1
Authors: Deng Junyong, Liu Yang, Xie Xiaoyan. The Journal of China Universities of Posts and Telecommunications (EI, CSCD), 2019, No. 5, pp. 60-71 (12 pages).
Graphics processing is an increasingly important application domain, driven by the demand for real-time rendering, video streaming, virtual reality, and so on. Illumination is a critical module in graphics rendering and is typically compute-bound, memory-bound, or power-bound, depending on the application case. It is crucial to decide how to schedule illumination algorithms with different features according to the practical requirements of reconfigurable graphics hardware. This paper analyzes the performance characteristics of four mainstream lighting algorithms (the Lambert, Phong, Blinn-Phong, and Cook-Torrance illumination algorithms) using hardware performance counters on the x86 processor platform Kaby Lake (KBL). Data movement, computation, power consumption, and memory access are evaluated over a range of application scenarios. Further, the pros and cons of these algorithms are obtained by analyzing their system-level behavior. The relationship between performance/energy and the evaluated metrics is analyzed through Pearson correlation coefficient (PCC) analysis. Based on these performance characterization data, the paper presents some reconfiguration suggestions for reconfigurable graphics processors.
Keywords: performance characterization; illumination algorithms; reconfigurable graphics processor; correlation analysis; computer architecture
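As an illustration of the Pearson correlation coefficient (PCC) analysis mentioned in the abstract above, the following is a minimal Python sketch. It is not the authors' tooling: the function name `pearson` and the sample counter values are hypothetical, chosen only to show how one evaluated metric might be correlated against a measured outcome such as energy.

```python
import math


def pearson(xs, ys):
    """Return the Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Covariance and variances, left unnormalized because the 1/n factors cancel.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical measurements for four runs (illustrative values, not from the paper).
cache_misses = [1.0, 2.0, 3.0, 4.0]
energy = [2.0, 4.0, 6.0, 8.0]
r = pearson(cache_misses, energy)  # perfectly linear data, so r is 1 up to rounding
```

A coefficient near +1 or -1 suggests that the metric tracks the outcome closely, while a value near 0 suggests little linear relationship.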
BFM:A Bus-Like Data Feedback Mechanism Between Graphics Processor and Host CPU
2
Authors: 邓军勇, 蒋林. Journal of Shanghai Jiaotong University (Science) (EI), 2020, No. 5, pp. 615-622 (8 pages).
Graphics processors have received increasing attention with the growing demand for gaming, video streaming, and many other applications. During graphics rendering with OpenGL, the host CPU needs run-time attributes to move on to the next rendering procedure, and these attributes cover almost all the function units of the graphics pipeline. Current methods suffer either from insufficient memory capacity to hold the variables or from a huge number of data parsing paths, which can cause congestion on the interface between the graphics processor and the host CPU. This paper draws on the operating principle of a commuting bus and proposes a bus-like data feedback mechanism (BFM) that traverses all the pipeline stages, collects the run-time status data or execution errors of graphics rendering, and sends them back to the host CPU. BFM can work in parallel with the graphics rendering logic. The method completes the data feedback task with only a 0.6% increase in resource utilization and no negative impact on performance, and it achieves a 1.3 times speedup compared with a traditional approach.
Keywords: data feedback mechanism; run-time attributes; OpenGL; graphics processor; host CPU
PPAA:a parallel primitive assembly accelerator in graphics processor
3
Authors: Deng Junyong, Xie Xiaoyan, Liu Yang, Tian Pu. The Journal of China Universities of Posts and Telecommunications (EI, CSCD), 2020, No. 2, pp. 65-71 (7 pages).
Primitive assembly is an essential procedure in graphics rendering that prepares objects for the following steps. However, the conventional approaches suffer from several issues, such as missing surface attributes, a mismatch of color mode for clipped primitives, and a performance bottleneck in the rendering pipeline. This paper takes all these issues into consideration and proposes a parallel primitive assembly accelerator (PPAA) that not only solves the functional problems but also improves shading performance. The register transfer level (RTL) circuit is designed and the detailed approach is presented. The prototype systems are implemented on a Xilinx field programmable gate array (FPGA) XC6VLX550T and an Altera FPGA EP2C70F896C6. The experimental results show that PPAA accomplishes the assembly tasks correctly, with 1.5x and 2.5x the performance of two previous implementations. For independent primitives, the most frequent case, PPAA efficiently enhances throughput by squeezing out pipeline bubbles and balancing the pipeline stages.
Keywords: primitive assembly; parallel accelerator; primitive characteristics; graphics processor
Speeding up the MATLAB complex networks package using graphic processors (cited by: 1)
4
Authors: 张百达, 唐玉华, 吴俊杰, 李鑫. Chinese Physics B (SCIE, EI, CAS, CSCD), 2011, No. 9, pp. 460-467 (8 pages).
The availability of computers and communication networks allows us to gather and analyse data on a far larger scale than previously. At present, it is believed that statistics is a suitable method to analyse networks with millions of vertices or more. The MATLAB language, with its mass of statistical functions, is a good choice for rapidly realizing an algorithm prototype for complex networks. The performance of MATLAB code can be further improved by using graphic processor units (GPUs). This paper presents the strategies and performance of a GPU implementation of a complex networks package, built with the Jacket toolbox for MATLAB. Compared with some commercially available CPU implementations, the GPU achieves an average speedup of 11.3x. The experimental results show that the GPU platform combined with the MATLAB language is a good combination for complex network research.
Keywords: complex networks; graphic processor unit; MATLAB; Jacket Toolbox
Three Dimensional Simulation of Ion Thruster Plume-Spacecraft Interaction Based on a Graphic Processor Unit (cited by: 1)
5
Authors: 任军学, 李娟, 谢侃, 田华兵, 仇钎, 汤海滨. Plasma Science and Technology (SCIE, EI, CAS, CSCD), 2013, No. 7, pp. 702-709 (8 pages).
Based on the three-dimensional particle-in-cell (PIC) method and Compute Unified Device Architecture (CUDA), a parallel particle simulation code combined with a graphic processor unit (GPU) has been developed for the simulation of charge-exchange (CEX) xenon ions in the plume of an ion thruster. Using the proposed technique, the potential and CEX plasma distribution are calculated for the ion thruster plume surrounding the DS1 spacecraft at different thrust levels. The simulation results are in good agreement with measured CEX ion parameters reported in the literature, and the GPU's results are equal to a CPU's. Compared with a single Intel Core 2 E6300 CPU, a 16-processor NVIDIA GeForce 9400 GT GPU shows a speedup factor of 3.6 when the total macro particle number is 1.1 × 10^6. The simulation results also reveal how the back flow CEX plasma affects the spacecraft floating potential, which indicates that the plume of the ion thruster is indeed able to alleviate the extreme negative floating potentials of spacecraft in geosynchronous orbit.
Keywords: ion thruster; particle simulation; graphic processor unit; plume
Comparison of Parallelization Strategies for Min-Sum Decoding of Irregular LDPC Codes (cited by: 1)
6
Authors: Hua Xu, Wei Wan, Wei Wang, Jun Wang, Jiadong Yang, Yun Wen. Tsinghua Science and Technology (SCIE, EI, CAS), 2013, No. 6, pp. 577-587 (11 pages).
Low-Density Parity-Check (LDPC) codes are powerful error correcting codes. In recent years, LDPC decoders have been implemented efficiently on dedicated VLSI hardware architectures. This paper describes two strategies to parallelize min-sum decoding of irregular LDPC codes. The first implements min-sum LDPC decoders on multicore platforms using OpenMP, while the other uses the Compute Unified Device Architecture (CUDA) to parallelize LDPC decoding on Graphics Processing Units (GPUs). Empirical studies on data of various scales show that both parallel strategies improve decoding performance and that the GPUs provide the faster, more efficient decoder implementation.
Keywords: Low-Density Parity-Check (LDPC) codes; multicore; OpenMP; Graphic Processor Unit (GPU); Compute Unified Device Architecture (CUDA)
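For readers unfamiliar with min-sum decoding, the following Python sketch shows the standard textbook check-node update at its core. It is an assumption-laden illustration, not the paper's OpenMP or CUDA code: the function name `check_node_update` and the sample log-likelihood ratios (LLRs) are hypothetical. For each edge of a check node, the outgoing message takes the product of the signs and the minimum of the magnitudes of all the other incoming messages.

```python
def check_node_update(llrs):
    """Min-sum check-node update for one check node.

    llrs: incoming log-likelihood ratios, one per connected variable node.
    Returns one outgoing message per edge, each computed from the other edges only.
    """
    if len(llrs) < 2:
        raise ValueError("a check node needs at least two incoming messages")
    out = []
    for i in range(len(llrs)):
        others = llrs[:i] + llrs[i + 1:]  # exclude the edge being updated
        sign = 1.0
        for value in others:
            if value < 0:
                sign = -sign  # product of signs of the other messages
        out.append(sign * min(abs(value) for value in others))
    return out


# Hypothetical incoming LLRs for a degree-3 check node.
messages = check_node_update([2.0, -3.0, 0.5])
```

Each check node's update is independent of the others within an iteration, which is the kind of property that makes this step a natural candidate for thread-level parallelism on multicore CPUs or GPUs.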
A GPU accelerated finite volume coastal ocean model (cited by: 1)
7
Authors: 赵旭东, 梁书秀, 孙昭晨, 赵西增, 孙家文, 刘忠波. Journal of Hydrodynamics (SCIE, EI, CSCD), 2017, No. 4, pp. 679-690 (12 pages).
The unstructured-grid Finite Volume Coastal Ocean Model (FVCOM) is converted from its original FORTRAN code to Compute Unified Device Architecture (CUDA) C code and optimized on the Graphic Processor Unit (GPU). The proposed GPU-FVCOM is tested against analytical solutions for two standard cases in a rectangular basin: a tide induced flow and a wind induced circulation. It is then applied to Ningbo's coastal water area to simulate the tidal motion and analyze the flow field and the vertical tide velocity structure. The simulation results agree well with the measured data. The proposed 3-D model runs 30 times faster than a single thread program, and the GPU-FVCOM implemented on a Tesla K20 device is faster than on a workstation with 20 CPU cores, which shows that the GPU-FVCOM is efficient for solving large scale, high resolution sea area engineering problems.
Keywords: Graphic Processor Unit (GPU); 3-D ocean model; unstructured grid; finite volume coastal ocean model (FVCOM)