14 articles found
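CUDA Warp-Marching Volume Rendering with Dynamic Task Assignment (动态任务分配CUDA线程束步进体绘制) Cited by: 4
1
Authors: Sun Wanjie (孙万捷), Gao Zhan (高瞻), Pan Haiyan (潘海燕), Wang Jiehua (王杰华), Jiang Zhengzheng (蒋峥峥). Journal of Computer-Aided Design & Computer Graphics (《计算机辅助设计与图形学学报》). EI, CSCD, PKU Core. 2016, No. 10, pp. 1630-1638 (9 pages).
Standard CUDA ray-casting volume rendering suffers from warp divergence because the threads within a warp carry unequal workloads, which lowers the utilization of compute resources. To address this, a CUDA warp-marching algorithm is proposed. First, the causes of warp divergence in the standard CUDA implementation are analyzed, and ray integration is mapped onto whole warps: all threads of a warp integrate one ray synchronously, segment by segment, until the ray terminates, which avoids warp divergence. Next, an in-warp ray integration algorithm is derived from the mathematics of ray integration and the hardware characteristics of the GPU. Finally, because static assignment of tasks to warps causes load imbalance, an algorithm for dynamic warp task assignment is proposed. Experiments show that warp marching with dynamic task assignment runs 1.9 to 7.9 times faster than the standard CUDA implementation.
Keywords: CUDA; warp; volume rendering; resource utilization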
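The abstract does not state the in-warp integration formula, so the following is only a minimal sketch under an assumption: that samples are composited with the standard front-to-back "over" operator on premultiplied (color, alpha) pairs. That operator is associative, which is what lets a 32-sample segment be reduced cooperatively and then merged into the running result. The names `over`, `integrate_sequential`, `integrate_warp_style`, and the `early_stop` threshold are hypothetical, and the warp is emulated in plain Python rather than written as a CUDA kernel.

```python
from functools import reduce

WARP_SIZE = 32  # lanes per warp on NVIDIA GPUs


def over(front, back):
    """Composite two premultiplied (color, alpha) segments, front first."""
    c_f, a_f = front
    c_b, a_b = back
    return (c_f + (1.0 - a_f) * c_b, a_f + (1.0 - a_f) * a_b)


def integrate_sequential(samples):
    """Reference: a single thread marches the whole ray front to back."""
    return reduce(over, samples, (0.0, 0.0))


def integrate_warp_style(samples, early_stop=0.99):
    """Emulate one warp per ray: each lane owns one sample of a 32-sample
    segment, the segment is reduced pairwise in order, and the result is
    merged into the accumulated value. All lanes stop together."""
    acc = (0.0, 0.0)
    for start in range(0, len(samples), WARP_SIZE):
        segment = list(samples[start:start + WARP_SIZE])
        # Order-preserving tree reduction (valid because `over` is associative).
        while len(segment) > 1:
            paired = [over(segment[i], segment[i + 1])
                      for i in range(0, len(segment) - 1, 2)]
            if len(segment) % 2:
                paired.append(segment[-1])
            segment = paired
        acc = over(acc, segment[0])
        # Assumed early ray termination, checked once per segment.
        if acc[1] >= early_stop:
            break
    return acc
```

With early termination disabled (for example `early_stop=2.0`), the segmented result matches the sequential one up to floating-point rounding, which is the property the mapping relies on.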
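GPU Parallel Particle Swarm Optimization Algorithm Based on Adaptive Warps (基于自适应线程束的GPU并行粒子群优化算法) Cited by: 2
2
Authors: Zhang Shuo (张硕), He Fazhi (何发智), Zhou Yi (周毅), Yan Xiaohu (鄢小虎). Journal of Computer Applications (《计算机应用》). CSCD, PKU Core. 2016, No. 12, pp. 3274-3279 (6 pages).
This work improves parallel particle swarm optimization (PSO) on the graphics processing unit (GPU) using the Compute Unified Device Architecture (CUDA). According to the CUDA hardware architecture, blocks are executed serially, and the warp is the basic unit that a streaming multiprocessor (SM) schedules and executes. To exploit the parallelism of the threads within a block, a GPU parallel PSO algorithm based on adaptive warps is proposed: each particle dimension is mapped to a thread; using warp-level parallelism, each particle is adaptively mapped to one or more warps depending on its dimensionality; and one or more particles are adaptively mapped to each block. The method is compared with an existing coarse-grained approach (one particle per thread) and a fine-grained approach (one particle per block). Experiments show that, relative to those two approaches, the proposed method raises the speedup over the CPU by up to 40.
Keywords: particle swarm optimization; parallel computing; graphics processing unit; Compute Unified Device Architecture; adaptive warp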
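The mapping described in the abstract (dimension to thread, particle to one or more warps, one or more particles to a block) can be sketched as simple arithmetic. This is an illustrative guess at the layout rule, not the paper's code: the function name `adaptive_layout`, the 1024-thread block limit, and the rule of packing as many particles as fit into a block are all assumptions.

```python
import math

WARP_SIZE = 32
MAX_THREADS_PER_BLOCK = 1024  # assumed per-block thread limit


def adaptive_layout(dim, n_particles, max_threads=MAX_THREADS_PER_BLOCK):
    """Return a hypothetical launch layout for a swarm of `n_particles`
    particles with `dim` dimensions each (one thread per dimension)."""
    if dim < 1 or n_particles < 1:
        raise ValueError("dim and n_particles must be positive")
    # Round each particle up to a whole number of warps.
    warps_per_particle = math.ceil(dim / WARP_SIZE)
    threads_per_particle = warps_per_particle * WARP_SIZE
    if threads_per_particle > max_threads:
        raise ValueError("a particle does not fit in one block in this sketch")
    # Pack as many particles into a block as the thread limit allows.
    particles_per_block = min(n_particles, max_threads // threads_per_particle)
    n_blocks = math.ceil(n_particles / particles_per_block)
    return {
        "warps_per_particle": warps_per_particle,
        "particles_per_block": particles_per_block,
        "threads_per_block": particles_per_block * threads_per_particle,
        "n_blocks": n_blocks,
    }
```

For example, under these assumptions a 50-dimensional problem uses two warps per particle, so 16 particles share a 1024-thread block.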
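Compiler Optimization for Non-Uniform Control Flow on DCU (面向DCU非一致控制流的编译优化) Cited by: 2
3
Authors: Yang Xiaoyi (杨小艺), Zhao Rongcai (赵荣彩), Wang Hongsheng (王洪生), Han Lin (韩林), Xu Kunkun (徐坤坤). Journal of Computer Applications (《计算机应用》). CSCD, PKU Core. 2023, No. 10, pp. 3170-3177 (8 pages).
The Chinese-made DCU uses a single-instruction, multiple-thread (SIMT) parallel execution model. During execution, non-uniform control flow arises inside kernel functions, so some of the threads in a warp can only run serially; this is warp divergence. Because warp divergence severely limits kernel performance, a compiler optimization that reduces the time lost to it is proposed: partial control-flow merging (PCFM). First, divergence analysis identifies fusible divergent regions that are isomorphic and contain many identical or similar instructions. Second, the percentage of instruction cycles saved by merging is computed to estimate the profitability of fusing each region. Finally, an alignment sequence is found and the profitable fusible divergent regions are merged. PCFM was evaluated on the DCU with test cases selected from the graphics processing unit (GPU) benchmark suite Rodinia and from classic sorting algorithms. It achieves an average speedup of 1.146 on the test cases, and its speedup is on average 5.72% higher than that of the branch fusion + tail merging method, which indicates that the proposed method reduces warp divergence more effectively.
Keywords: DCU; single instruction multiple threads; warp divergence; complex control flow; compiler optimization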
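A GPGPU Microarchitecture Optimization Design for Barrier Synchronization (一种针对栅栏同步的GPGPU微架构优化设计)
4
Authors: Jia Shiwei (贾世伟), Zhang Yuming (张玉明), Tian Ze (田泽), Qin Xiang (秦翔). Research & Progress of SSE (《固体电子学研究与进展》). CAS, PKU Core. 2023, No. 1, pp. 70-77 (8 pages).
To reduce the adverse effect of barrier synchronization overhead on program performance in general-purpose graphics processing units (GPGPUs), a GPGPU microarchitecture optimization is proposed. In the warp scheduling module, the scheduling order of warps is determined by their barrier synchronization overhead, so that warps with high overhead are scheduled first. In the L1 data cache module, the data cache miss rate and the barrier synchronization state jointly decide whether each memory request should bypass the cache, which reduces the impact of cache stall cycles on barrier synchronization without harming the exploitation of data locality. Both sub-module optimizations reduce barrier synchronization overhead. Experiments show that, compared with the baseline GPGPU architecture and existing barrier synchronization optimization strategies, the design improves instructions per cycle by 4.15%, 4.13%, and 2.62%, respectively, on barrier-synchronization-intensive programs, demonstrating its effectiveness and practicality.
Keywords: general-purpose graphics processing unit; barrier synchronization; warp scheduling; L1 data cache; cache bypassing; performance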
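A Performance Prediction Framework for CUDA Programs (面向CUDA程序的性能预测框架)
5
Authors: Qu Haicheng (曲海成), Yu Simiao (于思淼), Liu Wanjun (刘万军), Wang Xinyuan (王鑫源). Acta Electronica Sinica (《电子学报》). EI, CAS, CSCD, PKU Core. 2020, No. 4, pp. 654-661 (8 pages).
A performance prediction framework is proposed for analyzing and predicting the kernel performance of CUDA parallel programs, in order to guide parallel program design and performance optimization. (1) Starting from the GPU programming model and device architecture details, and taking the warp as the unit of study, the framework combines basic software and hardware features closely tied to GPU program run time to define high-level performance features such as parallel-space idleness, streaming-processor warp load, and the parallel effect factor. (2) Based on these features, the framework estimates, for GPU programs with balanced thread load, the execution time of a kernel function under different problem sizes and execution configurations. (3) From the performance evaluation principle, a strategy for optimizing kernel execution configuration parameters is derived. Validation experiments show that the framework's average prediction accuracy for existing programs reaches 89% and 94% in two typical scenarios. The framework objectively captures the correlation between high-level features and program performance, and it can qualitatively analyze the performance level of parallel algorithms.
Keywords: performance prediction; warp; device parallel space; parallel effect; performance features; execution configuration parameter optimization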
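A Cache Management and Memory Scheduling Mechanism Based on Inter-warp Heterogeneity (一种基于Inter-warp异构性的缓存管理与内存调度机制)
6
Authors: Fang Juan (方娟), Wei Zelin (魏泽琳), Yu Tingwen (于婷雯). Computer Engineering & Science (《计算机工程与科学》). CSCD, PKU Core. 2019, No. 5, pp. 788-795 (8 pages).
In a GPU, all threads of a warp execute the same instruction in lockstep. The memory requests of some threads are served quickly, while the remaining requests take much longer. The warp cannot execute its next instruction until the slowest request completes, which causes memory divergence. This work studies inter-warp heterogeneity in GPUs, then implements and optimizes a cache management mechanism and a memory scheduling policy based on it, to reduce the negative effects of memory divergence and cache queuing delay. Warps are classified by cache hit rate, and the classification drives three components: (1) a warp-type-based cache bypassing component, which lets warps with low cache utilization bypass the L2 cache; (2) a warp-type-based cache insertion/promotion policy, which prevents data from warps with high cache utilization from being evicted prematurely; (3) a warp-type-based memory controller, which prioritizes requests from warps with high cache utilization and requests that come from the same warp. Across 8 different GPGPU applications, the mechanism achieves an average speedup of 18.0% over the baseline GPU.
Keywords: cache management; memory scheduling; memory divergence; warp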
Non-Metric CCD Camera Calibration Algorithm in a Digital Photogrammetry System Cited by: 4
7
Authors: YANG Hua-chao, DENG Ka-zhong, ZHANG Shu-bi, GUO Guang-li, ZHOU Ming. Journal of China University of Mining and Technology. EI. 2006, No. 2, pp. 119-122 (4 pages).
Camera calibration is a critical process in photogrammetry and a necessary step to acquire 3D information from a 2D image. In this paper, a flexible approach for CCD camera calibration using 2D direct linear transformation (DLT) and bundle adjustment is proposed. The proposed approach assumes that the camera interior orientation elements are known, and addresses a new closed-form solution in planar object space based on homogeneous coordinate representation and matrix factorization. Homogeneous coordinate representation offers a direct matrix correspondence between the parameters of the 2D DLT and the collinearity equation. The matrix factorization starts by recovering the elements of the rotation matrix and then solves for the camera position with the collinearity equation. High-precision camera calibration is achieved by bundle adjustment using the initial values of the camera orientation elements. The results show that the calibration precision of the principal point and focal length is about 0.2 and 0.3 pixels, respectively, which can meet the requirements of high-accuracy close-range photogrammetry.
Keywords: direct linear transformation; collinearity equation; bundle adjustment; camera calibration; Hough transformation
Thermal Energy of Confined Gravitons can Vary Cold Geodesic Curves
8
Author: Igor E. Bulyzhenkov. Journal of Physical Science and Application. 2014, No. 7, pp. 468-474 (7 pages).
Internal energy of real warm bodies can change their kinetic-potential energy balance on Keplerian orbits and relativistic geodesics. The chiral nature of mass results in chirality of gravitons and their energy confinement within the constant energy charge of a moving thermodynamical body. Zero energy-momentum gravitons provide dissipative self-heating and the spiral fall of massive stars onto gravitating centers. The computed self-heating of the pulsar PSR B1913+16 quantitatively describes its period decay without an outward emission of the metric waves in question. The deviation of warm bodies from the geodesic trajectories of cold point matter complies with Einstein's directives toward a pure field physics of the material space plenum without metric singularities.
Keywords: zero particles; chiral graviton; self-heating; thermal time evolution
Integrable Deformations of Heisenberg Supermagnetic Model
9
Authors: Yan Zhaowen (颜昭雯), Li Minli (李民丽), Wu Ke (吴可), Zhao Weizhong (赵伟忠). Communications in Theoretical Physics. SCIE, CAS, CSCD. 2010, No. 1, pp. 21-24 (4 pages).
We construct the integrable deformations of the Heisenberg supermagnet model with the quadratic constraints (i) $S^2 = 3S - 2I$, for $S \in USPL(2/1)/S(U(2) \times U(1))$, and (ii) $S^2 = S$, for $S \in USPL(2/1)/S(L(1/1) \times U(1))$. Under the gauge transformation, their corresponding gauge equivalent counterparts are derived. They are the Grassmann odd and super mixed derivative nonlinear Schrödinger equations, respectively.
Keywords: Heisenberg supermagnet model; supersymmetry; integrable equation
A NEW SOLUTION MODEL OF NONLINEAR DYNAMIC LEAST SQUARE ADJUSTMENT
10
Authors: Tao Huaxue (陶华学), Guo Jinyun (郭金运). Journal of Coal Science & Engineering (China). 2000, No. 2, pp. 47-51 (5 pages).
Nonlinear least square adjustment is a hot research topic in technical fields. This paper studies the non-derivative solution to nonlinear dynamic least square adjustment and puts forward a new algorithm model and its solution model. The method is simple and has a small computational load. This opens up a theoretical method to solve the linear dynamic least square adjustment.
Keywords: nonlinear least square; dynamic adjustment; non-derivative analytic method
A proximal point algorithm revisit on the alternating direction method of multipliers Cited by: 23
11
Authors: CAI XingJu, GU GuoYong, HE BingSheng, YUAN XiaoMing. Science China Mathematics. SCIE. 2013, No. 10, pp. 2179-2186 (8 pages).
The alternating direction method of multipliers (ADMM) is a benchmark for solving convex programming problems with separable objective functions and linear constraints. In the literature it has been illustrated as an application of the proximal point algorithm (PPA) to the dual problem of the model under consideration. This paper shows that ADMM can also be regarded as an application of PPA to the primal model with a customized choice of the proximal parameter. This primal illustration of ADMM is thus complementary to its dual illustration in the literature. This PPA revisit on ADMM from the primal perspective also enables us to easily recover the generalized ADMM proposed by Eckstein and Bertsekas. A worst-case O(1/t) convergence rate in the ergodic sense is established for a slight extension of Eckstein and Bertsekas's generalized ADMM.
Keywords: alternating direction method of multipliers; convergence rate; convex programming; proximal point algorithm
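For readers unfamiliar with the iteration the abstract refers to, here is a minimal sketch of plain scaled-form ADMM on a toy problem chosen for illustration: minimize ½(x − a)² + ½(z − b)² subject to x − z = 0. It shows only the basic alternating updates, not the paper's PPA reformulation or the generalized ADMM of Eckstein and Bertsekas. The function name `admm_consensus` and the parameters `rho` and `iters` are hypothetical.

```python
def admm_consensus(a, b, rho=1.0, iters=100):
    """Scaled-form ADMM for
        minimize 0.5*(x - a)**2 + 0.5*(z - b)**2  subject to  x - z = 0.
    Each subproblem is a one-dimensional quadratic with a closed-form solution."""
    x = z = u = 0.0  # u is the scaled dual variable
    for _ in range(iters):
        # x-update: argmin_x 0.5*(x - a)^2 + (rho/2)*(x - z + u)^2
        x = (a + rho * (z - u)) / (1.0 + rho)
        # z-update: argmin_z 0.5*(z - b)^2 + (rho/2)*(x - z + u)^2
        z = (b + rho * (x + u)) / (1.0 + rho)
        # dual update on the constraint residual x - z
        u = u + x - z
    return x, z, u
```

At the fixed point both primal variables equal the average (a + b)/2 and the scaled dual variable equals (a − x)/rho, which follows from setting the three updates to be stationary.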
Nonlinear Interaction of Elliptical Laser Beam with Collisional Plasma: Effect of Linear Absorption Cited by: 1
12
Authors: Keshav Walia, Sarabjit Kaur. Communications in Theoretical Physics. SCIE, CAS, CSCD. 2016, No. 1, pp. 78-82 (5 pages).
In the present work, the nonlinear interaction of an elliptical laser beam with collisional plasma is studied by using the paraxial ray approximation. Nonlinear differential equations for the beam width parameters of the semi-major axis and semi-minor axis of the elliptical laser beam have been set up and solved numerically to study the variation of the beam width parameters with the normalized distance of propagation. The effects of variation in the absorption coefficient and plasma density on the beam width parameters are also analyzed. The analysis shows that the extent of self-focusing of the beam increases as the plasma density increases or the absorption coefficient decreases.
Keywords: elliptical laser beam; paraxial ray approximation; absorption coefficient; plasma density
Investigation of Heat Transfer of Tube Line of Staggered Tube Bank in Two-Phase Flow
13
Author: Mindaugas Jakubcionis. Journal of Thermal Science. SCIE, EI, CAS, CSCD. 2015, No. 3, pp. 269-274 (6 pages).
This article presents the results of an experimental investigation of the heat transfer process, carried out using a model of a heat exchanger. Two-phase statically stable foam flow was used as the heat transfer fluid. The heat exchanger model consisted of a staggered tube bank. Experimental results are presented with a focus on the influence of the tube position in the line of the bank, the volumetric void component, and the velocity of the gas component of the foam. The phenomenon of liquid draining in cellular foam flow and its influence on the heat transfer rate is also discussed. The experimental results have been generalized by a relationship between the Nusselt, Reynolds, and Prandtl numbers.
Keywords: two-phase flow; heat transfer; heat exchanger
Propagation of Lorentz–Gaussian Beams in Strongly Nonlocal Nonlinear Media
14
Authors: A. Keshavarz, G. Honarasa. Communications in Theoretical Physics. SCIE, CAS, CSCD. 2014, No. 2, pp. 241-245 (5 pages).
In this paper, the propagation of Lorentz–Gaussian beams in strongly nonlocal nonlinear media is investigated by the ABCD matrix method. For this purpose, an expression for the field distribution during propagation is derived, and based on it the propagation of Lorentz–Gaussian beams in such media is simulated. Then, the evolution of the beam width and curvature radius during propagation is discussed.
Keywords: nonlocal nonlinear media; Lorentz–Gaussian beam; ABCD matrix