Funding: Supported by the National Natural Science Foundation of China (Nos. 41104083 and 40804024) and the Fundamental Research Funds for the Central Universities (No. 2011YYL022).
Abstract: The most popular hardware for parallel depth migration is the PC cluster, but its application is limited by its large space requirements and high power consumption. In this paper, we introduce a new hardware architecture in which finite-difference (FD) wavefield-continuation depth migration is carried out using the graphics processing unit (GPU) as a CPU coprocessor. We describe the program module and three key optimization steps for implementing FD depth migration (memory, thread structure, and instruction optimization) and consider methods for evaluating the degree of optimization. 2D and 3D models are used to test depth migration on the GPU. The test results show that the general-purpose GPU greatly increases the computational efficiency of depth migration, by a factor of at least 25 compared with an AMD 2.5 GHz CPU.
Funding: Projects (61170049, 60903044) supported by the National Natural Science Foundation of China; Project (2012AA010903) supported by the National High Technology Research and Development Program of China.
Abstract: The particle-in-cell (PIC) method has benefited greatly from GPU-accelerated heterogeneous systems. However, the performance of PIC is constrained by the interpolation operations in the weighting process on the GPU (graphics processing unit). To address this problem, a fast weighting method for PIC simulation on GPU-accelerated systems is proposed that avoids atomic memory operations during the weighting process. The method is implemented by taking advantage of the GPU's thread synchronization mechanism and dividing the problem space appropriately. Moreover, software-managed shared memory on the GPU is employed to buffer the intermediate data. The experimental results show that the method achieves speedups of up to 3.5 times compared with previous work, and runs 20.08 times faster on one NVIDIA Tesla M2090 GPU than on a single core of an Intel Xeon X5670 CPU.
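The abstract above does not spell out how the atomic operations are avoided, so the following Python sketch only illustrates the general idea under stated assumptions: a 1D grid, linear (cloud-in-cell) weighting, unit charge per particle, and a "gather" layout in which each grid node is owned by exactly one worker. The function names `deposit_scatter` and `deposit_gather` are hypothetical and are not taken from the paper.

```python
def deposit_scatter(positions, n_cells, dx=1.0):
    """Particle-centric weighting: each particle adds to its two nodes.

    On a GPU, two particles in the same cell would write to the same
    node concurrently, so these updates would need atomic operations.
    Positions are assumed to lie in [0, n_cells * dx).
    """
    rho = [0.0] * (n_cells + 1)
    for x in positions:
        cell = int(x / dx)
        frac = x / dx - cell          # offset inside the cell, in [0, 1)
        rho[cell] += 1.0 - frac       # left node of the cell
        rho[cell + 1] += frac         # right node of the cell
    return rho


def deposit_gather(positions, n_cells, dx=1.0):
    """Node-centric weighting: each node sums its own contributions.

    Particles are first binned by cell (dividing the problem space).
    Each node then reads only its two adjacent cells and is the sole
    writer of its own entry, so no atomic update is needed.
    """
    cells = [[] for _ in range(n_cells)]
    for x in positions:
        cell = int(x / dx)
        cells[cell].append(x / dx - cell)

    rho = [0.0] * (n_cells + 1)
    for node in range(n_cells + 1):   # one independent worker per node
        total = 0.0
        if node < n_cells:            # node is the left node of cell `node`
            total += sum(1.0 - f for f in cells[node])
        if node > 0:                  # node is the right node of cell `node - 1`
            total += sum(cells[node - 1])
        rho[node] = total
    return rho
```

Both functions compute the same node densities. The gather version trades an extra binning pass for conflict-free writes, which is the kind of trade-off the abstract alludes to.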
Funding: Project (61372136) supported by the National Natural Science Foundation of China.
Abstract: The design, analysis and parallel implementation of the particle filter (PF) are investigated. First, to tackle the particle degeneracy problem in the PF, an iterated importance density function (IIDF) is proposed, in which a new term associated with the current measurement information (CMI) is introduced into the expression of the sampled particles. Through repeated use of the least squares estimate, the CMI is integrated into the sampling stage in an iterative manner, which greatly improves the sampling quality. Running the IIDF yields an iterated PF (IPF). Subsequently, a parallel resampling (PR) scheme is proposed for the parallel implementation of the IPF; its main idea is the same as that of systematic resampling (SR), but it is carried out differently. The PR directly uses the integer part of the product of the particle weight and the particle number as the number of times a particle is replicated, and it simultaneously eliminates the particles with the smallest weights; these are the two key differences from SR. Finally, the detailed procedures for implementing the PR-based IPF on the graphics processing unit are presented. The performance of the IPF, the PR and their parallel implementations is illustrated via a one-dimensional numerical simulation and a practical application to passive radar target tracking.
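The replication rule described above can be sketched in a few lines of Python. The count for each particle, the integer part of N times its normalized weight, follows the abstract. How the slots left over after truncation are filled is not stated there, so assigning them to the largest fractional remainders is purely an assumption of this sketch. The function names are hypothetical.

```python
import math


def replication_counts(weights):
    """Copies per particle: integer part of N * w_i (weights normalized here).

    Each count depends only on that particle's own weight, so all counts
    can be computed independently, one thread per particle.
    """
    n = len(weights)
    total = sum(weights)
    return [math.floor(n * w / total) for w in weights]


def parallel_resample(particles, weights):
    """Return N resampled particles using the truncation rule above."""
    n = len(particles)
    total = sum(weights)
    counts = replication_counts(weights)

    # Assumption (not from the abstract): slots left over after truncation
    # go to the particles with the largest fractional remainders.
    remainders = [n * w / total - c for w, c in zip(weights, counts)]
    leftover = max(0, n - sum(counts))
    by_remainder = sorted(range(n), key=lambda i: remainders[i], reverse=True)
    for i in by_remainder[:leftover]:
        counts[i] += 1

    # Particles whose final count is zero (the smallest weights) are dropped.
    resampled = []
    for particle, count in zip(particles, counts):
        resampled.extend([particle] * count)
    return resampled
```

For weights 0.5, 0.25, 0.125 and 0.125 with N = 4, the truncated counts are 2, 1, 0 and 0, so one slot remains to be filled by the assumed remainder rule.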
Abstract: A method for measuring the refractive index of a liquid-state metal is introduced. By inserting a wedge sample cell into the optical path of a Michelson interferometer, the refractive index of the liquid-state alloy SCN-Eth is measured at different temperatures and densities. The results show that this method can be applied to measure the refractive index of various liquids, especially at different temperatures.
Funding: Supported by the National High Technology Research and Development Program ("863" Program) of China (No. 863-306-ZD13-03-06).
Abstract: Mutual information (MI)-based image registration is effective for registering medical images, but it is computationally expensive. This paper accelerates MI-based image registration by dividing the computation of mutual information into spatial transformation and histogram-based calculation, and performing the 3D spatial transformation and trilinear interpolation on the graphics processing unit (GPU). The 3D floating image is downloaded to the GPU as a flat 3D texture, and then fetched and interpolated for each new voxel location in a fragment shader. The transformed results are rendered to textures using the frame buffer object (FBO) extension, and then read back to main memory for the remaining computation on the CPU. Experimental results show that the GPU-accelerated method achieves a speedup of about an order of magnitude, with a better registration result, compared with the software implementation on a single-core CPU.
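The histogram-based part of the calculation, which the abstract leaves on the CPU, can be illustrated with a small Python sketch. It uses the standard definition of mutual information over a joint intensity histogram. The function name `mutual_information`, the flat-list image representation and the use of raw integer intensities as histogram bins are assumptions made for illustration, not details from the paper.

```python
import math
from collections import Counter


def mutual_information(fixed, floating):
    """Mutual information (in bits) of two equally sized images.

    Both images are given as flat sequences of integer intensities,
    where `floating` is assumed to be already transformed and
    interpolated onto the grid of `fixed`.
    """
    if len(fixed) != len(floating):
        raise ValueError("images must contain the same number of voxels")
    n = len(fixed)

    joint = Counter(zip(fixed, floating))   # joint intensity histogram
    hist_fixed = Counter(fixed)             # marginal histogram, fixed image
    hist_floating = Counter(floating)       # marginal histogram, floating image

    mi = 0.0
    for (a, b), count in joint.items():
        p_ab = count / n
        # p_ab / (p_a * p_b), written with counts to avoid extra divisions
        ratio = count * n / (hist_fixed[a] * hist_floating[b])
        mi += p_ab * math.log2(ratio)
    return mi
```

Two identical binary images with equal numbers of each intensity give 1 bit, while statistically independent intensity patterns give 0. A registration loop would maximize this value over the transformation parameters.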
Abstract: This paper presents an optimization of the shadow volume algorithm that allows real-time rendering. The technique builds on previous work that makes it possible to obtain shadows in real time, although there the calculation of the silhouette requires a preprocessing of the geometry implemented on the CPU (central processing unit). Using the latest generation of GPUs (graphics processing units), the authors propose to implement the calculation of the silhouette on the GPU by means of the geometry shader. The authors present the steps that led to a concrete implementation of this algorithm and the modifications that were made, as well as a comparative study of the results, followed by a discussion of these results and of the implementation choices.
Abstract: The paper presents the implementation of a parallel version of the FDK (Feldkamp, Davis and Kress) algorithm using graphics processing units. Some elements of computed tomography scanning and the FDK algorithm are briefly discussed, and some ideas about GPUs (graphics processing units) and their use in general-purpose computing are presented. The paper shows a computational implementation of the FDK algorithm and the process of parallelizing this implementation. The parallel version of the algorithm is compared with the sequential version, using speedup as the performance metric. The performance of the parallel version was evaluated on two GPUs, a GeForce 9400GT (16 cores), a low-capacity GPU, and a Quadro 2000 (192 cores), a medium-capacity GPU; a speedup of 3.37 was reached.
Abstract: Liquid crystal acrylate monomers conjugated with two mesogens, cholesterol and p-hydroxyphenyl-2-methyl butanoate, were successfully synthesized: monomer cholesteryl acrylate (MA) and monomer (S)-(+)-4-(2-Methyl butanoat-l-butyloxy) phenyl 4-[1-(propenoyloxy) butyloxy] benzoate (MB). The two monomers were characterized by DSC (differential scanning calorimetry), POM (polarization optical microscopy) and XRD (X-ray diffraction). The mesophase temperatures of MA and MB are 81.28 °C and 54.36 °C, respectively. Texture analysis by POM shows an oily streak texture for MA and a schlieren texture for MB. The XRD pattern of MA at room temperature shows its three strongest peaks at (2θ, deg) 2.7153, 5.2992 and 18.8500. The three strongest peaks of MB at room temperature are at (2θ, deg) 9.1726, 9.7707 and 12.5389. In the XRD patterns of MA and MB at and above the mesophase temperature, each of these peaks disappears.
Funding: Supported by the National High Technology Research and Development Programme of China (Nos. 2007AA01Z429, 2007AA01Z405) and the National Natural Science Foundation of China (Nos. 60633020, 60702059, 60872041).
Abstract: This paper presents a new graph-based single-copy routing method for delay tolerant networks (DTN). As time passes in the network, a DTN connectivity graph is built up from node mobility and communication, and a corresponding greedy tree is obtained by applying a greedy algorithm to the DTN connectivity graph. When there are bad nodes, such as disabled or selfish nodes, in the delay tolerant network, a node can choose the proper next intermediate node to transmit the message by comparing the locations of neighboring nodes in the greedy tree. The single-copy routing method is well suited to energy-constrained, storage-constrained and bandwidth-constrained applications such as mobile wireless DTNs. We show that the delivery ratio increases significantly with the graph-based single-copy routing when bad nodes exist.
Funding: Supported by the China Postdoctoral Science Foundation (Grant No. 2013M540772) and the Young Scientists Fund of the National Natural Science Foundation of China (Grant Nos. 61203233, 51101124, 51101125).
Abstract: Phase field simulation has been actively studied as a powerful method for investigating microstructural evolution during solidification. However, performing phase field simulations at large length and time scales is a great challenge. Here, graphics processing unit (GPU) computation is applied to the phase field simulation, greatly accelerating the calculation. The results show that the computation with the GPU is about 36 times faster than that with a single central processing unit (CPU) core. This demonstrates the feasibility of GPU-accelerated phase field simulation on a desktop computer. The GPU-accelerated strategy will bring new opportunities for the application of phase field simulation.
Funding: Supported by the National Natural Science Foundation of China (Grant Nos. 11272032, 11322220 and 11427802), the Program for New Century Excellent Talents in University (Grant No. NCET-12-0023), the Science Fund of State Key Laboratory of Automotive Safety and Energy (Grant No. KF14032), and the Beijing Nova Program (Grant No. xx2014B034).
Abstract: An easy-to-implement yet practical single-camera microscopic stereo-digital image correlation (stereo-DIC) technique is proposed for surface three-dimensional (3D) deformation measurement of single lap joint (SLJ) samples subjected to mechanical loads. The basic principles, optical configurations and implementation procedures of the proposed technique are described in detail. Compared with the existing single-camera 2D-DIC technique, which has been regularly used for in-plane deformation measurement of SLJ specimens, the proposed technique offers the special merit of simultaneously determining all three displacement components by simply adding two optical elements to an existing single-camera 2D-DIC system. The accuracy and effectiveness of the proposed technique are demonstrated by measuring the 3D deformation of an SLJ specimen subjected to quasi-static tensile loads.