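Abstract: When building systems in which a CPU (Central Processing Unit) computes cooperatively with a GPU (Graphics Processing Unit) or with other devices, the GPU or other device is connected to the CPU through the PCI (Peripheral Component Interconnect) bus and takes on the parallel computing tasks. To solve the asynchronous transfer and timing matching problems between the PCI interface chip and the GPU chip, a timing matching interface circuit for the cross-clock-domain connection between the CPU and the GPU was designed, based on the PCI bus specification and the timing specification of the GPU chip and using cross-clock-domain signal handling methods. Simulation verified the correctness of the circuit. The results show that the circuit can operate at 252 MHz and meets the rate and bandwidth requirements of the interface circuit between the GPU and the CPU.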
Funding: This work was supported by the Science and Technology Research Program of Chongqing Municipal Education Commission (Grant No. KJQN201801606), the Natural Science Foundation Project of CQ CSTC (cstc2017jcyjAX0092), the Scientific Research Program of Chongqing University of Education (Grant Nos. KY201924C, 2017XJZDWT02), the Science and Technology Research Program of Chongqing Municipal Education Commission (Grant No. KJ1601410), the Project 'Future School (Infant Education)' of National Center For Schooling Development Programme of China (Grant No. CSDP18FC2202), the Chongqing Electronics Engineering Technology Research Center for Interactive Learning, and the Chongqing Big Data Engineering Laboratory for Children.
Abstract: A novel beamforming algorithm named Delay Multiply and Sum (DMAS), which excels at enhancing the resolution and contrast of ultrasonic images, has recently been proposed. However, the algorithm contains nested loops, so its computational complexity is higher than that of the Delay and Sum (DAS) beamformer, which is widely used in industry. We therefore propose a simple vector-based method to lower its complexity. The key point is to transform the nested loops into several vector operations, which can be implemented efficiently on many parallel platforms, such as Graphics Processing Units (GPUs) and multi-core Central Processing Units (CPUs). Consequently, we implemented the algorithm on such a platform. To maximize the use of the available computing power, we use the GPU and the multi-core CPU together. The platform used in our test is a low-cost Personal Computer (PC) in which a GPU and a multi-core CPU are installed. The results show that the hybrid use of a CPU and a GPU yields a significant performance improvement over using a GPU alone or a multi-core CPU alone. The performance of the hybrid system is about 47% to 63% higher than that of a single GPU. When 32 elements are used in receiving, the frame rate can generally reach 30 fps. In the best case, the frame rate can be increased to 40 fps.
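The abstract does not spell out the vector-based method, so the following is only an illustrative sketch of how a nested pairwise loop of this kind can be collapsed into a few vector-style reductions. It assumes the commonly cited DMAS definition, in which the output for one pixel is the sum over all channel pairs i < j of sign(s_i s_j) * sqrt(|s_i s_j|), and it assumes the per-channel samples have already been delay-aligned. The function names (signed_sqrt, dmas_nested, dmas_vectorized) are hypothetical, and the algebraic identity used here is a standard one that is not claimed to be the authors' exact formulation.

```python
import math


def signed_sqrt(x):
    """Return sign(x) * sqrt(|x|)."""
    return math.copysign(math.sqrt(abs(x)), x)


def dmas_nested(samples):
    """Reference form: explicit nested loop over all channel pairs i < j."""
    total = 0.0
    n = len(samples)
    for i in range(n - 1):
        for j in range(i + 1, n):
            total += signed_sqrt(samples[i] * samples[j])
    return total


def dmas_vectorized(samples):
    """Loop-free form built from element-wise operations and two reductions.

    Uses sign(a*b) * sqrt(|a*b|) == signed_sqrt(a) * signed_sqrt(b) and
    sum_{i<j} r_i * r_j == ((sum r_i)**2 - sum r_i**2) / 2.
    """
    roots = [signed_sqrt(s) for s in samples]   # element-wise map
    sum_roots = sum(roots)                      # first reduction
    sum_squares = sum(r * r for r in roots)     # second reduction
    return 0.5 * (sum_roots * sum_roots - sum_squares)


if __name__ == "__main__":
    # Hypothetical delay-aligned samples from five receive channels.
    aligned = [0.5, -1.2, 2.0, 0.3, -0.7]
    print(dmas_nested(aligned), dmas_vectorized(aligned))
```

Written this way, the per-pixel cost drops from quadratic to linear in the number of receive channels, and each step is an element-wise map or a reduction, which is the kind of operation that maps naturally onto GPU threads or CPU SIMD lanes.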