This paper proposes a cochlear prosthetic system with an implanted digital signal processor (DSP). The system transmits voice-band signals at a low data rate through the wireless link, which frees it from the data-rate limitation and makes it suitable for future development. By optimizing the speech processing algorithm and the DSP hardware design, the implanted DSP executes the continuous interleaved sampling (CIS) algorithm at a clock frequency of 3 MHz with a power consumption of only 1.91 mW. With an analytic power-transmission efficiency of 40% for the wireless inductive link, the power overhead caused by the implanted DSP is derived as 2.87 mW, which is trivial compared with the power consumption of existing cochlear prosthetic systems (tens of milliwatts). With the DSP implanted, this new system can be easily developed into a fully implanted cochlear prosthesis.
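The abstract does not spell out how the 2.87 mW figure is derived. One reading that is consistent with the stated numbers is that the overhead is the extra power lost in the inductive link when it has to deliver the DSP's 1.91 mW at 40% efficiency. The sketch below works through that arithmetic; the function name and the interpretation are assumptions, not taken from the paper.

```python
def link_overhead_mw(load_mw: float, efficiency: float) -> float:
    """Power lost in the link (mW) when delivering load_mw at the given efficiency.

    Assumed model: transmitted power = load / efficiency, and the overhead
    is the part of the transmitted power that never reaches the load.
    """
    if not 0.0 < efficiency <= 1.0:
        raise ValueError("efficiency must be in (0, 1]")
    transmitted_mw = load_mw / efficiency
    return transmitted_mw - load_mw


# Figures from the abstract: 1.91 mW DSP load, 40% link efficiency.
overhead = link_overhead_mw(1.91, 0.40)
print(f"overhead is about {overhead:.2f} mW")
```

Under this model the external transmitter supplies roughly 4.78 mW in total, of which 1.91 mW reaches the DSP.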
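To address severe cache misses during the transfer of each macroblock and the low frame rate of video sequences in an MPEG-4 (Moving Picture Experts Group-4) video encoder, this paper proposes optimizations to the motion estimation algorithm, cache usage, SAD (Sum of Absolute Differences) computation, and pixel interpolation, together with the use of EDMA (Enhanced Direct Memory Access) for data movement to speed up memory access. The optimized MPEG-4 video encoder is implemented on the TMS320DM642 DSP (Digital Signal Processor) platform. Experimental results show that, relative to the unoptimized version, the image and video processing functions are sped up by a factor of 1.18 to 97.5, and the frame rate of video sequences increases by more than 6 times, meeting the real-time requirement of 25 frames/s.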
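For readers unfamiliar with SAD, the following is a minimal sketch of how a sum of absolute differences drives block-matching motion estimation. It uses a plain exhaustive search over a small window; it is not the optimized search from the paper, and the function names, block size, and search radius are illustrative assumptions.

```python
def sad(cur, ref, cy, cx, ry, rx, size):
    """Sum of absolute differences between two size x size blocks.

    (cy, cx) is the top-left corner of the block in the current frame,
    (ry, rx) is the top-left corner of the candidate block in the reference frame.
    """
    total = 0
    for dy in range(size):
        for dx in range(size):
            total += abs(cur[cy + dy][cx + dx] - ref[ry + dy][rx + dx])
    return total


def full_search(cur, ref, cy, cx, size, radius):
    """Exhaustive search: return (best_sad, motion_y, motion_x) within +/- radius."""
    height, width = len(ref), len(ref[0])
    best = None
    for my in range(-radius, radius + 1):
        for mx in range(-radius, radius + 1):
            ry, rx = cy + my, cx + mx
            # Skip candidates that fall outside the reference frame.
            if ry < 0 or rx < 0 or ry + size > height or rx + size > width:
                continue
            cost = sad(cur, ref, cy, cx, ry, rx, size)
            if best is None or cost < best[0]:
                best = (cost, my, mx)
    return best
```

The inner double loop in `sad` is the hot spot in an encoder of this kind, which is why it is a natural target for the SIMD and cache optimizations the abstract mentions.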
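The matrix-multiplication-based convolution algorithm provides a high-performance baseline implementation for a wide range of convolution configurations, and is the first choice when optimizing convolution performance for a given chip. Based on the characteristics of the FeiTeng (飞腾) heterogeneous multi-core digital signal processor (DSP) chip developed independently by the National University of Defense Technology, and on the characteristics of the matrix-multiplication-based convolution algorithm itself, this paper proposes ftmEConv, a high-performance parallel matrix-multiplication-based convolution algorithm for multi-core DSP architectures. The algorithm consists of four parallelized parts, all running on the general-purpose multi-core DSP: input feature map transformation, convolution kernel transformation, matrix multiplication, and output feature map transformation. It improves the performance of each part by effectively exploiting the functional units in the general-purpose DSP cores. Experimental results show that ftmEConv achieves a computational efficiency of up to 42.90% and a speedup of up to 7.79 times over other matrix-multiplication-based convolution implementations on the chip.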
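The four stages named in the abstract can be illustrated with a small, sequential sketch of the generic im2col-plus-matrix-multiply technique. This is not ftmEConv itself: it assumes a single input channel, stride 1, no padding, and the cross-correlation convention common in deep learning, and all function names are hypothetical.

```python
def im2col(image, kh, kw):
    """Input feature map transformation: one row per output pixel, one column per kernel tap."""
    out_h, out_w = len(image) - kh + 1, len(image[0]) - kw + 1
    rows = []
    for y in range(out_h):
        for x in range(out_w):
            rows.append([image[y + dy][x + dx] for dy in range(kh) for dx in range(kw)])
    return rows, out_h, out_w


def matmul(a, b):
    """Plain matrix multiplication of a (m x k) by b (k x n)."""
    inner, n = len(b), len(b[0])
    return [[sum(row[k] * b[k][j] for k in range(inner)) for j in range(n)] for row in a]


def conv2d_gemm(image, kernels):
    """Convolve one 2-D image with several kh x kw kernels via a single matrix multiply."""
    kh, kw = len(kernels[0]), len(kernels[0][0])
    patches, out_h, out_w = im2col(image, kh, kw)
    # Kernel transformation: flatten each kernel into one column of a (kh*kw) x n matrix.
    weights = [[k[dy][dx] for k in kernels] for dy in range(kh) for dx in range(kw)]
    flat = matmul(patches, weights)
    # Output feature map transformation: reshape back to n maps of out_h x out_w.
    return [[[flat[y * out_w + x][n] for x in range(out_w)] for y in range(out_h)]
            for n in range(len(kernels))]
```

Turning the convolution into one large matrix multiply is what lets an implementation reuse a tuned GEMM kernel across many convolution shapes, at the cost of the extra memory used by the patch matrix.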
This paper presents the design and implementation of a low-power digital signal processor (THUCIDSP-1) targeting cochlear implant applications. Multi-level low-power strategies, including algorithm optimization, operand isolation, clock gating, and memory partitioning, are adopted in the processor design to reduce power consumption. Experimental results show that the complexity of the continuous interleaved sampling (CIS) algorithm is reduced by more than 80% and that the power dissipation of the hardware alone is reduced by about 25% with the low-power methods. The THUCIDSP-1 prototype, fabricated in a 0.18-μm standard CMOS process, consumes only 1.91 mW when executing the CIS algorithm at 3 MHz.
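Matrix transposition is a basic operation in matrix computation and is widely used in signal processing, scientific computing, deep learning, and other fields. As the FeiTeng (飞腾) heterogeneous multi-core digital signal processor (DSP), developed independently by the National University of Defense Technology, is adopted across these fields, there is strong demand for a high-performance matrix transposition implementation. Based on the architectural characteristics of the FeiTeng heterogeneous multi-core DSP and the characteristics of the transposition operation, this paper proposes ftmMT, a parallel matrix transposition algorithm that adapts to matrices with different element widths (8 B, 4 B, and 2 B). The algorithm is vectorized using the Load/Store units of the DSP's vector processing unit, distributes work across multiple DSP cores through matrix blocking, and uses an implicit ping-pong design to overlap on-chip vectorized transposition with off-chip memory access and to greatly improve memory access performance. Experimental results show that ftmMT significantly accelerates matrix transposition, achieving a speedup of up to 8.99 times over HPTT, an open-source transposition library for CPUs.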
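The matrix-blocking idea mentioned above can be sketched as a tiled transpose. This shows only the tiling; the vectorized Load/Store path and the ping-pong buffering of ftmMT are not modeled, and the function name and default tile size are assumptions.

```python
def transpose_blocked(matrix, block=4):
    """Transpose a rectangular matrix tile by tile.

    Each block x block tile is read and written independently of the others,
    so tiles could in principle be handed to different cores.
    """
    rows, cols = len(matrix), len(matrix[0])
    out = [[None] * rows for _ in range(cols)]
    for r0 in range(0, rows, block):
        for c0 in range(0, cols, block):
            # Transpose one tile; min() handles ragged edges when the
            # matrix dimensions are not multiples of the tile size.
            for r in range(r0, min(r0 + block, rows)):
                for c in range(c0, min(c0 + block, cols)):
                    out[c][r] = matrix[r][c]
    return out
```

Working tile by tile keeps both the source rows and the destination rows of a tile within a small memory footprint, which is the property that on-chip buffering schemes build on.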
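Taking into account the need to satisfy different performance indices or particular practical requirements, this paper proposes a control algorithm for active power filters based on optimization theory. The algorithm designs the compensation performance of the active power filter (APF) subject to given constraints. The compensation current is determined in the a-b-c coordinate frame without any coordinate transformation, so the APF control circuit is simpler to implement than with traditional methods. Simulation experiments show that the algorithm achieves good filtering performance.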
Funding: Supported by the National Natural Science Foundation of China (No. 60475018)