Abstract
To address the large amount of redundant computation in extracting MFCC (Mel-Frequency Cepstral Coefficients) feature parameters for offline speech recognition, this paper optimizes the time-frequency conversion and proposes a time-frequency conversion structure that stores overlapping-frame data and reuses the FFT results for the part of each frame that overlaps under the frame shift. The parameters needed by the filter and DCT stages are computed in advance in Matlab, and the resulting coefficient matrices are stored in the corresponding parameter ROMs and read directly when needed. Experimental results show that extracting the MFCC parameters of one speech frame with this scheme takes 0.18 ms. Compared with software implementations of MFCC extraction on the Cortex-M4 and TMS320 platforms, the computation speed improves by 51 times and 67 times, respectively. Compared with similar hardware implementations, the speed improves by nearly 3 times, which meets the real-time requirements of offline speech recognition product design.
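The precomputed-coefficient idea in the abstract can be illustrated with a minimal sketch. It is not the authors' design: the sample rate, FFT length, filter count, cepstral order, the common 2595·log10(1 + f/700) mel formula, and every name below (`MEL_ROM`, `DCT_ROM`, `mfcc_from_power`, and so on) are assumptions chosen for illustration, and Python lists stand in for the parameter ROMs that the paper fills from Matlab.

```python
import math

# Illustrative parameters (assumptions, not values from the paper).
SAMPLE_RATE = 16000   # Hz
N_FFT = 512           # FFT length
N_MELS = 26           # number of triangular mel filters
N_CEPS = 13           # number of cepstral coefficients kept


def hz_to_mel(f):
    return 2595.0 * math.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def build_mel_table(n_mels, n_fft, sample_rate):
    """Triangular filter weights, one row per filter, one column per FFT bin."""
    n_bins = n_fft // 2 + 1
    top = hz_to_mel(sample_rate / 2.0)
    # n_mels + 2 edge points, evenly spaced in mel, expressed in FFT-bin units.
    edges = [mel_to_hz(top * i / (n_mels + 1)) * n_fft / sample_rate
             for i in range(n_mels + 2)]
    table = []
    for m in range(n_mels):
        lo, mid, hi = edges[m], edges[m + 1], edges[m + 2]
        row = []
        for k in range(n_bins):
            if lo < k <= mid:
                row.append((k - lo) / (mid - lo))   # rising slope
            elif mid < k < hi:
                row.append((hi - k) / (hi - mid))   # falling slope
            else:
                row.append(0.0)
        table.append(row)
    return table


def build_dct_table(n_ceps, n_mels):
    """Unnormalized DCT-II basis: entry [c][m] = cos(pi * c * (m + 0.5) / n_mels)."""
    return [[math.cos(math.pi * c * (m + 0.5) / n_mels) for m in range(n_mels)]
            for c in range(n_ceps)]


# Computed once, ahead of time; at run time the tables are only read.
MEL_ROM = build_mel_table(N_MELS, N_FFT, SAMPLE_RATE)
DCT_ROM = build_dct_table(N_CEPS, N_MELS)


def mfcc_from_power(power):
    """Map one frame's power spectrum (N_FFT // 2 + 1 values) to N_CEPS coefficients."""
    log_energy = [
        math.log(max(sum(w * p for w, p in zip(row, power)), 1e-10))
        for row in MEL_ROM
    ]
    return [sum(d * e for d, e in zip(row, log_energy)) for row in DCT_ROM]
```

With the tables fixed in advance, the per-frame work after the FFT reduces to multiply-accumulate operations against stored constants plus one logarithm per filter. A hardware version would presumably also quantize the table entries to a fixed-point format, which this sketch does not model.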
Authors
刘泽琛
焦继业
崔智恒
安超
LIU Zechen; JIAO Jiye; CUI Zhiheng; AN Chao (School of Electronic Engineering, Xi’an University of Posts and Telecommunications, Xi’an 710121, China; School of Computing, Xi’an University of Posts and Telecommunications, Xi’an 710121, China)
Source
《电子设计工程》
2022, No. 23, pp. 179-184 (6 pages)
Electronic Design Engineering