Abstract
To address the low accuracy and poor robustness of language identification for broadcast speech at low signal-to-noise ratios (SNR), this paper proposes a language identification algorithm based on a wavelet-packet-improved MFCC combined with Teager energy operator cepstral features. First, the wavelet packet transform replaces the Fourier transform and Mel filtering in MFCC extraction, yielding the WMFCC feature parameters. This retains the auditory perception characteristics of the human ear while improving the high-frequency analysis capability and analysis precision for speech signals, overcoming the limitations of the Fourier transform. Second, the Teager energy operator cepstrum is extracted to capture the instantaneous energy characteristics of speech, and it is fused with the improved MFCC features to form a new feature parameter, TWMFCC. Finally, to further improve recognition of low-SNR speech, a variational mode decomposition (VMD) adaptive Wiener filtering denoising algorithm is proposed. Experiments compare the recognition performance of the proposed features with that of traditional features: the average recognition accuracy of the proposed features is significantly higher, and for noisy speech without any denoising it is 13.02% higher than that of traditional MFCC. The proposed features effectively alleviate the low recognition accuracy of traditional features at low SNR and show strong noise resistance and robustness.
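As a rough illustration of the Teager energy operator step mentioned in the abstract, the sketch below applies the standard discrete operator, psi[n] = x[n]^2 - x[n-1] * x[n+1], and then derives cepstrum-like coefficients from it. The abstract does not give the paper's exact cepstrum procedure, so the pipeline in `teo_cepstrum` (Hamming window, magnitude spectrum, log, DCT-II) is an assumption, as are the function names, the frame length, and the number of coefficients.

```python
import numpy as np


def teager_energy(x):
    """Discrete Teager energy operator: psi[n] = x[n]^2 - x[n-1] * x[n+1]."""
    x = np.asarray(x, dtype=float)
    return x[1:-1] ** 2 - x[:-2] * x[2:]


def teo_cepstrum(frame, n_coeffs=12):
    """Hypothetical TEO cepstrum of one frame (assumed pipeline, not the paper's):
    TEO -> Hamming window -> |FFT| -> log -> DCT-II."""
    psi = teager_energy(frame)
    spectrum = np.abs(np.fft.rfft(psi * np.hamming(len(psi))))
    log_spec = np.log(spectrum + 1e-10)  # small floor avoids log(0)
    # Unnormalized DCT-II written out explicitly to stay NumPy-only.
    k = np.arange(n_coeffs)[:, None]
    n = np.arange(len(log_spec))[None, :]
    dct_basis = np.cos(np.pi * k * (2 * n + 1) / (2 * len(log_spec)))
    return dct_basis @ log_spec


# Sanity check on a pure tone A*cos(omega*n + phi): the operator output
# is the constant A^2 * sin(omega)^2, independent of n and phi.
n = np.arange(400)
amplitude, omega = 0.5, 0.3
tone = amplitude * np.cos(omega * n + 0.1)
psi = teager_energy(tone)
features = teo_cepstrum(tone[:256])
```

In a fused feature such as TWMFCC, coefficients like `features` would presumably be concatenated frame by frame with the wavelet-packet-based cepstral coefficients; how the paper weights or orders the two parts is not stated in the abstract.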
Authors
CHEN Sizhu (陈思竹)
LONG Hua (龙华)
SHAO Yubin (邵玉斌)
Affiliations: Faculty of Information Engineering and Automation, Kunming University of Science and Technology, Kunming 650500, China; Radio Monitoring Center of Yunnan Province, Kunming 650228, China
Source
Computer Science (《计算机科学》)
Indexed in: CSCD; Peking University Core Journals (北大核心)
2024, Issue S02, pp. 367-372 (6 pages)
Funding
Open Fund of the Yunnan Provincial Key Laboratory of Media Convergence (320225403).