Abstract
To address the problem that features unrelated to language identity degrade language identification in broadcast audio, a language identification method based on improved gamma frequency cepstrum coefficient (GFCC) feature parameters is proposed. The energy spectral envelope of each frame is extracted to remove part of the speaker-related features; the result is filtered by a Gammatone filter bank, passed through the discrete cosine transform, and then subjected to cepstrum lifting to obtain the improved GFCC feature parameters. Feature parameters extracted from broadcast audio signals are input to a hidden Markov model for training and testing. The language identification results show that the method effectively improves identification accuracy on broadcast audio and outperforms the currently used GFCC features and their derived methods.
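The feature pipeline described in the abstract (per-frame energy spectral envelope, Gammatone filter bank, discrete cosine transform, cepstrum lifting) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the abstract does not give the details, so every concrete choice here is an assumption. The envelope is approximated by cepstral smoothing, the Gammatone filters by a frequency-domain magnitude approximation on an ERB-spaced grid, the compression by a logarithm, and the lifter by a sinusoidal window. All function names and parameter values (`improved_gfcc_sketch`, `n_keep`, `n_filters`, `lifter`, and so on) are hypothetical.

```python
import numpy as np

def frame_signal(x, frame_len, hop):
    # Split the signal into overlapping frames and apply a Hamming window.
    n_frames = 1 + (len(x) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    return x[idx] * np.hamming(frame_len)

def spectral_envelope(power, n_keep=30):
    # ASSUMPTION: envelope via cepstral smoothing. Keeping only the
    # low-quefrency cepstrum discards pitch harmonics (fine structure).
    ceps = np.fft.irfft(np.log(power + 1e-10), axis=1)
    n = ceps.shape[1]
    keep = np.zeros(n)
    keep[:n_keep] = 1.0
    keep[n - n_keep + 1:] = 1.0  # mirrored half of the real cepstrum
    return np.exp(np.fft.rfft(ceps * keep, axis=1).real)

def gammatone_weights(n_filters, n_fft, sr, f_low=50.0):
    # ASSUMPTION: 4th-order Gammatone power response approximated in the
    # frequency domain, centres equally spaced on the ERB-rate scale.
    freqs = np.fft.rfftfreq(n_fft, 1.0 / sr)
    e_lo = 21.4 * np.log10(1.0 + 0.00437 * f_low)
    e_hi = 21.4 * np.log10(1.0 + 0.00437 * sr / 2.0)
    centers = (10.0 ** (np.linspace(e_lo, e_hi, n_filters) / 21.4) - 1.0) / 0.00437
    b = 1.019 * 24.7 * (4.37 * centers / 1000.0 + 1.0)  # filter bandwidths
    return (1.0 + ((freqs[None, :] - centers[:, None]) / b[:, None]) ** 2) ** -4.0

def dct_matrix(n_out, n_in):
    # Unnormalised DCT-II basis, one row per output coefficient.
    k = np.arange(n_out)[:, None]
    m = np.arange(n_in)[None, :]
    return np.cos(np.pi * k * (2 * m + 1) / (2 * n_in))

def improved_gfcc_sketch(x, sr, frame_len=400, hop=160, n_fft=512,
                         n_filters=32, n_ceps=13, lifter=22):
    frames = frame_signal(x, frame_len, hop)
    power = np.abs(np.fft.rfft(frames, n_fft, axis=1)) ** 2
    env = spectral_envelope(power)                       # step 1: envelope
    bands = env @ gammatone_weights(n_filters, n_fft, sr).T  # step 2: filter bank
    ceps = np.log(bands + 1e-10) @ dct_matrix(n_ceps, n_filters).T  # step 3: DCT
    k = np.arange(n_ceps)
    # step 4: sinusoidal cepstrum lifting (ASSUMPTION: HTK-style window)
    return ceps * (1.0 + (lifter / 2.0) * np.sin(np.pi * k / lifter))
```

Each row of the returned array is one frame's feature vector. In the setup the abstract describes, such vectors would then be used to train and test a hidden Markov model; that stage is not shown here.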
Authors
邵玉斌
陈亮
龙华
杜庆治
SHAO Yubin; CHEN Liang; LONG Hua; DU Qingzhi (School of Information Engineering and Automation, Kunming University of Science and Technology, Kunming 650500, China)
Source
《吉林大学学报(理学版)》
CAS
Peking University Core Journals (北大核心)
2022, No. 2, pp. 417-424 (8 pages)
Journal of Jilin University: Science Edition
Funding
National Natural Science Foundation of China (Grant No. 61761025).
Keywords
broadcast audio language identification
energy spectrum envelope
cepstrum lifting
improved gamma frequency cepstrum coefficient