
Broadcast Audio Language Identification Based on Improved GFCC Feature Parameters (cited by: 1)
Abstract: In broadcast audio language identification, features unrelated to the language itself degrade the identification results. To address this problem, a language identification method based on improved gamma frequency cepstrum coefficient (GFCC) feature parameters is proposed. The energy spectral envelope of each frame is extracted to remove some of the speaker-related features. The envelope is then filtered by a Gammatone filter bank, and a discrete cosine transform followed by cepstrum lifting yields the improved GFCC feature parameters. Feature parameters extracted from broadcast audio signals are fed into a hidden Markov model for training and testing. The results show that the proposed method effectively improves the accuracy of broadcast audio language identification and outperforms the currently used GFCC features and their derivatives.
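The feature pipeline summarized in the abstract (per-frame energy spectral envelope, Gammatone filter bank, discrete cosine transform, cepstrum lifting) can be sketched roughly as follows. This is only an illustrative sketch, not the authors' implementation: the abstract does not say how the envelope is computed or which lifter is used, so the cepstral-smoothing envelope, the frequency-domain Gammatone approximation, the sinusoidal lifter, and every name and default value here (`improved_gfcc_sketch`, `n_keep`, `lifter_L`, and so on) are assumptions.

```python
import numpy as np

def frame_signal(x, frame_len, hop):
    """Split x into overlapping Hamming-windowed frames."""
    n_frames = 1 + (len(x) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    return x[idx] * np.hamming(frame_len)

def spectral_envelope(power_spec, n_keep=30):
    """Assumed envelope estimator: keep only the low-quefrency part of the
    real cepstrum, which smooths out fine harmonic (pitch) structure."""
    log_spec = np.log(power_spec + 1e-10)
    ceps = np.fft.irfft(log_spec, axis=1)      # real cepstrum per frame
    keep = np.zeros(ceps.shape[1])
    keep[:n_keep] = 1.0                        # low quefrencies
    keep[-(n_keep - 1):] = 1.0                 # their symmetric counterparts
    return np.exp(np.fft.rfft(ceps * keep, axis=1).real)

def gammatone_weights(sr, n_fft, n_filters, f_min=50.0):
    """Frequency-domain approximation of a 4th-order Gammatone filter bank
    with centre frequencies spaced uniformly on the ERB-rate scale."""
    freqs = np.fft.rfftfreq(n_fft, d=1.0 / sr)
    erb_rate = lambda f: 21.4 * np.log10(4.37 * f / 1000.0 + 1.0)
    e = np.linspace(erb_rate(f_min), erb_rate(sr / 2.0), n_filters)
    fc = (10.0 ** (e / 21.4) - 1.0) * 1000.0 / 4.37
    b = 1.019 * 24.7 * (4.37 * fc / 1000.0 + 1.0)   # bandwidth from ERB(fc)
    # Power response of an order-4 Gammatone filter: [1 + ((f - fc)/b)^2]^-4
    return (1.0 + ((freqs[None, :] - fc[:, None]) / b[:, None]) ** 2) ** -4

def improved_gfcc_sketch(x, sr, n_fft=512, hop=256,
                         n_filters=32, n_ceps=13, lifter_L=22):
    """Envelope -> Gammatone bank -> log -> DCT-II -> lifter (all assumed)."""
    frames = frame_signal(x, n_fft, hop)
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2
    env = spectral_envelope(power)
    band_energy = env @ gammatone_weights(sr, n_fft, n_filters).T
    log_band = np.log(band_energy + 1e-10)
    # DCT-II basis written out explicitly to avoid extra dependencies.
    k = np.arange(n_ceps)[:, None]
    m = np.arange(n_filters)[None, :]
    dct_basis = np.cos(np.pi * k * (m + 0.5) / n_filters)
    ceps = log_band @ dct_basis.T
    # Sinusoidal lifter (a common choice; the paper's lifter may differ).
    lift = 1.0 + (lifter_L / 2.0) * np.sin(np.pi * np.arange(n_ceps) / lifter_L)
    return ceps * lift
```

Calling `improved_gfcc_sketch(signal, 16000)` on a one-dimensional NumPy array returns one feature vector per frame; in the setting described in the abstract, such per-frame vectors would then be used to train and test one hidden Markov model per language.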
Authors: SHAO Yubin; CHEN Liang; LONG Hua; DU Qingzhi (School of Information Engineering and Automation, Kunming University of Science and Technology, Kunming 650500, China)
Source: Journal of Jilin University: Science Edition (indexed in CAS; Peking University Core Journal), 2022, No. 2, pp. 417-424 (8 pages)
Funding: National Natural Science Foundation of China (Grant No. 61761025).
Keywords: broadcast audio language identification; energy spectrum envelope; cepstrum lifting; improved gamma frequency cepstrum coefficient
