
A Continuous Speech Recognition Method Using Dependent Feature Transformation and Combination of Subspace Region (Cited by: 4)
Abstract: To improve speech recognition accuracy, a method based on subspace region-dependent feature transformation and combination (MFCC-BN-TC) is proposed. The method extracts a short-time spectral structure feature (BN) and an envelope feature (MFCC) to separately describe the structure and envelope information of the short-time speech spectrum, and applies region-dependent feature transformations to the BN and MFCC features, respectively. This transformation is then generalized to a subspace region-dependent feature transformation, so that different time granularities (frame and speech segment) yield multi-level discriminative feature representations. Finally, the discriminatively transformed features are jointly represented to train the acoustic model, and a general framework for discriminative feature transformation and combination is given. Experimental results show that the MFCC-BN-TC method improves recognition performance by 0.98% over the method using raw BN features and by 1.62% over the MFCC-feature baseline system; combining the features transformed by the MFCC-BN-TC method improves the recognition rate by 1.5% relative to combining the raw features.
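The core operation the abstract generalizes is a region-dependent feature transform (RDT): each frame is softly assigned to a set of acoustic "regions", and a posterior-weighted sum of per-region linear transforms produces a discriminative correction to the feature. The sketch below is a minimal illustration of that idea; the function names, Gaussian region model, and dimensions are assumptions for illustration, not the paper's implementation.

```python
import numpy as np

def region_posteriors(x, means, inv_vars):
    """Soft assignment of a frame x to R 'regions' modeled as diagonal Gaussians."""
    # Log-likelihood of x under each region (up to a shared constant).
    ll = -0.5 * np.sum(inv_vars * (x - means) ** 2, axis=1)
    ll -= ll.max()                      # numerical stability before exp
    p = np.exp(ll)
    return p / p.sum()

def rdt_transform(x, means, inv_vars, region_mats):
    """y = x + sum_r gamma_r(x) * (M_r @ x): posterior-weighted linear transforms."""
    gamma = region_posteriors(x, means, inv_vars)
    correction = np.einsum("r,rij,j->i", gamma, region_mats, x)
    return x + correction

rng = np.random.default_rng(0)
D, R = 39, 4                            # feature dim (e.g. MFCC + deltas), regions
means = rng.normal(size=(R, D))         # region centers (illustrative)
inv_vars = np.ones((R, D))              # inverse variances (illustrative)
region_mats = 0.01 * rng.normal(size=(R, D, D))  # per-region transform matrices

frame = rng.normal(size=D)
y = rdt_transform(frame, means, inv_vars, region_mats)
print(y.shape)                          # (39,)
```

In discriminative training (e.g. fMPE-style), the matrices `region_mats` would be optimized against a sequence-level objective; here they are random placeholders.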
Source: Journal of Xi'an Jiaotong University, 2016, No. 4, pp. 60-67 (8 pages); indexed in EI, CAS, CSCD, and the Peking University Core journal list.
Funding: National Natural Science Foundation of China (61175017, 61403415).
Keywords: speech recognition; discriminative training; deep neural network; subspace region-dependent feature transformation
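The final step described in the abstract, joint representation of the transformed MFCC and BN streams for acoustic-model training, can be sketched as frame-synchronous concatenation of the two streams. The stream dimensions below are assumptions for illustration only.

```python
import numpy as np

def combine_streams(mfcc_frames, bn_frames):
    """Frame-synchronous concatenation of two transformed feature streams."""
    assert mfcc_frames.shape[0] == bn_frames.shape[0], "streams must be time-aligned"
    return np.concatenate([mfcc_frames, bn_frames], axis=1)

T = 100                                                # number of frames
mfcc = np.random.default_rng(1).normal(size=(T, 39))   # e.g. MFCC + deltas
bn = np.random.default_rng(2).normal(size=(T, 40))     # e.g. bottleneck-layer width
joint = combine_streams(mfcc, bn)
print(joint.shape)                                     # (100, 79)
```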
