
Comparison of Different Implementations of MFCC

Times cited: 17

Abstract: The performance of Mel-Frequency Cepstrum Coefficients (MFCC) may be affected by (1) the number of filters, (2) the shape of the filters, (3) the way the filters are spaced, and (4) the way the power spectrum is warped. In this paper, several comparison experiments are carried out to find the best implementation. The traditional MFCC calculation excludes the 0th coefficient because it is regarded as somewhat unreliable.
Based on their analysis and experiments, the authors find that this coefficient can be regarded as a generalized frequency band energy (FBE) and is therefore useful; keeping it yields the FBE-MFCC. The authors also propose applying an auto-regressive analysis to the frame energy, which outperforms using the first- and/or second-order differential derivatives of the frame energy. Experiments on the '863' Speech Database show that, compared with the traditional MFCC plus its auto-regressive analysis coefficients, the combination of the FBE-MFCC and the frame energy, each with its auto-regressive analysis coefficients, performs best, reducing the Chinese syllable error rate (CSER) by about 10%, while the FBE-MFCC with its auto-regressive analysis coefficients reduces CSER by 2.5%. Comparison experiments are also carried out on a highly casual Chinese speech database, the Chinese Annotated Spontaneous Speech (CASS) corpus, where the FBE-MFCC reduces the error rate by about 2.9% on average.
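The abstract's point that the 0th coefficient behaves like a generalized frequency band energy can be seen directly from the DCT. The sketch below is a minimal pure-Python MFCC for one frame, assuming triangular filters equally spaced on the HTK-style mel scale (2595 * log10(1 + f/700)) and an orthonormal DCT-II. The number, shape, spacing and warping of the filters are exactly the factors the paper compares, so these particular choices are illustrative only, and the helper names (`mel_filterbank`, `mfcc_with_c0`) are hypothetical. With an orthonormal DCT-II, c0 equals the sum of the M log band energies divided by sqrt(M), i.e. a scaled overall band energy.

```python
import math


def hz_to_mel(f):
    # HTK-style mel warping (an assumption; other warpings are in use).
    return 2595.0 * math.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_filterbank(num_filters, num_bins, sample_rate):
    """Triangular filters with centres equally spaced on the mel scale.

    num_bins is the number of power-spectrum bins, i.e. n_fft // 2 + 1.
    """
    n_fft = 2 * (num_bins - 1)
    top = hz_to_mel(sample_rate / 2.0)
    # num_filters + 2 edge frequencies, equally spaced in mel,
    # expressed as fractional FFT bin positions.
    edges = [mel_to_hz(top * i / (num_filters + 1)) * n_fft / sample_rate
             for i in range(num_filters + 2)]
    bank = []
    for j in range(1, num_filters + 1):
        left, centre, right = edges[j - 1], edges[j], edges[j + 1]
        weights = []
        for k in range(num_bins):
            if left < k <= centre:
                weights.append((k - left) / (centre - left))    # rising edge
            elif centre < k < right:
                weights.append((right - k) / (right - centre))  # falling edge
            else:
                weights.append(0.0)
        bank.append(weights)
    return bank


def mfcc_with_c0(power_spectrum, sample_rate, num_filters=24, num_ceps=13):
    """Return (cepstrum including c0, log filter-bank energies) for one frame."""
    bank = mel_filterbank(num_filters, len(power_spectrum), sample_rate)
    # Log energy in each band; a small floor avoids log(0) for empty bands.
    log_fbe = [math.log(max(sum(w * p for w, p in zip(weights, power_spectrum)), 1e-10))
               for weights in bank]
    m = num_filters
    ceps = []
    for n in range(num_ceps):
        # Orthonormal DCT-II over the filter index j; for n = 0 every cosine is 1,
        # so c0 = sum(log_fbe) / sqrt(m).
        scale = math.sqrt(1.0 / m) if n == 0 else math.sqrt(2.0 / m)
        ceps.append(scale * sum(e * math.cos(math.pi * n * (j + 0.5) / m)
                                for j, e in enumerate(log_fbe)))
    return ceps, log_fbe


if __name__ == "__main__":
    spectrum = [1.0 + k for k in range(257)]  # toy 512-point power spectrum
    ceps, log_fbe = mfcc_with_c0(spectrum, 16000)
    print(ceps[0], sum(log_fbe) / math.sqrt(len(log_fbe)))
    # A uniform gain change shifts c0 only; c1..c12 stay the same.
    louder, _ = mfcc_with_c0([4.0 * p for p in spectrum], 16000)
    print(louder[0] - ceps[0], louder[1] - ceps[1])
```

Because the cosine basis sums to zero over the filter index for every n >= 1, a uniform gain on the spectrum moves only c0. That level dependence is consistent with c0 being treated as less reliable, while the paper's result is that keeping it as a band-energy term (the FBE-MFCC) still helps recognition.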
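The abstract contrasts first- and second-order differential (delta) coefficients of the frame energy with an auto-regressive analysis of it, but does not say how that analysis is formulated. The sketch below computes deltas with the widely used regression formula d_t = sum_{n=1..N} n * (x_{t+n} - x_{t-n}) / (2 * sum_{n=1..N} n^2). Purely as an assumed stand-in for the authors' method, it then fits an order-p AR model to a sliding window of the log-energy trajectory (autocorrelation method with the Levinson-Durbin recursion) and uses the predictor coefficients as per-frame dynamic features. All function names, the window length and the AR order are hypothetical choices.

```python
import math


def log_frame_energy(frames):
    """Log energy of each frame (frames: list of lists of samples)."""
    return [math.log(max(sum(s * s for s in f), 1e-10)) for f in frames]


def deltas(traj, n_max=2):
    """Regression-based first-order differential (delta) coefficients.

    Apply twice to obtain second-order (delta-delta) coefficients.
    Edges are handled by repeating the first/last value.
    """
    T = len(traj)
    denom = 2.0 * sum(n * n for n in range(1, n_max + 1))

    def at(i):
        return traj[min(max(i, 0), T - 1)]

    return [sum(n * (at(t + n) - at(t - n)) for n in range(1, n_max + 1)) / denom
            for t in range(T)]


def autocorr(x, max_lag):
    """Biased autocorrelation of a mean-removed window."""
    mean = sum(x) / len(x)
    y = [v - mean for v in x]
    return [sum(y[t] * y[t - lag] for t in range(lag, len(y)))
            for lag in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Solve the Yule-Walker equations for a_1..a_p with x[t] ~ sum_k a_k * x[t-k]."""
    a = [0.0] * (order + 1)
    err = r[0]
    for i in range(1, order + 1):
        if err <= 1e-12:  # flat window: nothing left to predict
            break
        k = (r[i] - sum(a[j] * r[i - j] for j in range(1, i))) / err
        new_a = a[:]
        new_a[i] = k
        for j in range(1, i):
            new_a[j] = a[j] - k * a[i - j]
        a = new_a
        err *= 1.0 - k * k
    return a[1:]


def ar_features(traj, order=2, half_window=4):
    """ASSUMED formulation, not necessarily the paper's: per-frame AR
    coefficients of a local window of the trajectory."""
    T = len(traj)
    feats = []
    for t in range(T):
        window = traj[max(0, t - half_window):min(T, t + half_window + 1)]
        feats.append(levinson_durbin(autocorr(window, order), order))
    return feats


if __name__ == "__main__":
    energy = [math.sin(0.3 * t) + 0.05 * t for t in range(40)]  # toy log-energy track
    d1 = deltas(energy)   # first-order differential
    d2 = deltas(d1)       # second-order differential
    ar = ar_features(energy, order=2)
    print(d1[10], d2[10], ar[10])
```

The delta features describe the local slope (and curvature) of the energy track, whereas the AR coefficients describe how each value is predicted from its recent past within the window. Whether this windowed AR fit matches the paper's auto-regressive analysis would need to be checked against the full text.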
Source: Journal of Computer Science & Technology (indexed in SCIE, EI, CSCD), 2001, No. 6, pp. 582-589 (8 pages). Chinese title: 计算机科学技术学报 (English edition).
Keywords: MFCC, frequency band energy, auto-regressive analysis, generalized initial/final