Acoustic features based on auditory model and adaptive fractional Fourier transform for speech recognition

Abstract: It is well known that the human auditory system achieves a level of performance that automatic speech recognition (ASR) systems cannot match, and that the fractional Fourier transform (FrFT) has unique advantages in non-stationary signal processing. In this paper, a Gammatone filterbank is applied to speech signals for front-end temporal filtering, and acoustic features of the output subband signals are then extracted using the fractional Fourier transform. Considering the critical effect of the transform order on the FrFT, an order adaptation method based on instantaneous frequency is proposed, and its performance is compared with that of a method based on the ambiguity function. ASR experiments are conducted on clean and noisy Putonghua digits. The results show that the proposed features achieve a significantly higher recognition rate than the MFCC baseline, and that the order adaptation method based on instantaneous frequency has much lower complexity than the one based on the ambiguity function. Furthermore, the FrFT-based features achieve the highest recognition rate when the proposed order adaptation method is used.
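The front-end temporal filtering step named in the abstract can be illustrated with a minimal sketch of a Gammatone filterbank. This is not the authors' implementation: the abstract gives no filter parameters, so the 4th-order impulse response, the bandwidth factor `b = 1.019`, the Glasberg and Moore ERB formula, the 25 ms impulse-response length, the channel count, the frequency range, and the function names `erb`, `erb_space`, `gammatone_ir`, and `gammatone_filterbank` are all assumptions chosen for illustration. The FrFT feature extraction and order adaptation stages are not shown.

```python
import numpy as np

def erb(fc):
    """Equivalent rectangular bandwidth in Hz (Glasberg and Moore approximation)."""
    return 24.7 * (4.37e-3 * fc + 1.0)

def erb_space(f_low, f_high, n):
    """Return n center frequencies equally spaced on the ERB-rate scale."""
    lo = 21.4 * np.log10(4.37e-3 * f_low + 1.0)
    hi = 21.4 * np.log10(4.37e-3 * f_high + 1.0)
    e = np.linspace(lo, hi, n)
    return (10.0 ** (e / 21.4) - 1.0) / 4.37e-3

def gammatone_ir(fc, fs, order=4, duration=0.025, b=1.019):
    """Sampled Gammatone impulse response, scaled to roughly unit gain at fc."""
    t = np.arange(int(duration * fs)) / fs
    g = t ** (order - 1) * np.exp(-2 * np.pi * b * erb(fc) * t) * np.cos(2 * np.pi * fc * t)
    # Normalize by the magnitude of the frequency response evaluated at fc.
    gain = np.abs(np.sum(g * np.exp(-2j * np.pi * fc * t)))
    return g / gain

def gammatone_filterbank(x, fs, n_channels=8, f_low=100.0, f_high=3400.0):
    """Filter x with each channel (assumed settings); return (center freqs, subbands)."""
    fcs = erb_space(f_low, f_high, n_channels)
    # FIR filtering by direct convolution, truncated to the input length.
    sub = np.stack([np.convolve(x, gammatone_ir(fc, fs))[: len(x)] for fc in fcs])
    return fcs, sub
```

Each row of the returned array is one subband signal. In a pipeline like the one the abstract describes, these subband signals would be framed and passed to the FrFT-based feature extraction.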
Source: Chinese Journal of Acoustics (声学学报(英文版)), 2011, No. 4, pp. 453-463 (11 pages)
Funding: Supported by the National Science and Technology Major Projects (2010ZX03004-003-01), the National Natural Science Foundation of China (90920304), and the Research Fund for the Doctoral Program of Higher Education of China (20101101110020).