Abstract
It is well known that the human auditory system performs far better than automatic speech recognition (ASR) systems, and that the fractional Fourier transform (FrFT) has unique advantages for processing non-stationary signals. In this paper, a Gammatone filterbank is applied to speech signals as a front-end temporal filter, and acoustic features are then extracted from the output subband signals using the FrFT. Because the transform order strongly affects FrFT performance, an order adaptation method based on instantaneous frequency is proposed and compared with a method based on the ambiguity function. ASR experiments on clean and noisy Putonghua digits show that the proposed features achieve significantly higher recognition rates than the mel-frequency cepstral coefficient (MFCC) baseline, and that the instantaneous-frequency-based order adaptation method has much lower complexity than the ambiguity-function-based method. Furthermore, the FrFT-based features achieve the highest recognition rate when the proposed order adaptation method is used.
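The abstract does not say how the instantaneous frequency is mapped to a transform order. The sketch below shows one plausible approach, built on our own assumptions rather than the authors' exact procedure. Each subband frame is treated as a linear chirp. Its chirp rate is taken as the slope of a straight-line fit to the instantaneous frequency, which is obtained from the unwrapped phase of the analytic signal via `scipy.signal.hilbert`. The rotation angle α is then chosen so that cot α equals the negative of the normalized chirp rate, which concentrates an ideal chirp into a near-impulse in the FrFT domain. The function names, the 10% edge trimming, the normalization scale s = √N / f_s, and the kernel sign convention are all illustrative assumptions.

```python
import numpy as np
from scipy.signal import hilbert


def estimate_chirp_rate(frame: np.ndarray, fs: float, trim_frac: float = 0.1) -> float:
    """Estimate a frame's chirp rate k (Hz/s) from its instantaneous frequency.

    Hypothetical helper: the instantaneous frequency is the derivative of the
    unwrapped phase of the analytic signal, and k is the slope of a linear fit.
    A fraction of samples at each end is discarded because the FFT-based
    Hilbert transform is least accurate near the frame boundaries.
    """
    analytic = hilbert(frame)                        # frame must be real-valued
    phase = np.unwrap(np.angle(analytic))
    inst_freq = np.diff(phase) * fs / (2.0 * np.pi)  # Hz, length N - 1
    t = np.arange(inst_freq.size) / fs
    trim = max(1, int(inst_freq.size * trim_frac))
    slope, _intercept = np.polyfit(t[trim:-trim], inst_freq[trim:-trim], 1)
    return float(slope)


def fractional_order_from_chirp_rate(k: float, n_samples: int, fs: float) -> float:
    """Map a chirp rate k (Hz/s) to an FrFT order p in (0, 2).

    Assumptions: dimensional normalization with scale s = sqrt(N) / fs, so the
    normalized chirp rate is k * N / fs**2; and a kernel convention under which
    exp(j*pi*k*t**2) is concentrated when cot(alpha) = -k.
    """
    k_norm = k * n_samples / fs**2
    alpha = np.pi / 2 + np.arctan(k_norm)  # equals arccot(-k_norm), in (0, pi)
    return float(2.0 * alpha / np.pi)      # order p = 2 * alpha / pi


if __name__ == "__main__":
    fs, n = 8000.0, 256
    t = np.arange(n) / fs
    # Synthetic linear chirp whose frequency rises as 500 + 20000 * t Hz.
    frame = np.cos(2 * np.pi * (500 * t + 0.5 * 20000 * t**2))
    k = estimate_chirp_rate(frame, fs)
    print(f"chirp rate ~ {k:.0f} Hz/s, FrFT order ~ {fractional_order_from_chirp_rate(k, n, fs):.4f}")
```

A stationary tone gives k ≈ 0 and therefore p = 1, which reduces the FrFT to the ordinary Fourier transform and serves as a quick sanity check. Discrete FrFT implementations differ in their sign and scaling conventions, so the mapping from k to p would need to match whichever FrFT routine is paired with it.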
Funding
Supported by the National Science and Technology Major Projects (2010ZX03004-003-01), the National Natural Science Foundation of China (90920304), and the Research Fund for the Doctoral Program of Higher Education of China (20101101110020).