A novel text independent speaker identification system is proposed. In the proposed system, the 12-order perceptual linear predictive cepstrum and their delta coefficients in the span of five frames are extracted from...A novel text independent speaker identification system is proposed. In the proposed system, the 12-order perceptual linear predictive cepstrum and their delta coefficients in the span of five frames are extracted from the segmented speech based on the method of pitch synchronous analysis. The Fisher ratios of the original coefficients then be calculated, and the coefficients whose Fisher ratios are bigger are selected to form the 13-dimensional feature vectors of speaker. The Gaussian mixture model is used to model the speakers. The experimental results show that the identification accuracy of the proposed system is obviously better than that of the systems based on other conventional coefficients like the linear predictive cepstral coefficients and the Mel-frequency cepstral coefficients.展开更多
In this paper,we present a comparison of Khasi speech representations with four different spectral features and novel extension towards the development of Khasi speech corpora.These four features include linear predic...In this paper,we present a comparison of Khasi speech representations with four different spectral features and novel extension towards the development of Khasi speech corpora.These four features include linear predictive coding(LPC),linear prediction cepstrum coefficient(LPCC),perceptual linear prediction(PLP),and Mel frequency cepstral coefficient(MFCC).The 10-hour speech data were used for training and 3-hour data for testing.For each spectral feature,different hidden Markov model(HMM)based recognizers with variations in HMM states and different Gaussian mixture models(GMMs)were built.The performance was evaluated by using the word error rate(WER).The experimental results show that MFCC provides a better representation for Khasi speech compared with the other three spectral features.展开更多
文摘A novel text independent speaker identification system is proposed. In the proposed system, the 12-order perceptual linear predictive cepstrum and their delta coefficients in the span of five frames are extracted from the segmented speech based on the method of pitch synchronous analysis. The Fisher ratios of the original coefficients then be calculated, and the coefficients whose Fisher ratios are bigger are selected to form the 13-dimensional feature vectors of speaker. The Gaussian mixture model is used to model the speakers. The experimental results show that the identification accuracy of the proposed system is obviously better than that of the systems based on other conventional coefficients like the linear predictive cepstral coefficients and the Mel-frequency cepstral coefficients.
基金supported by the Visvesvaraya Ph.D.Scheme for Electronics and IT students launched by the Ministry of Electronics and Information Technology(MeiTY),Government of India under Grant No.PhD-MLA/4(95)/2015-2016.
文摘In this paper,we present a comparison of Khasi speech representations with four different spectral features and novel extension towards the development of Khasi speech corpora.These four features include linear predictive coding(LPC),linear prediction cepstrum coefficient(LPCC),perceptual linear prediction(PLP),and Mel frequency cepstral coefficient(MFCC).The 10-hour speech data were used for training and 3-hour data for testing.For each spectral feature,different hidden Markov model(HMM)based recognizers with variations in HMM states and different Gaussian mixture models(GMMs)were built.The performance was evaluated by using the word error rate(WER).The experimental results show that MFCC provides a better representation for Khasi speech compared with the other three spectral features.