Funding: Supported by the National Natural Science Foundation of China (61371164, 61275099, 61102131), the Project of Key Laboratory of Signal and Information Processing of Chongqing (CSTC2009CA2003), the Chongqing Distinguished Youth Foundation (CSTC2011jjjq40002), the Natural Science Foundation of Chongqing (CSTC2012JJA40008), the Research Project of Chongqing Educational Commission (KJ120525, KJ130524), and the Graduate Research and Innovation Projects of Chongqing (CYS14140).
Abstract: To address the poor adaptability of the original repeating-pattern method, an improved music separation method based on a multi-repeating structure of Mel-frequency cepstral coefficients (MFCC) is proposed. First, the MFCC matrix (39-dimensional features) of the music signal is extracted. Cosine similarity is then used to compute the MFCC similarity matrix, and segments with consistent similarity are grouped together. A separate repeating pattern is then built for each group. The spectra of the background music and the vocals are separated using ideal binary masking (IBM), and the corresponding time-domain signals are obtained by the inverse Fourier transform. Finally, the improved method was tested on a music database covering different types and lengths, and the separation results were compared with the repeating-pattern method of Rafii and with Ozerov's flexible-framework method based on non-negative matrix factorization. The experiments showed that separation performance improved by about 3 dB, with a marked improvement for music whose melody varies widely. The experiments indicate that the improved method is an effective and more stable music separation algorithm.
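The similarity and masking steps named in this abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names, the feature layout (coefficients by frames), and the masking rule (a bin is assigned to the background when the background estimate is at least as large as the residual) are all assumptions made for illustration, and the MFCC extraction and grouping steps are omitted.

```python
import numpy as np


def cosine_similarity_matrix(features):
    """Cosine similarity between every pair of frames.

    features: array of shape (n_coeffs, n_frames), e.g. 39 x n_frames.
    Returns an (n_frames, n_frames) matrix with values in [-1, 1].
    """
    features = np.asarray(features, dtype=float)
    norms = np.linalg.norm(features, axis=0, keepdims=True)
    # Guard against all-zero frames to avoid division by zero.
    unit = features / np.maximum(norms, 1e-12)
    return unit.T @ unit


def background_binary_mask(mixture_mag, background_mag):
    """Binary mask that is True where the background estimate dominates.

    Assumption: the vocal magnitude is approximated by the residual
    (mixture minus background), so no ground-truth sources are needed.
    """
    mixture_mag = np.asarray(mixture_mag, dtype=float)
    # The background estimate cannot exceed the mixture itself.
    background = np.minimum(np.asarray(background_mag, dtype=float), mixture_mag)
    residual = mixture_mag - background
    return background >= residual
```

Multiplying the mixture spectrogram by the mask (or by its complement) and inverting the transform would give the background (or vocal) estimate in the time domain.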
Funding: Supported by the Natural Science Foundation of Liaoning Province (Nos. 2019-ZD-0168 and 2020-KF-12-11), the Major Training Program of Criminal Investigation Police University of China (No. 3242019010), the Key Research and Development Projects of the Ministry of Science and Technology (No. 2017YFC0821005), and the Second Batch of New Engineering Research and Practice Projects (No. E-AQGABQ20202710).
Abstract: The problem of disguised voice recognition based on deep belief networks is studied. A hybrid feature extraction algorithm based on formants, Gammatone frequency cepstral coefficients (GFCC), and their difference coefficients is proposed to extract more discriminative speaker features from the original voice data. Using the mixed features as the model input, a disguised voice library is constructed, and a disguised voice recognition model based on a deep belief network is proposed. A dropout strategy is introduced to prevent overfitting, which addresses shortcomings of traditional Gaussian mixture models such as insufficient modeling ability and low discrimination. Experimental results show that the proposed disguised voice recognition method fits the feature distribution better and significantly improves the classification performance and recognition rate.
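For readers unfamiliar with the dropout strategy mentioned in this abstract, the following is a minimal NumPy sketch of the common "inverted dropout" variant. The abstract does not say which variant or drop rate the authors used, so the function name, the `rate` parameter, and the rescaling by `1 / (1 - rate)` are assumptions for illustration only.

```python
import numpy as np


def dropout(activations, rate, rng, training=True):
    """Inverted dropout: zero each unit with probability `rate` during training.

    Surviving units are scaled by 1 / (1 - rate) so the expected activation
    is unchanged, which lets inference use the activations as they are.
    """
    if not training or rate == 0.0:
        return activations
    if not 0.0 <= rate < 1.0:
        raise ValueError("rate must be in [0, 1)")
    keep = rng.random(activations.shape) >= rate  # True for units that survive
    return activations * keep / (1.0 - rate)
```

A typical call would be `dropout(hidden, 0.5, np.random.default_rng(0))` on a hidden-layer activation array during training, with `training=False` at test time.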
Funding: Supported by the National High-Technology Development Program of China (No. 2001AA114071).
Abstract: In speech recognition systems, the physiological characteristics of the speech production model cause the voiced sections of the speech signal to be attenuated by approximately 20 dB per decade. Many speech recognition algorithms address this by filtering the input signal with a single-zero high-pass filter. Unfortunately, this technique increases the noise energy at high frequencies above 4 kHz, which in some cases degrades recognition accuracy. This paper addresses the problem using a pre-emphasis filter in the front end of the recognizer. The aim is to develop a modified parameterization approach that takes into account the whole energy zone of the spectrum, in order to improve the performance of the existing baseline recognition system in the acoustic phase. The results show that a large-vocabulary, speaker-independent continuous speech recognition system using this approach achieves a greatly improved recognition rate.
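The conventional single-zero high-pass (pre-emphasis) filter referred to in this abstract is usually written as y[n] = x[n] - a·x[n-1]. The sketch below shows that baseline filter only, not the modified parameterization the paper proposes; the coefficient value 0.97 is a commonly used default assumed here, not a value taken from the paper.

```python
import numpy as np


def pre_emphasis(signal, alpha=0.97):
    """Single-zero high-pass filter: y[n] = x[n] - alpha * x[n-1].

    The first sample is passed through unchanged because it has no predecessor.
    """
    signal = np.asarray(signal, dtype=float)
    if signal.size == 0:
        return signal
    return np.concatenate(([signal[0]], signal[1:] - alpha * signal[:-1]))
```

With `alpha` close to 1 the filter strongly attenuates slowly varying (low-frequency) content while boosting high frequencies, which is also why it raises high-frequency noise energy as the abstract notes.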