3 articles found
Music/voice separation based on the multi-repeating structure of Mel cepstrum coefficient (Cited by: 4)
1
Authors: ZHANG Tianqi, XU Xin, WU Wangjun, LIU Yu. Chinese Journal of Acoustics (CSCD), 2015, No. 4, pp. 424-435 (12 pages)
To address the poor adaptability of the original repeating-pattern method, an improved music separation method based on the multi-repeating structure of Mel-frequency cepstral coefficients (MFCC) is proposed. First, the MFCC coefficient matrix (39-dimensional data) of the music signal is extracted. Cosine similarity is then used to compute the similarity matrix of the MFCCs, and fragments with consistent similarity are grouped together. Next, a separate repeating pattern is built for each group. The spectra of the background music and the vocals are then separated using ideal binary masking (IBM), and the corresponding time-domain signals are obtained by the inverse Fourier transform. Finally, the improved method was tested on a music database covering different types and lengths, and the separation results were compared with the repeating method of Rafii and the flexible-framework non-negative matrix factorization method of Ozerov. The experimental results show that the improved method raises separation performance by about 3 dB and significantly improves performance on music with large melodic changes. The experiments verify that the improved method is an effective and more stable music separation algorithm.
Keywords: MFCC; Music/voice separation based on the multi-repeating structure of Mel cepstrum coefficient; Mel
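The similarity step described in the abstract above can be illustrated with a minimal sketch. It assumes each frame is a plain list of feature values and computes a pairwise cosine similarity matrix. The function name `cosine_similarity_matrix`, the zero-norm convention, and the toy frames are illustrative assumptions, not taken from the paper, and real MFCC frames would be 39-dimensional.

```python
import math

def cosine_similarity_matrix(frames):
    """Return S where S[i][j] is the cosine similarity of frames i and j."""
    norms = [math.sqrt(sum(v * v for v in f)) for f in frames]
    n = len(frames)
    sim = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):
            if norms[i] == 0.0 or norms[j] == 0.0:
                # Assumed convention: an all-zero frame is similar to nothing.
                value = 0.0
            else:
                dot = sum(a * b for a, b in zip(frames[i], frames[j]))
                value = dot / (norms[i] * norms[j])
            # The matrix is symmetric, so fill both halves at once.
            sim[i][j] = sim[j][i] = value
    return sim

# Toy 3-dimensional frames (made-up numbers, not real MFCCs).
frames = [[1.0, 0.0, 0.0], [2.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
S = cosine_similarity_matrix(frames)
```

Frames 0 and 1 point in the same direction and differ only in magnitude, so cosine similarity treats them as identical, while frame 2 is orthogonal to both. How the paper groups fragments from this matrix is not specified in the abstract and is not sketched here.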
Research on Voiceprint Recognition of Camouflage Voice Based on Deep Belief Network (Cited by: 4)
2
Authors: Nan Jiang, Ting Liu. International Journal of Automation and Computing (EI, CSCD), 2021, No. 6, pp. 947-962 (16 pages)
The problem of disguised voice recognition based on deep belief networks is studied. A hybrid feature extraction algorithm based on formants, Gammatone frequency cepstrum coefficients (GFCC) and their different coefficients is proposed to extract more discriminative speaker features from the original voice data. Using the mixed features as the model input, a disguised voice library is constructed. A disguised voice recognition model based on a deep belief network is proposed, with a dropout strategy introduced to prevent overfitting; this effectively addresses the shortcomings of traditional Gaussian mixture models, such as insufficient modeling ability and low discrimination. Experimental results show that the proposed disguised voice recognition method fits the feature distribution better and significantly improves the classification performance and recognition rate.
Keywords: Disguised voice recognition; deep belief network; feature extraction; Gammatone frequency cepstrum coefficients (GFCC); dropout
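The abstract above mentions dropout only by name. As a generic illustration, and not the paper's deep belief network, the following sketch shows the common "inverted dropout" form applied to a list of activations. The function name `dropout`, the default rate of 0.5, and the `rng` parameter are assumptions made for this example.

```python
import random

def dropout(activations, rate=0.5, training=True, rng=None):
    """Inverted dropout: zero each value with probability `rate` during
    training and scale survivors by 1/(1 - rate); identity at inference."""
    if not 0.0 <= rate < 1.0:
        raise ValueError("rate must be in [0, 1)")
    if not training or rate == 0.0:
        return list(activations)
    rng = rng or random.Random()
    keep = 1.0 - rate
    # Survivors are scaled up so the expected activation is unchanged.
    return [a / keep if rng.random() < keep else 0.0 for a in activations]

# Seeded generator so the example is reproducible.
hidden = [1.0, 2.0, 3.0, 4.0]
dropped = dropout(hidden, rate=0.5, rng=random.Random(0))
unchanged = dropout(hidden, training=False)
```

Scaling at training time means nothing has to change at inference, which is why the inverted form is the usual choice.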
Adaptive Compensation Algorithm in Open Vocabulary Mandarin Speaker-Independent Speech Recognition
3
Authors: Fadhil H. T. Al-dulaimy, WANG Zuoying (王作英), TIAN Ye (田野). Tsinghua Science and Technology (SCIE, EI, CAS), 2002, No. 5, pp. 521-526 (6 pages)
In speech recognition systems, the physiological characteristics of the speech production model cause the voiced sections of the speech signal to have an attenuation of approximately 20 dB per decade. Many speech recognition algorithms have been developed to solve this problem by filtering the input signal with a single-zero high-pass filter. Unfortunately, this technique increases the noise energy at high frequencies above 4 kHz, which in some cases degrades the recognition accuracy. This paper solves the problem using a pre-emphasis filter in the front end of the recognizer. The aim is to develop a modified parameterization approach that takes into account the whole energy zone in the spectrum to improve the performance of the existing baseline recognition system in the acoustic phase. The results show that a large-vocabulary speaker-independent continuous speech recognition system using this approach has a greatly improved recognition rate.
Keywords: mel-frequency cepstrum coefficients; speech recognition; duration distribution based hidden Markov model
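The single-zero high-pass filter mentioned in the abstract above is conventionally the first-order pre-emphasis filter y[n] = x[n] - α·x[n-1]. The sketch below shows that textbook form only, not the modified parameterization the paper proposes. The coefficient 0.97 is a commonly used value assumed here, and the function name `pre_emphasis` is illustrative.

```python
def pre_emphasis(signal, alpha=0.97):
    """First-order pre-emphasis: y[n] = x[n] - alpha * x[n-1], y[0] = x[0].

    alpha = 0.97 is an assumed, commonly used value.
    """
    if not signal:
        return []
    out = [signal[0]]
    for n in range(1, len(signal)):
        out.append(signal[n] - alpha * signal[n - 1])
    return out

# A constant (lowest-frequency) input is almost cancelled after the first
# sample, while an alternating (highest-frequency) input is nearly doubled.
flat = pre_emphasis([1.0, 1.0, 1.0, 1.0])
alternating = pre_emphasis([1.0, -1.0, 1.0, -1.0])
```

The two toy inputs show the high-pass behavior directly: low-frequency content is attenuated and high-frequency content is boosted, which also explains why noise energy at high frequencies grows, as the abstract notes.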