Fund: Supported by Tianjin Municipal Science and Technology Commission (No. 09JCYBJC02200)
Abstract: Speech signals in the frequency domain were separated based on the discrete wavelet transform (DWT) and independent component analysis (ICA). First, the mixed speech signals were decomposed into different frequency subbands by DWT, and the subbands were separated using ICA in each wavelet domain; then, the permutation and scaling problems of frequency-domain blind source separation (BSS) were solved by exploiting the correlation between adjacent bins of the speech signals; finally, the source signals were reconstructed from the individual subband branches. Experiments were carried out with 2 sources and 6 microphones using speech signals sampled at 40 kHz. The microphones were arranged in a line with the 2 sources in front of them, to the left and right. The mixture of one male and one female speech signal lasted 2.5 s. The results show that the new method outperforms the ICA-only method, improving the signal-to-noise ratio by approximately 1 dB.
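A minimal sketch of the DWT-plus-per-subband-ICA pipeline described in this abstract is given below. It is not the authors' implementation: PyWavelets and scikit-learn's FastICA stand in for the DWT and ICA stages, and the wavelet ('db4'), decomposition depth, and function names are illustrative assumptions.

```python
# Sketch only: DWT decomposition, ICA per subband, then reconstruction.
# Assumptions (not from the paper): PyWavelets + scikit-learn FastICA,
# 'db4' wavelet, 3 decomposition levels.
import numpy as np
import pywt
from sklearn.decomposition import FastICA

def separate_dwt_ica(mixtures, wavelet="db4", level=3, n_sources=2):
    """mixtures: array of shape (n_mics, n_samples)."""
    # 1. Decompose each mixture channel into wavelet subbands.
    coeffs = [pywt.wavedec(x, wavelet, level=level) for x in mixtures]

    # 2. Run ICA independently within each subband.
    separated_bands = []
    for band in range(level + 1):
        X = np.stack([c[band] for c in coeffs], axis=1)   # (n_coeffs, n_mics)
        ica = FastICA(n_components=n_sources, whiten="unit-variance")
        separated_bands.append(ica.fit_transform(X))      # (n_coeffs, n_sources)

    # 3. Permutation/scaling alignment across subbands would go here,
    #    e.g. by maximizing correlation between adjacent bands (as in the paper).

    # 4. Reconstruct each source from its subband branches.
    sources = []
    for s in range(n_sources):
        src_coeffs = [separated_bands[b][:, s] for b in range(level + 1)]
        sources.append(pywt.waverec(src_coeffs, wavelet))
    return np.array(sources)
```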
Fund: National Natural Science Foundation of China (grant numbers 61271082, 61201029, 61102094)
Abstract: In this paper, we applied RobustICA to speech separation and made a comprehensive comparison with FastICA based on the separation results. In a series of speech signal separation tests, RobustICA reduced the separation time required by FastICA while offering higher stability, and the speeches separated by RobustICA showed lower separation errors. Over the 14 groups of speech separation tests, the separation time of RobustICA was 3.185 s less than that of FastICA, a reduction of nearly 68%. The separation errors of FastICA fluctuated between 0.004 and 0.02, while those of RobustICA remained around 0.003. Furthermore, compared with FastICA, RobustICA showed better separation robustness. The experimental results show that RobustICA can be successfully applied to speech signal separation and is superior to FastICA for this task.
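The sketch below illustrates the kind of timing and separation-error benchmark reported in this abstract, using scikit-learn's FastICA as the baseline. RobustICA does not ship with scikit-learn, so a RobustICA implementation would have to be plugged into the same harness separately; the synthetic data and the Amari-style error index are illustrative assumptions, not the paper's exact metric.

```python
# Sketch of a timing / separation-error benchmark around FastICA.
import time
import numpy as np
from sklearn.decomposition import FastICA

def amari_error(G):
    """Amari-style index of the global (unmixing x mixing) matrix G.
    Close to 0 when G is a scaled permutation, i.e. perfect separation."""
    P = np.abs(G)
    n = P.shape[0]
    row = np.sum(P / P.max(axis=1, keepdims=True)) - n
    col = np.sum(P / P.max(axis=0, keepdims=True)) - n
    return (row + col) / (2 * n * (n - 1))

def benchmark_fastica(X, A_true):
    """X: mixtures (n_samples, n_mics); A_true: known mixing matrix."""
    start = time.perf_counter()
    ica = FastICA(n_components=A_true.shape[1], whiten="unit-variance")
    ica.fit(X)
    elapsed = time.perf_counter() - start
    G = ica.components_ @ A_true          # global system matrix
    return elapsed, amari_error(G)

# Hypothetical usage with two synthetic super-Gaussian sources:
rng = np.random.default_rng(0)
S = rng.laplace(size=(40000, 2))          # surrogate "speech" sources
A = rng.normal(size=(2, 2))               # mixing matrix
t, e = benchmark_fastica(S @ A.T, A)
print(f"FastICA: {t:.3f} s, separation error {e:.4f}")
```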
Fund: Supported by the National Natural Science Foundation of China (No. 61977049) and the Advanced Innovation Center for Language Resource and Intelligence (KYR17005)
Abstract: Synchronized acoustic-articulatory data is the basis of various applications, such as exploring the fundamental mechanisms of speech production, acoustic-to-articulatory inversion (AAI), and articulatory-to-acoustic mapping (AAM). Numerous studies have been conducted based on synchronized electromagnetic articulography (EMA) data and acoustic data. Hence, it is necessary to make clear whether EMA-synchronized speech and stand-alone speech differ and, if so, how this affects the performance of applications that are based on synchronized acoustic-articulatory data. In this study, we compare EMA-synchronized speech and stand-alone speech from the perspective of speech recognition, based on the data of a male speaker. It is found that: i) the overall error rate of EMA-synchronized speech is much higher than that of stand-alone speech; ii) apical vowels and apical/blade consonants are more significantly affected by the presence of the EMA coils; iii) some vowel and consonant tokens are confused with sounds produced by the same or nearby articulators, such as confusion among apical vowels and confusion among apical and blade consonants; iv) the confusion of labial tokens shows a more diverse pattern.
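A small illustrative sketch of how per-phone error rates and confusion patterns for the two recording conditions could be tabulated from aligned recognizer output is shown below. It is not the authors' pipeline; the input format (lists of reference/hypothesis phone-label pairs per condition) and the phone labels are assumptions made for the example.

```python
# Illustrative sketch: per-phone error rates and confusion counts from
# aligned (reference, hypothesis) phone pairs.  Input format is assumed.
from collections import Counter, defaultdict

def phone_stats(aligned_pairs):
    """aligned_pairs: iterable of (reference, hypothesis) phone labels."""
    errors, totals = Counter(), Counter()
    confusions = defaultdict(Counter)
    for ref, hyp in aligned_pairs:
        totals[ref] += 1
        if hyp != ref:
            errors[ref] += 1
            confusions[ref][hyp] += 1
    rates = {p: errors[p] / totals[p] for p in totals}
    return rates, confusions

# Hypothetical comparison of the two recording conditions:
ema_pairs   = [("sz", "s"), ("sz", "sz"), ("i", "i")]    # EMA-synchronized
alone_pairs = [("sz", "sz"), ("sz", "sz"), ("i", "i")]   # stand-alone
ema_rates, ema_conf = phone_stats(ema_pairs)
alone_rates, _ = phone_stats(alone_pairs)
for phone in sorted(ema_rates):
    print(phone, ema_rates[phone], alone_rates.get(phone, 0.0))
```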