Abstract: In multimedia conferencing, audio processing is a basic capability with strict real-time requirements. In this article, we categorize and analyze existing schemes and present several multipoint speech audio mixing schemes based on weighted algorithms that meet the practical demands of real-time multipoint speech mixing; the ASW and AEW schemes are especially recommended. By applying adaptive algorithms, the high-performance schemes avoid the saturation operation widely used in multimedia processing, so no additional noise is added to the output. These adaptive algorithms have relatively low computational complexity and good perceptual quality. The schemes are designed for parallel processing, can be easily implemented in hardware such as DSPs, and can be widely applied in multimedia conference systems.
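As a rough illustration of why a weighted mix can avoid the saturation step, the following minimal Python sketch mixes 16-bit frames with weights that sum to 1, so each output sample is a convex combination of the inputs and stays inside the 16-bit range. The level-proportional weighting and the names `frame_weights` and `mix_frames` are assumptions made for this example only; the abstract does not define the ASW and AEW schemes, and this is not a reconstruction of them.

```python
from typing import List, Sequence

INT16_MIN = -32768
INT16_MAX = 32767


def frame_weights(frames: Sequence[Sequence[int]]) -> List[float]:
    """Return one weight per input frame; the weights sum to 1.

    Assumption for illustration: each weight is proportional to the
    frame's mean absolute amplitude, so louder talkers dominate the mix.
    """
    levels = [sum(abs(s) for s in frame) / len(frame) for frame in frames]
    total = sum(levels)
    if total == 0:
        # All inputs are silent: fall back to equal weights.
        return [1.0 / len(frames)] * len(frames)
    return [level / total for level in levels]


def mix_frames(frames: Sequence[Sequence[int]]) -> List[int]:
    """Mix equally long 16-bit frames as a weighted sum.

    Because the weights are non-negative and sum to 1, every output
    sample lies between the smallest and largest input sample at that
    position, so no clipping (saturation) step is required.
    """
    length = len(frames[0])
    if any(len(frame) != length for frame in frames):
        raise ValueError("all frames must have the same length")
    weights = frame_weights(frames)
    mixed = []
    for i in range(length):
        acc = sum(w * frame[i] for w, frame in zip(weights, frames))
        mixed.append(int(round(acc)))
    return mixed


if __name__ == "__main__":
    loud = [32767, -32768, 32767, -32768]
    quiet = [1000, -1000, 1000, -1000]
    print(mix_frames([loud, loud, quiet]))
```

A plain sum of the two loud frames would overflow the 16-bit range and need clipping; the weighted sum does not.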
Abstract: A distinguishing feature of a digital library is that it holds terabytes of multimedia resources. One challenge for multimedia researchers is to find a testbed that demonstrates the potential of multimedia technologies such as video summarization, semantic annotation, and multimedia cross-indexing and retrieval. Deeper research on and wider application of digital libraries have revealed their indispensable role as a testbed for multimedia technologies. This paper presents the challenging issues of some key techniques used in digital libraries and their specific needs for multimedia technologies.
Funding: The National Natural Science Foundation of China (No. 61673108, 61231002).
Abstract: To achieve efficient and compact low-dimensional features for speech emotion recognition, a novel feature reduction method using uncertain linear discriminant analysis is proposed. Using the same principles as conventional linear discriminant analysis (LDA), the uncertainties of the noisy or distorted input data are employed to estimate maximally discriminant directions. The effectiveness of the proposed uncertain LDA (ULDA) is demonstrated on a Uyghur speech emotion recognition task. The emotional features of Uyghur speech, especially the fundamental frequency and formants, are analyzed in the collected emotional data. ULDA is then employed for dimensionality reduction of the emotional features, and it achieves better performance than other dimensionality reduction techniques. Uyghur speech emotion recognition is implemented by feeding the low-dimensional data produced by the proposed ULDA to a support vector machine (SVM). The experimental results show that, with an appropriate uncertainty estimation algorithm, uncertain LDA outperforms its conventional LDA counterpart on Uyghur speech emotion recognition.
Funding: Supported by the Independent Innovation Foundation of Shandong University (No. 2009JC004) and the Natural Science Foundation of Shandong Province (No. Y2007G31).
Abstract: An Autoregressive Moving Average (ARMA) model for whispered speech is proposed. Compared with normal speech, whispered speech has no fundamental frequency because the glottis is semi-open and turbulent flow is created, and formant shifting exists in the lower frequency region due to the narrowing of the tract in the false vocal fold regions and weak acoustic coupling with the subglottal system. Analysis shows that the effect of the subglottal system is to introduce additional pole-zero pairs into the vocal tract transfer function. Theoretically, a method based on an ARMA process is therefore superior to one based on an AR process for the spectral analysis of whispered speech. Two methods, the least squared modified Yule-Walker likelihood estimate (LSMY) algorithm and the Frequency-Domain Steiglitz-McBride (FDSM) algorithm, are applied to the ARMA model for whispered speech. The performance evaluation shows that the ARMA model is much more appropriate for representing whispered speech than the AR model, and that the FDSM algorithm provides a more accurate estimate of the whispered speech spectral envelope than the LSMY algorithm, at the cost of higher computational complexity.
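For readers unfamiliar with the distinction, the standard textbook forms of the two models are sketched below. The symbols (gain G, orders p and q, coefficients a_k and b_k) are notation assumed here, not taken from the paper. The AR model is all-pole, while the ARMA model adds a numerator polynomial whose roots are zeros, which is what allows it to represent the pole-zero pairs attributed to subglottal coupling.

```latex
% All-pole (AR) model of order p
H_{\mathrm{AR}}(z) = \frac{G}{1 + \sum_{k=1}^{p} a_k z^{-k}}

% Pole-zero (ARMA) model of orders (p, q)
H_{\mathrm{ARMA}}(z) = \frac{G \left( 1 + \sum_{k=1}^{q} b_k z^{-k} \right)}{1 + \sum_{k=1}^{p} a_k z^{-k}}
```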
Abstract: This study investigated how background speech affected the L1 and L2 reading of Chinese students majoring in English. English, Dutch, and Mandarin Chinese were set as the second language (L2), foreign language (FL), and first language (L1) background speech conditions, respectively. A self-paced word-by-word reading paradigm was used to collect the response time (RT) for each word. The conventional analysis revealed that L1 background speech exerted the most disruptive effect on both L1 and L2 reading and suggested that the background speech effect could be phonological and could arise at the stage of phonological processing in L1 and L2 reading. It also implied that L1 phonological processing could be simultaneously activated during L2 reading. Spectral analysis of ten subjects' reading data indicated that pink noise existed in every word-RT time series of L1 and L2 reading in each condition. This provides clear evidence that L1 and L2 reading processes are similar under different concurrent background speech conditions.
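One common way to check a time series for pink noise is to fit a line to its power spectrum on log-log axes: pink (1/f) noise has a slope near -1, whereas white noise has a slope near 0. The sketch below is a minimal illustration of that idea using a naive discrete Fourier transform. It is not the analysis pipeline used in the study, and the function names and the synthetic test series are assumptions made for this example.

```python
import cmath
import math
from typing import List, Sequence, Tuple


def periodogram(series: Sequence[float]) -> Tuple[List[float], List[float]]:
    """Return (frequencies, powers) for the bins strictly between DC and Nyquist.

    Uses a naive O(n^2) DFT, which is adequate for short RT series.
    """
    n = len(series)
    mean = sum(series) / n
    centered = [x - mean for x in series]
    freqs, powers = [], []
    for k in range(1, n // 2):
        coeff = sum(
            x * cmath.exp(-2j * math.pi * k * t / n)
            for t, x in enumerate(centered)
        )
        freqs.append(k / n)
        powers.append(abs(coeff) ** 2 / n)
    return freqs, powers


def spectral_slope(series: Sequence[float]) -> float:
    """Least-squares slope of log10(power) against log10(frequency)."""
    freqs, powers = periodogram(series)
    # Skip bins with zero power, whose logarithm is undefined.
    points = [(math.log10(f), math.log10(p)) for f, p in zip(freqs, powers) if p > 0]
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = sum((x - mx) ** 2 for x in xs)
    return num / den


def synthetic_pink(n: int) -> List[float]:
    """Deterministic series whose power at bin k is proportional to 1/k."""
    return [
        sum(k ** -0.5 * math.cos(2 * math.pi * k * t / n + k * k)
            for k in range(1, n // 2))
        for t in range(n)
    ]


if __name__ == "__main__":
    print(round(spectral_slope(synthetic_pink(64)), 3))
```

With real response times the estimated slope is noisy, so it is usually interpreted as a range rather than an exact value.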