The recognition of pathological voice is considered a difficult task for speech analysis. Moreover, otolaryngologists have needed to rely on oral communication with patients to discover traces of voice pathologies such as dysphonia that are caused by alterations of the vocal folds, and their accuracy is between 60% and 70%. To enhance detection accuracy and reduce the processing time of dysphonia detection, a novel approach is proposed in this paper. We leverage Linear Discriminant Analysis (LDA) to train multiple Machine Learning (ML) models for dysphonia detection. Several ML models, namely Support Vector Machine (SVM), Logistic Regression, and K-Nearest Neighbor (K-NN), are used to predict voice pathologies based on features such as Mel-Frequency Cepstral Coefficients (MFCC), Fundamental Frequency (F0), Shimmer (%), Jitter (%), and Harmonic-to-Noise Ratio (HNR). The experiments were performed using the Saarbrucken Voice Database (SVD) and a privately collected dataset. K-fold cross-validation was incorporated to increase the robustness and stability of the ML models. According to the experimental results, the proposed approach has a 70% increase in processing speed over Principal Component Analysis (PCA) and achieves a recognition accuracy of 95.24% on the SVD dataset, surpassing the previous best accuracy of 82.37%. On the private dataset, the proposed method achieved an accuracy of 93.37%. It can be an effective non-invasive method to detect dysphonia.
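The abstract above combines an LDA projection with K-fold cross-validation. The following is a minimal sketch of that idea only, not the authors' pipeline: it assumes two hypothetical features per recording (standing in for Jitter (%) and Shimmer (%)), uses synthetic values rather than SVD data, and replaces the SVM, Logistic Regression, and K-NN classifiers with a simple midpoint threshold on the LDA projection.

```python
import random

def fisher_lda_2d(X0, X1):
    """Return (w, threshold) for a two-class Fisher LDA on 2-D features."""
    def mean(X):
        return [sum(x[j] for x in X) / len(X) for j in range(2)]

    def scatter(X, m):
        # Within-class scatter: sum of outer products of centered samples.
        return [[sum((x[i] - m[i]) * (x[j] - m[j]) for x in X)
                 for j in range(2)] for i in range(2)]

    m0, m1 = mean(X0), mean(X1)
    s0, s1 = scatter(X0, m0), scatter(X1, m1)
    a, b = s0[0][0] + s1[0][0], s0[0][1] + s1[0][1]
    c, d = s0[1][0] + s1[1][0], s0[1][1] + s1[1][1]
    det = a * d - b * c
    dm = [m1[0] - m0[0], m1[1] - m0[1]]
    # w = Sw^{-1} (m1 - m0), using the closed-form inverse of a 2x2 matrix.
    w = [(d * dm[0] - b * dm[1]) / det, (a * dm[1] - c * dm[0]) / det]
    # Decision threshold: midpoint between the projected class means.
    thr = sum(w[j] * (m0[j] + m1[j]) for j in range(2)) / 2.0
    return w, thr

def kfold_accuracy(X, y, k=5, seed=0):
    """Average accuracy over k folds; each fold is held out exactly once."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    correct = 0
    for fold in folds:
        held_out = set(fold)
        train = [i for i in idx if i not in held_out]
        w, thr = fisher_lda_2d([X[i] for i in train if y[i] == 0],
                               [X[i] for i in train if y[i] == 1])
        for i in fold:
            pred = 1 if w[0] * X[i][0] + w[1] * X[i][1] > thr else 0
            correct += int(pred == y[i])
    return correct / len(X)

# Synthetic toy data (assumption): label 0 = healthy, label 1 = dysphonic.
rng = random.Random(42)
healthy = [[rng.gauss(0.5, 0.2), rng.gauss(3.0, 1.0)] for _ in range(40)]
dysphonic = [[rng.gauss(1.5, 0.4), rng.gauss(6.0, 1.5)] for _ in range(40)]
X = healthy + dysphonic
y = [0] * 40 + [1] * 40
accuracy = kfold_accuracy(X, y, k=5)
print(f"5-fold accuracy on synthetic data: {accuracy:.2f}")
```

The projection reduces each sample to a single number before classification, which is one plausible reason a supervised reduction such as LDA can be cheaper downstream than keeping many components.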
A voice conversion algorithm aims to provide a high level of similarity to the target voice with an acceptable level of quality. The main objective of this paper was to build a nonlinear relationship between the acoustic feature parameters of the source and target speakers using Non-Linear Canonical Correlation Analysis (NLCCA) based on a joint Gaussian mixture model. Speaker individuality transformation was achieved mainly by altering vocal tract characteristics represented by Line Spectral Frequencies (LSF). To obtain transformed speech that sounded more like the target voice, prosody modification was applied through residual prediction. Both objective and subjective evaluations were conducted. The experimental results demonstrated that the proposed algorithm was effective and outperformed the conventional conversion method based on Minimum Mean Square Error (MMSE) estimation.
Recently, the Quality of Experience (QoE) of voice service has received increasing attention because it represents the performance of voice service as subjectively perceived by end users, and speech quality is commonly used to measure QoE. In this paper, a speech quality assessment algorithm is proposed for GSM networks, aiming to let operators predict and monitor the QoE of voice service from radio link parameters with low complexity. Multiple Linear Regression (MLR) and Principal Component Analysis (PCA) are combined to establish the mapping model from radio link parameters to speech quality. The data set for model training and testing is obtained from a real commercial network of China Mobile. The experimental results show that, with sufficient training data, this algorithm can predict radio speech quality with high accuracy and could be used to monitor the speech quality of a mobile network in real time.
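One common way to combine PCA with linear regression is principal component regression: project correlated inputs onto their leading principal component, then regress the target on that score. The sketch below illustrates this under stated assumptions and is not the paper's model: the three input columns are hypothetical stand-ins for radio link parameters, the target is a synthetic speech quality score, and only one component is kept (found by power iteration).

```python
import random

def fit_pcr(X, y, iters=200):
    """Fit a one-component principal component regression."""
    n, d = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(d)]
    Xc = [[row[j] - means[j] for j in range(d)] for row in X]
    # Sample covariance matrix of the centered inputs.
    cov = [[sum(r[i] * r[j] for r in Xc) / (n - 1) for j in range(d)]
           for i in range(d)]
    # Power iteration converges to the leading eigenvector of cov.
    v = [1.0] * d
    for _ in range(iters):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = sum(c * c for c in w) ** 0.5
        v = [c / norm for c in w]
    # Scores have zero mean, so the least-squares intercept is mean(y).
    scores = [sum(r[j] * v[j] for j in range(d)) for r in Xc]
    y_mean = sum(y) / n
    slope = (sum(s * (t - y_mean) for s, t in zip(scores, y))
             / sum(s * s for s in scores))
    return means, v, slope, y_mean

def predict(model, x):
    means, v, slope, intercept = model
    score = sum((x[j] - means[j]) * v[j] for j in range(len(x)))
    return intercept + slope * score

# Synthetic toy data (assumption): three correlated inputs driven by one
# latent link-quality factor t, and a quality score that depends on t.
rng = random.Random(1)
X, y = [], []
for _ in range(100):
    t = rng.uniform(-1.0, 1.0)
    X.append([t + rng.gauss(0, 0.05), 2 * t + rng.gauss(0, 0.05),
              -t + rng.gauss(0, 0.05)])
    y.append(3.0 + 0.5 * t + rng.gauss(0, 0.05))

model = fit_pcr(X[:80], y[:80])  # train on the first 80 samples
errors = [abs(predict(model, x) - t) for x, t in zip(X[80:], y[80:])]
mae = sum(errors) / len(errors)
print(f"held-out mean absolute error: {mae:.3f}")
```

Regressing on the principal component score rather than on the raw inputs avoids the unstable coefficients that ordinary multiple regression can produce when inputs are strongly correlated.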
A novel algorithm for voice conversion is proposed in this paper. The mapping function between the spectral vectors of the source and target speakers is calculated by Canonical Correlation Analysis (CCA) estimation based on Gaussian mixture models. Since the spectral envelope feature retains a majority of the second-order statistical information contained in speech after Linear Prediction Coding (LPC) analysis, the CCA method is more suitable for spectral conversion than Minimum Mean Square Error (MMSE) estimation, because CCA explicitly considers the variance of each component of the spectral vectors during the conversion procedure. Both objective evaluations and subjective listening tests are conducted. The experimental results demonstrate that the proposed scheme achieves better performance than the previous method, which uses the MMSE estimation criterion.
This work deals with the application of an artificial immune system to discriminate between healthy people and people with Parkinson’s disease (PWP). As the symptoms of Parkinson’s disease (PD) occur gradually and mostly affect elderly people, for whom physical visits to the clinic are inconvenient and costly, telemonitoring of the disease using measurements of dysphonia (vocal features) has a vital role in its early diagnosis. Taking inspiration from natural immune systems, we try to capture useful properties such as automatic recognition, memorization, and adaptation. The developed algorithms are based on the bio-inspired training algorithm CLONCLAS. The results obtained are satisfactory and show the high reliability of the approach.
Funding: Supported by the National High Technology Research and Development Program of China (863 Program, No. 2006AA010102).