The frame-independence assumption in vocal effort (VE) recognition makes it difficult for frame-based spectral features to capture the intrinsic temporal correlation and dynamic change of speech. A novel VE detection method based on the echo state network (ESN) is proposed. The reservoir of the ESN maps each input sequence to a fixed-dimensional vector in a high-dimensional coding space. Radial basis function (RBF) networks are then used to fit the probability density function (pdf) of each VE mode from the vectors in that space. Finally, the minimum-error-rate Bayesian decision rule is used to determine the VE mode. Experiments on an isolated-word test set achieve an average recognition accuracy of 79.5%, and the results show that the proposed method effectively overcomes the limitation of the frame-independence assumption.
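The pipeline described in this abstract (reservoir encoding, per-mode density model, Bayesian decision) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not say how reservoir states are pooled into one vector, so the mean state is assumed here; the density is an equal-weight mixture of Gaussian RBFs centred on training codes, standing in for a trained RBF network; and all function names, the mode labels, and the parameter values are hypothetical.

```python
import numpy as np

def make_reservoir(n_in, n_res, spectral_radius=0.9, seed=0):
    """Random, untrained ESN weights, rescaled to the given spectral radius."""
    rng = np.random.default_rng(seed)
    w_in = rng.uniform(-0.5, 0.5, size=(n_res, n_in))
    w = rng.uniform(-0.5, 0.5, size=(n_res, n_res))
    w *= spectral_radius / np.max(np.abs(np.linalg.eigvals(w)))
    return w_in, w

def encode(frames, w_in, w):
    """Map a (T, n_in) frame sequence to one fixed-size vector.

    Assumption: the sequence code is the mean reservoir state over time.
    """
    x = np.zeros(w.shape[0])
    states = []
    for u in frames:
        x = np.tanh(w_in @ u + w @ x)  # state depends on all earlier frames
        states.append(x)
    return np.mean(states, axis=0)

def log_density(v, centers, width):
    """Log pdf of an equal-weight mixture of Gaussian RBFs (assumed model)."""
    d = centers.shape[1]
    sq = np.sum((centers - v) ** 2, axis=1)
    log_k = -0.5 * sq / width**2 - 0.5 * d * np.log(2 * np.pi * width**2)
    m = log_k.max()
    return m + np.log(np.mean(np.exp(log_k - m)))  # stable log-mean-exp

def classify(v, centers_by_mode, priors, width=1.0):
    """Minimum-error-rate Bayes decision: pick the largest log posterior."""
    scores = {
        mode: np.log(priors[mode]) + log_density(v, c, width)
        for mode, c in centers_by_mode.items()
    }
    return max(scores, key=scores.get)
```

Because the reservoir state at each step depends on the previous state, the resulting code reflects the order and dynamics of the frames, which is the property the abstract contrasts with frame-independent spectral features. Sequences of any length yield a vector of the same size, so a single density model per mode can be fitted on them.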
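To reduce the impact of vocal effort variation on the performance of speaker recognition systems, speech signals produced at different vocal effort levels are analyzed, and a Vocal Effort Maximum A Posteriori (VEMAP) adaptation method is proposed for updating the models of a speaker recognition system based on the Gaussian Mixture Model-Universal Background Model (GMM-UBM). Experiments show that the proposed method reduces the system's EER (%) by 88.45% and 85.16% under different vocal effort levels. It effectively addresses the loss of speaker recognition performance caused by the volume mismatch between training and test speech that arises from vocal effort variation, and markedly improves system performance.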
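For orientation, the sketch below shows conventional mean-only MAP adaptation of a diagonal-covariance GMM, the standard way a GMM-UBM system derives an adapted model from a background model. It is not the VEMAP method itself, whose details the abstract does not give; the function name, the relevance factor of 16, and the array layout are assumptions made for illustration.

```python
import numpy as np

def map_adapt_means(x, weights, means, variances, relevance=16.0):
    """Mean-only MAP adaptation of a diagonal-covariance GMM.

    x: (N, D) adaptation frames; weights: (K,); means, variances: (K, D).
    Returns the adapted (K, D) means; weights and variances stay unchanged.
    """
    # Log Gaussian likelihood of every frame under every component, (N, K).
    diff = x[:, None, :] - means[None, :, :]
    log_g = -0.5 * np.sum(diff**2 / variances + np.log(2 * np.pi * variances), axis=2)

    # Posterior responsibility of each component for each frame.
    log_p = np.log(weights) + log_g
    log_p -= log_p.max(axis=1, keepdims=True)
    post = np.exp(log_p)
    post /= post.sum(axis=1, keepdims=True)

    n_k = post.sum(axis=0)                                  # soft counts, (K,)
    e_k = (post.T @ x) / np.maximum(n_k, 1e-10)[:, None]    # data means, (K, D)

    # Components that saw more data move further toward the data mean.
    alpha = (n_k / (n_k + relevance))[:, None]
    return alpha * e_k + (1 - alpha) * means
```

Components with little adaptation data stay close to the background model, while well-observed components shift toward the new data. That behaviour is what makes MAP adaptation a natural fit for compensating a mismatch between training and test conditions.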
Funding: supported by the National Natural Science Foundation of China (61502150, 61300124); the Foundation for University Key Teacher by Henan Province (2015GGJS068); the Fundamental Research Funds for the Universities of Henan Province (NSFRF1616); the Foundation for Scientific and Technological Project of Henan Province (172102210279); and the Key Scientific Research Projects of Universities in Henan (19A520004).