In recent years, the accuracy of speech recognition (SR) has been one of the most active areas of research. Although SR systems work reasonably well in quiet conditions, they still suffer severe performance degradation in noisy conditions or over distorted channels. More robust feature extraction methods are therefore needed to improve performance in adverse conditions. This paper investigates the performance of conventional and new hybrid speech feature extraction algorithms, namely Mel Frequency Cepstrum Coefficient (MFCC), Linear Prediction Coding Coefficient (LPCC), Perceptual Linear Prediction (PLP), and RASTA-PLP, in noisy conditions using a multivariate Hidden Markov Model (HMM) classifier. The behavior of the proposed system is evaluated on the TIDIGITS speech corpus, with recordings from 208 different adult speakers used in both training and testing. The theoretical basis of the speech processing and classification procedures is presented, and the recognition results are reported as word recognition rate.
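The abstract reports results as word recognition rate without defining it. A minimal sketch of one common definition follows, assuming isolated-word utterances where each reference word has exactly one recognized counterpart; the function name `word_recognition_rate` and the sample labels are illustrative and do not come from the paper. Connected-digit strings would need an alignment-based measure instead.

```python
def word_recognition_rate(reference, hypothesis):
    """Percentage of words recognized correctly.

    Assumes one recognized word per reference word (isolated-word setting).
    """
    if len(reference) != len(hypothesis):
        raise ValueError("reference and hypothesis must have the same length")
    if not reference:
        raise ValueError("reference must not be empty")
    correct = sum(1 for ref, hyp in zip(reference, hypothesis) if ref == hyp)
    return 100.0 * correct / len(reference)


# Hypothetical digit labels, for illustration only.
reference = ["one", "two", "three", "four"]
hypothesis = ["one", "two", "tree", "four"]
rate = word_recognition_rate(reference, hypothesis)
```

Comparing this rate across feature sets (MFCC, LPCC, PLP, RASTA-PLP) at each noise level is the kind of comparison the abstract describes.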
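Stress is an indispensable part of spoken communication and plays a very important role in it. To evaluate how well short-time spectral feature sets based on auditory models work for Chinese stress detection, the MFCC (Mel frequency cepstrum coefficient) and RASTA-PLP (relative spectra perceptual linear prediction) algorithms are used to extract short-time spectral information from each speech segment, yielding one short-time spectral feature set based on MFCC and another based on RASTA-PLP. A Naive Bayes classifier is used to model both feature sets, assigning each object to the class with the maximum posterior probability; this classification method makes full use of the relevant acoustic characteristics of the current speech segment. On ASCCD (annotated speech corpus of Chinese discourse), the MFCC-based and RASTA-PLP-based feature sets achieve Chinese stress detection accuracies of 82.1% and 80.8%, respectively. The experimental results show that both MFCC-based and RASTA-PLP-based short-time spectral features can be used for Chinese stress detection.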
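The maximum-posterior decision rule mentioned in this abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes Gaussian class-conditional densities per feature dimension (the abstract does not say which Naive Bayes variant was used), and the function names and toy feature vectors are hypothetical.

```python
import math
from collections import defaultdict


def fit_gaussian_nb(features, labels, var_floor=1e-6):
    """Estimate per-class prior, per-dimension mean and variance."""
    grouped = defaultdict(list)
    for vector, label in zip(features, labels):
        grouped[label].append(vector)
    total = len(labels)
    model = {}
    for label, rows in grouped.items():
        n = len(rows)
        dims = len(rows[0])
        means = [sum(row[d] for row in rows) / n for d in range(dims)]
        # var_floor keeps the variance strictly positive.
        variances = [
            sum((row[d] - means[d]) ** 2 for row in rows) / n + var_floor
            for d in range(dims)
        ]
        model[label] = (n / total, means, variances)
    return model


def log_posteriors(model, vector):
    """Unnormalized log posterior per class: log prior plus summed log likelihoods."""
    scores = {}
    for label, (prior, means, variances) in model.items():
        score = math.log(prior)
        for value, mean, var in zip(vector, means, variances):
            score += -0.5 * math.log(2 * math.pi * var)
            score -= (value - mean) ** 2 / (2 * var)
        scores[label] = score
    return scores


def predict(model, vector):
    """Return the class with the maximum posterior probability."""
    scores = log_posteriors(model, vector)
    return max(scores, key=scores.get)


# Toy two-dimensional "segment features", invented for illustration.
train_x = [[2.0, 1.5], [2.2, 1.7], [1.8, 1.4], [0.1, 0.2], [0.0, -0.1], [0.3, 0.1]]
train_y = ["stressed"] * 3 + ["unstressed"] * 3
nb_model = fit_gaussian_nb(train_x, train_y)
decision = predict(nb_model, [1.9, 1.6])
```

Working in log space avoids numerical underflow when many feature dimensions are multiplied together, which matters for feature vectors built from many cepstral coefficients.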
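Pitch tampering is a common technique for forging speech and may pose a threat to speaker verification systems. This work studies the detection of forged speech under noise and compression and proposes a pitch tampering detection algorithm based on an improved capsule network. To improve robustness, RelAtive SpecTrAl-Perceptual Linear Predictive (RASTA-PLP) features and Mel-scale Frequency Cepstral Coefficients (MFCC) are fused into a new feature, which is fed into the optimized capsule network to detect audio that has been subjected to added noise and compression. Experimental results show that the algorithm achieves detection accuracy above 98% in known-noise, unknown-noise, and compression scenarios, and that it offers higher detection accuracy and robustness than some existing algorithms.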
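The abstract does not state how the RASTA-PLP and MFCC features are fused. One simple possibility, shown here purely as an assumption, is frame-wise concatenation of the two coefficient vectors; the function name `fuse_features` and the sample values are hypothetical.

```python
def fuse_features(rasta_plp_frames, mfcc_frames):
    """Concatenate the two feature vectors of each frame.

    Assumes both inputs were computed on the same frame grid, so that
    frame i in one list corresponds to frame i in the other.
    """
    if len(rasta_plp_frames) != len(mfcc_frames):
        raise ValueError("both feature streams must have the same number of frames")
    return [list(plp) + list(mfcc) for plp, mfcc in zip(rasta_plp_frames, mfcc_frames)]


# Two frames with made-up coefficients: 2 RASTA-PLP values and 3 MFCC values each.
rasta_plp = [[0.1, 0.2], [0.3, 0.4]]
mfcc = [[1.0, 1.1, 1.2], [2.0, 2.1, 2.2]]
fused = fuse_features(rasta_plp, mfcc)
```

Each fused frame has as many dimensions as the two inputs combined (here 5), and the resulting frame sequence would be the input to the classifier.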