Abstract
This paper proposes a novel output-based method for objective speech quality evaluation based on non-uniform linear prediction cepstrum (NLPC) coefficients and Gaussian mixture models (GMMs). First, the Bark bilinear transform (BBT) warps the linear spectrum so that the warped spectrum matches the non-uniform frequency resolution of human hearing. NLPC coefficients are then computed by linear prediction on the warped spectrum. NLPC features extracted from clean reference speech are used to train a GMM, which serves as the reference model. A consistency measure is computed between the reference model and the NLPC vectors of the degraded speech and serves as an indicator of speech quality. Finally, multivariate adaptive regression splines (MARS) are used to map the consistency measure to the subjective mean opinion score (MOS), yielding an objective MOS prediction model for speech quality evaluation. Experimental results indicate that the proposed algorithm outperforms the algorithm in the ITU-T P.563 standard.
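The two central steps of the abstract, frequency warping and scoring degraded features against a reference GMM, can be illustrated with a minimal sketch. Everything named here is an assumption for illustration only: the function names, the all-pass coefficient `alpha` (the abstract does not state a value), the diagonal-covariance GMM, and the use of average per-frame log-likelihood as the consistency measure. The paper's actual definitions may differ.

```python
import math


def warp_frequency(omega, alpha):
    """Map a frequency omega (rad/sample) through a first-order all-pass
    bilinear transform with coefficient alpha (assumed, 0 < alpha < 1).
    Positive alpha stretches low frequencies, as a Bark-like scale does."""
    return omega + 2.0 * math.atan2(alpha * math.sin(omega),
                                    1.0 - alpha * math.cos(omega))


def diag_gmm_loglik(x, weights, means, variances):
    """Log-likelihood of one feature vector x under a GMM with diagonal
    covariances (hypothetical stand-in for the trained reference model)."""
    comp = []
    for w, mu, var in zip(weights, means, variances):
        ll = math.log(w)
        for xi, mi, vi in zip(x, mu, var):
            ll += -0.5 * (math.log(2.0 * math.pi * vi) + (xi - mi) ** 2 / vi)
        comp.append(ll)
    # Log-sum-exp over components for numerical stability.
    m = max(comp)
    return m + math.log(sum(math.exp(c - m) for c in comp))


def consistency(frames, weights, means, variances):
    """Assumed consistency measure: mean per-frame log-likelihood of the
    degraded-speech feature vectors under the reference GMM."""
    total = sum(diag_gmm_loglik(f, weights, means, variances) for f in frames)
    return total / len(frames)
```

In such a sketch, feature vectors that stay close to the reference model score a higher consistency than vectors that drift away from it; a regression stage (MARS in the paper) would then map that score to a predicted MOS.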
Source
《电路与系统学报》
CSCD
PKU Core (Peking University Chinese Core Journals list)
2010, No. 4, pp. 104-109, 90 (7 pages in total)
Journal of Circuits and Systems
Keywords
speech quality
objective speech quality evaluation
non-uniform linear prediction cepstrum coefficient
Gaussian mixture model
multivariate adaptive regression splines