Abstract
Most current text-independent speaker verification systems are based on the GMM-UBM (Gaussian Mixture Model - Universal Background Model) framework. To describe the distribution of a speaker's speech feature space accurately, the number of mixtures M is usually chosen to be very large, so a large amount of speech data is needed to train the models. This paper proposes a parameter normalization method based on piecewise estimation of the cumulative distribution function. In terms of the probability distribution, it reduces how far the Mel-cepstral parameters deviate from a Gaussian distribution, so GMMs with fewer mixtures can model them accurately. Because this mapping is also an unsupervised normalization, it can improve both the robustness and the verification performance of the system. Experiments on the NIST'03 database show that, with models of the same number of mixtures, the normalized parameters improve verification performance by about 11% relative to the original parameters.
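The normalization described above can be illustrated with a minimal Python sketch. It assumes details the abstract does not give: the CDF of each cepstral dimension is estimated per utterance as a piecewise-linear function through equally spaced empirical quantiles (the hypothetical parameter `num_segments` sets the number of pieces), and the target distribution is the standard normal, so each value x is mapped to Φ⁻¹(F̂(x)). The function names are illustrative only and are not the authors' implementation.

```python
from bisect import bisect_right
from statistics import NormalDist


def piecewise_cdf(values, num_segments=10):
    """Return a piecewise-linear estimate of the CDF of 1-D data.

    Hypothetical scheme: knots sit at equally spaced empirical quantiles,
    and each knot's probability is its rank-based value (rank + 0.5) / n,
    so the estimate stays strictly inside (0, 1).
    """
    xs = sorted(values)
    n = len(xs)
    knot_x, knot_p = [], []
    for k in range(num_segments + 1):
        idx = round(k * (n - 1) / num_segments)
        knot_x.append(xs[idx])
        knot_p.append((idx + 0.5) / n)

    def cdf(v):
        # Clamp values at or outside the observed range to the end probabilities.
        if v <= knot_x[0]:
            return knot_p[0]
        if v >= knot_x[-1]:
            return knot_p[-1]
        # Here knot_x[i - 1] <= v < knot_x[i], so the segment has nonzero width.
        i = bisect_right(knot_x, v)
        x0, x1 = knot_x[i - 1], knot_x[i]
        p0, p1 = knot_p[i - 1], knot_p[i]
        return p0 + (p1 - p0) * (v - x0) / (x1 - x0)

    return cdf


def gaussianize(values, num_segments=10):
    """Map 1-D data through the estimated CDF, then the inverse normal CDF."""
    cdf = piecewise_cdf(values, num_segments)
    std_normal = NormalDist()  # assumed target distribution: N(0, 1)
    return [std_normal.inv_cdf(cdf(v)) for v in values]


def gaussianize_features(frames, num_segments=10):
    """Normalize each dimension of a (frames x dims) feature matrix independently."""
    columns = [gaussianize(list(col), num_segments) for col in zip(*frames)]
    return [list(row) for row in zip(*columns)]


if __name__ == "__main__":
    # Toy, strongly skewed "cepstral" dimension (made-up data).
    skewed = [i ** 2 for i in range(100)]
    z = gaussianize(skewed)
    print(round(min(z), 3), round(max(z), 3))
```

The mapping is monotone and uses no speaker or channel labels, which matches the sense in which the abstract calls it unsupervised. In this sketch, a larger `num_segments` follows the empirical CDF more closely but is likely to be noisier on short utterances.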
Source
Journal of Circuits and Systems (《电路与系统学报》)
Indexed in: CSCD; Peking University Core Journals (北大核心)
2008, No. 6, pp. 91-95, 90 (6 pages in total)
Funding
National Natural Science Foundation of China (60272039)
Open Fund of the Ministry of Education-Microsoft Key Laboratory (05071810)