Abstract
Existing speaker clustering methods based on Gaussian mixture models (GMMs) mainly obtain each cluster's GMM by adapting a universal background model (UBM) under the maximum a posteriori (MAP) criterion. Because little adaptation data is available per cluster, the resulting models are not accurate enough. To address this, two factor analysis modeling methods are explored, based on eigenvoice (EV) space analysis and total variability (TV) space analysis respectively. By modeling the variability space, both methods reduce the number of parameters that must be estimated for each cluster's GMM. Experiments on the two-speaker telephone speech data of the 2008 National Institute of Standards and Technology (NIST) Speaker Recognition Evaluation show that, compared with the baseline system using MAP adaptation, both methods considerably reduce the speaker clustering error rate, and that TV space modeling achieves a lower error rate than EV space modeling.
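The parameter reduction described in the abstract can be illustrated with a minimal sketch. It assumes the factor-analysis form commonly used for EV and TV modeling, in which a cluster's mean supervector is written as M = m + T w, with m the UBM mean supervector, T a low-rank subspace matrix, and w a low-dimensional factor vector. The abstract does not state this formula, and the sizes below (512 components, 39-dimensional features, rank 100) are hypothetical values chosen only for illustration.

```python
import numpy as np


def map_free_parameters(num_components: int, feat_dim: int) -> int:
    """Parameters re-estimated per cluster when every GMM mean is adapted."""
    return num_components * feat_dim


def subspace_free_parameters(rank: int) -> int:
    """Parameters estimated per cluster when only the factor vector w is free."""
    return rank


def means_from_factors(ubm_means: np.ndarray,
                       subspace: np.ndarray,
                       factors: np.ndarray) -> np.ndarray:
    """Build cluster means from M = m + T w (assumed factor-analysis form)."""
    m = ubm_means.reshape(-1)          # UBM mean supervector, length C * F
    supervector = m + subspace @ factors
    return supervector.reshape(ubm_means.shape)


# Hypothetical sizes, not taken from the paper.
C, F, R = 512, 39, 100
rng = np.random.default_rng(0)
ubm_means = rng.standard_normal((C, F))
T = 0.01 * rng.standard_normal((C * F, R))   # stand-in for a trained subspace
w = rng.standard_normal(R)                   # stand-in for estimated factors

cluster_means = means_from_factors(ubm_means, T, w)
print(map_free_parameters(C, F), subspace_free_parameters(R))
```

With these assumed sizes, mean-only adaptation estimates C × F values per cluster, while the subspace approach estimates only R values, which is why it can cope better with short segments. The sketch uses random stand-ins for T and w; it does not train the subspace or estimate the factors from speech.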
Source
Pattern Recognition and Artificial Intelligence (《模式识别与人工智能》)
Indexed in: EI, CSCD, Peking University Core Journals
2013, No. 1, pp. 1-5 (5 pages)
Funding
Supported by the National Natural Science Foundation of China (No. 61172158)
and the Natural Science Foundation of Anhui Province (No. 090412056)
Keywords
Speaker Clustering, Eigenvoice Space, Total Variability Space, Cross Likelihood Ratio