Speaker Clustering of Telephone Speech Based on Front-End Factor Analysis (cited by: 1)
Abstract: Existing speaker clustering methods based on Gaussian mixture models (GMMs) mainly obtain each cluster's GMM by adapting a universal background model (UBM) under the maximum a posteriori (MAP) criterion. However, the adaptation data are scarce, so the resulting models are not accurate enough. To address this, the paper explores two factor analysis modeling methods, based on eigenvoice (EV) space analysis and total variability (TV) space analysis respectively. By modeling the variability space, both methods reduce the number of parameters that must be estimated for each cluster's GMM. Experiments on the two-speaker telephone data of the 2008 NIST Speaker Recognition Evaluation show that, compared with the baseline system using MAP adaptation, both the EV-based and the TV-based methods achieve a considerable reduction in speaker (clustering) error rate, and that TV space analysis yields a lower error rate than EV space analysis.
Source: Pattern Recognition and Artificial Intelligence (《模式识别与人工智能》), indexed in EI, CSCD, and the Peking University Core Journals list, 2013, No. 1, pp. 1-5 (5 pages).
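Funding: Supported by the National Natural Science Foundation of China (No. 61172158) and the Natural Science Foundation of Anhui Province (No. 090412056).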
Keywords: speaker clustering; eigenvoice space; total variability space; cross likelihood ratio
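
The parameter-reduction argument in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. The function names, the sizes `C`, `F`, `R`, and the relevance factor of 16 are all assumptions chosen for illustration, and the random matrix `T` merely stands in for a trained EV or TV matrix. The MAP update shown is the standard relevance-MAP mean update from the GMM-UBM literature; the record does not confirm that the paper uses exactly this form. Estimating `T` and the factor vector `w` from data is omitted.

```python
import numpy as np


def map_adapted_means(ubm_means, counts, first_order, relevance=16.0):
    """Relevance-MAP update of GMM means (mean-only adaptation).

    ubm_means:   (C, F) UBM component means
    counts:      (C,)   zeroth-order statistics (soft frame counts)
    first_order: (C, F) first-order statistics (posterior-weighted frame sums)
    relevance:   relevance factor (assumed value, not from the paper)
    """
    # Components that saw little data stay close to the UBM mean.
    alpha = counts / (counts + relevance)
    data_means = first_order / np.maximum(counts, 1e-10)[:, None]
    return alpha[:, None] * data_means + (1.0 - alpha[:, None]) * ubm_means


def factor_model_supervector(ubm_means, T, w):
    """Low-rank model: cluster supervector = UBM supervector + T @ w."""
    return ubm_means.reshape(-1) + T @ w


# Illustrative sizes only (hypothetical, not taken from the paper).
C, F, R = 512, 39, 100  # mixture components, feature dimension, factor rank

rng = np.random.default_rng(0)
ubm_means = rng.standard_normal((C, F))
T = 0.01 * rng.standard_normal((C * F, R))  # placeholder for a trained matrix
w = rng.standard_normal(R)                  # placeholder cluster factors

# Free mean parameters that must be estimated per cluster.
map_params = C * F    # MAP adapts every component mean independently
factor_params = R     # the factor model only estimates the R-dim vector w

supervector = factor_model_supervector(ubm_means, T, w)
print(map_params, factor_params, supervector.shape)
```

With these assumed sizes, MAP adaptation has to estimate 19968 mean parameters per cluster from a few short segments, while the factor model estimates only 100 once `T` is fixed, which is the sense in which modeling the variability space helps when adaptation data are scarce.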
