
Unsupervised cross-domain speaker recognition based on distribution alignment and adversarial learning
Abstract: Domain mismatch is one of the biggest challenges for practical speaker recognition systems, especially when no labeled data are available in the target domain. To address this, we propose an unsupervised domain adaptation method that fuses distribution alignment with adversarial learning. By aligning the statistical distributions of the domains during training, the method reduces domain discrepancy and extracts more speaker-discriminative features from speech, yielding consistent performance improvements across a variety of domain-mismatch conditions. On text-dependent speaker recognition tasks, adversarial learning and distribution alignment work together to reduce the equal error rate (EER) by 11% relative. On text-independent tasks, adversarial learning gives unstable results, whereas distribution alignment still achieves an 8% relative improvement. The experimental results show that, under domain mismatch with no labeled target-domain data, the proposed method effectively extracts speaker-discriminative information from speech and steadily improves recognition performance.
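The abstract does not state which statistic is aligned during training. As a hedged illustration only, the sketch below assumes a common choice, the squared maximum mean discrepancy (MMD) with a Gaussian kernel, computed between a batch of source-domain and a batch of target-domain speaker embeddings. In a typical domain-adversarial setup such a term would be added, with some weight, to the speaker classification loss alongside a domain classifier trained through a gradient reversal layer; the kernel, the bandwidth `sigma`, and the helper names `gaussian_kernel` and `mmd2` are assumptions for illustration, not the authors' implementation.

```python
import math
import random


def gaussian_kernel(x, y, sigma=1.0):
    """RBF kernel k(x, y) = exp(-||x - y||^2 / (2 * sigma^2))."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


def mmd2(source, target, sigma=1.0):
    """Biased estimate of squared MMD between two sets of embeddings.

    MMD^2 = mean k(s, s') + mean k(t, t') - 2 * mean k(s, t).
    Minimizing this term pulls the two embedding distributions together
    (assumed alignment criterion; the paper may use a different statistic).
    """
    def mean_kernel(a, b):
        total = sum(gaussian_kernel(x, y, sigma) for x in a for y in b)
        return total / (len(a) * len(b))

    return (mean_kernel(source, source)
            + mean_kernel(target, target)
            - 2.0 * mean_kernel(source, target))


if __name__ == "__main__":
    rng = random.Random(0)
    dim = 8
    # Hypothetical embeddings: source domain centred at 0, target shifted by +0.5.
    source = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(64)]
    target = [[rng.gauss(0.5, 1.0) for _ in range(dim)] for _ in range(64)]
    sigma = math.sqrt(dim)  # assumed bandwidth, scaled to the embedding dimension
    print(f"MMD^2(source, source) = {mmd2(source, source, sigma):.4f}")
    print(f"MMD^2(source, target) = {mmd2(source, target, sigma):.4f}")
```

The first value is exactly zero because the two sets are identical; the second is positive because the target batch is shifted. In training, gradients of this term with respect to the embedding network would push the mismatched domains toward a shared distribution.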
Authors: CHEN Zhigao, ZHAO Qingwei, WANG Li, WANG Wenchao (Key Laboratory of Speech Acoustics and Content Understanding, Institute of Acoustics, Chinese Academy of Sciences, Beijing 100190; University of Chinese Academy of Sciences, Beijing 100049)
Source: Acta Acustica (《声学学报》), 2021, No. 5, pp. 767-774 (8 pages). Indexed in EI, CAS, and CSCD; Peking University Core Journal.
Funding: Supported by the National Natural Science Foundation of China (11590774, 11590772, 11590770).