
Research on Speaker Diarization Based on BIC and G_PLDA (cited by: 7)
Abstract: Traditional speaker diarization (SD) systems that use the Bayesian information criterion (BIC) as the similarity metric perform well on short dialogues. As dialogue length increases, however, the single-Gaussian model underlying BIC can no longer describe the data distributions of different speakers, and during hierarchical agglomerative clustering (HAC) it becomes difficult to set a threshold that separates same-speaker from different-speaker segment pairs. To address this problem, a fusion method combining short-term BIC with long-term G_PLDA is proposed, exploiting the reliability of BIC for short-term clustering and the strong discriminative power of G_PLDA on long segments. Experiments on the NIST (National Institute of Standards and Technology) 08 Summed test set show that the method reduces the diarization error rate (DER) from 2.34% for the BIC baseline system to 1.54%, a relative improvement of 34.2%.
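The BIC similarity measure mentioned in the abstract can be illustrated with a minimal sketch. It assumes the commonly used single-Gaussian ΔBIC formulation with full covariance matrices; the function name `delta_bic` and the parameter `penalty_weight` are hypothetical, and this is not the authors' implementation. The G_PLDA scoring and the fusion step are not shown.

```python
import numpy as np

def delta_bic(x, y, penalty_weight=1.0):
    """Delta-BIC for two segments of feature frames, each of shape (n_frames, dim).

    Compares modelling the two segments with two separate full-covariance
    Gaussians against modelling the pooled frames with a single Gaussian.
    Sign convention assumed here: a positive value favours two Gaussians
    (different speakers), a negative value favours one (same speaker).
    """
    z = np.vstack([x, y])
    n_x, n_y, n = len(x), len(y), len(z)
    dim = z.shape[1]

    def logdet_cov(frames):
        # Maximum-likelihood covariance (divide by N), then its log-determinant.
        cov = np.atleast_2d(np.cov(frames, rowvar=False, bias=True))
        _, logdet = np.linalg.slogdet(cov)
        return logdet

    # Log-likelihood gained by splitting the pooled data into two Gaussians.
    gain = 0.5 * (n * logdet_cov(z) - n_x * logdet_cov(x) - n_y * logdet_cov(y))
    # Penalty for the extra parameters: dim means plus dim*(dim+1)/2 covariances.
    n_params = dim + 0.5 * dim * (dim + 1)
    penalty = 0.5 * penalty_weight * n_params * np.log(n)
    return gain - penalty
```

Under this convention, a bottom-up clustering pass would merge the segment pair with the lowest ΔBIC while that value stays below a threshold. The single Gaussian per segment is the modelling assumption that the abstract describes as insufficient for long recordings.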
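Source: Journal of University of Science and Technology of China (JUSTC), 2015, No. 4, pp. 286-293 (8 pages in total). Indexed in CAS, CSCD, and the Peking University Core Journals list.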

