Abstract
Traditional speaker diarization (SD), which uses the Bayesian information criterion (BIC) as the similarity measure, performs well on short conversations. As the conversation grows longer, however, the single Gaussian model underlying BIC can no longer adequately describe the data distributions of different speakers, and during hierarchical agglomerative clustering (HAC) it becomes difficult to set a threshold that separates same-speaker pairs from different-speaker pairs. To address this problem, a fusion method based on short-term BIC and long-term G_PLDA is proposed, which makes full use of the reliability of BIC in short-term clustering and the excellent discriminative power of G_PLDA on long segments. Experiments on the National Institute of Standards and Technology (NIST) 08 Summed test set show that the method reduces the diarization error rate (DER) from 2.34% for the BIC baseline system to 1.54%, a relative improvement of 34.2%.
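For readers unfamiliar with BIC as a similarity measure, the following is a minimal sketch of the commonly used ΔBIC score between two segments, each modelled by a single full-covariance Gaussian. It is not taken from this paper: the function name `delta_bic`, the `penalty_weight` parameter, and the use of NumPy are assumptions made for illustration, and the system described here may use a different configuration.

```python
import numpy as np


def delta_bic(x, y, penalty_weight=1.0):
    """Illustrative ΔBIC between two segments of feature frames.

    x, y: arrays of shape (frames, dims). Each segment, and their union,
    is modelled by one full-covariance Gaussian (the single Gaussian
    assumption mentioned in the abstract).
    A positive value favours two separate models (different speakers);
    a negative value favours one merged model (same speaker).
    """
    z = np.vstack([x, y])
    n_x, n_y, n_z = len(x), len(y), len(z)
    dim = z.shape[1]

    def logdet_cov(frames):
        # Maximum-likelihood covariance estimate and its log-determinant.
        cov = np.atleast_2d(np.cov(frames, rowvar=False, bias=True))
        _, logdet = np.linalg.slogdet(cov)
        return logdet

    # Log-likelihood gain from modelling the two segments separately.
    gain = 0.5 * (n_z * logdet_cov(z)
                  - n_x * logdet_cov(x)
                  - n_y * logdet_cov(y))

    # Penalty for the extra Gaussian: one mean vector plus one
    # symmetric covariance matrix. penalty_weight is a hypothetical
    # tuning knob, not a value reported in the paper.
    n_params = dim + dim * (dim + 1) / 2
    penalty = 0.5 * penalty_weight * n_params * np.log(n_z)
    return gain - penalty
```

In an HAC loop, the pair of clusters with the lowest ΔBIC would typically be merged first, and merging would stop once every remaining pair scores above a chosen threshold. That threshold is the quantity the abstract describes as hard to set for long conversations.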