Abstract
Conventional audio-feature-based initialization methods for multi-speaker clustering suffer from low accuracy. To address this, this paper proposes a new initialization method based on the video signal. The method uses the motion intensity feature of each video time frame to select the initial speaker clusters during clustering initialization, which effectively improves the purity of the initial speaker clusters. The method is then applied to a Gaussian mixture model (GMM) multi-speaker recognition system. Experimental results show that, across the entire meeting set, the method improves substantially on the other methods: the recognition error rate is reduced by 19.436% on average relative to the linear initialization system and by 16.618% on average relative to the improved linear initialization system.
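The abstract does not say how motion intensity is computed or how the initial clusters are picked from it, so the following is only a minimal sketch under stated assumptions: motion intensity is taken to be the mean absolute pixel difference between consecutive grayscale frames, each speaker is assumed to have a separate video track, and a speaker's initial cluster is assumed to be the time frames where that speaker's motion most exceeds everyone else's. The names `motion_intensity`, `intensity_track`, `initial_clusters`, and `per_speaker` are hypothetical and do not come from the paper.

```python
from typing import List, Sequence

Frame = Sequence[Sequence[float]]  # grayscale image as rows of pixel values


def motion_intensity(prev: Frame, curr: Frame) -> float:
    """Mean absolute pixel difference between two consecutive frames (assumed definition)."""
    total = 0.0
    count = 0
    for row_prev, row_curr in zip(prev, curr):
        for p, c in zip(row_prev, row_curr):
            total += abs(c - p)
            count += 1
    return total / count if count else 0.0


def intensity_track(frames: Sequence[Frame]) -> List[float]:
    """Motion intensity per time frame; the first frame has no predecessor and gets 0."""
    return [0.0] + [motion_intensity(a, b) for a, b in zip(frames, frames[1:])]


def initial_clusters(tracks: Sequence[Sequence[float]], per_speaker: int) -> List[List[int]]:
    """For each speaker, pick the frames where that speaker's motion dominates the others."""
    if not tracks:
        return []
    n_frames = min(len(t) for t in tracks)
    clusters = []
    for s in range(len(tracks)):
        def margin(t: int, s: int = s) -> float:
            # Margin between this speaker's motion and the strongest competing speaker.
            others = [tracks[o][t] for o in range(len(tracks)) if o != s]
            return tracks[s][t] - (max(others) if others else 0.0)

        ranked = sorted(range(n_frames), key=margin, reverse=True)
        clusters.append(sorted(ranked[:per_speaker]))
    return clusters
```

In a pipeline of this shape, the audio features at the selected frame indices would seed one GMM per speaker before clustering begins; that step is omitted here because the abstract gives no detail about it.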
Source
Application Research of Computers (《计算机应用研究》)
CSCD
PKU Core Journal
2012, No. 9, pp. 3295-3298 (4 pages)
Funding
Natural Science Foundation of Gansu Province (1014ZSB064)
Gansu Provincial Department of Finance project (0914ZTB148)
Keywords
multi-speaker recognition
clustering initialization
motion intensity feature
motion intensity initialization