Abstract
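This paper presents an improved speaker diarization system based on the hidden Markov model (HMM) and the Bayesian information criterion (BIC), used to detect "who spoke when" in meeting speech data. Because the data available for Gaussian mixture model (GMM) speaker modeling is usually short, a universal background model (UBM) is trained first, and the model of each segment is then obtained from it with the maximum a posteriori (MAP) criterion. Experimental results on RT-04S, the dataset of the speaker diarization evaluation held by NIST in 2004, show that the system has some advantage over mainstream international systems.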
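The abstract does not state which BIC formulation the system uses. As a hedged illustration only, the sketch below computes the commonly used ΔBIC between two feature segments, each modeled by a single full-covariance Gaussian, in plain Python. The single-Gaussian model, the penalty weight `lam`, and the helper names `covariance`, `log_det` and `delta_bic` are assumptions for illustration, not the paper's implementation (an HMM/BIC system may compare GMM-modeled clusters instead).

```python
import math
from typing import List, Sequence

Frame = Sequence[float]


def covariance(frames: Sequence[Frame]) -> List[List[float]]:
    """Maximum-likelihood (divide-by-N) full covariance of one segment."""
    n, d = len(frames), len(frames[0])
    mean = [sum(f[k] for f in frames) / n for k in range(d)]
    cov = [[0.0] * d for _ in range(d)]
    for f in frames:
        diff = [f[k] - mean[k] for k in range(d)]
        for i in range(d):
            for j in range(d):
                cov[i][j] += diff[i] * diff[j]
    return [[c / n for c in row] for row in cov]


def log_det(m: List[List[float]]) -> float:
    """log|m| via Cholesky m = L L^T, so log|m| = 2 * sum(log L_ii)."""
    d = len(m)
    low = [[0.0] * d for _ in range(d)]
    for i in range(d):
        for j in range(i + 1):
            s = m[i][j] - sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                if s <= 0.0:
                    raise ValueError("covariance is not positive definite")
                low[i][i] = math.sqrt(s)
            else:
                low[i][j] = s / low[j][j]
    return 2.0 * sum(math.log(low[i][i]) for i in range(d))


def delta_bic(x: Sequence[Frame], y: Sequence[Frame], lam: float = 1.0) -> float:
    """Delta BIC between segments x and y (hypothetical helper).

    Positive: two separate Gaussians are preferred (likely different speakers).
    Negative: one shared Gaussian suffices (candidate pair for merging).
    """
    n1, n2 = len(x), len(y)
    n, d = n1 + n2, len(x[0])
    # Extra free parameters of the two-model hypothesis: one mean (d)
    # plus one full covariance (d(d+1)/2), weighted by the assumed lam.
    penalty = 0.5 * (d + d * (d + 1) / 2) * math.log(n)
    merged = list(x) + list(y)
    return (0.5 * n * log_det(covariance(merged))
            - 0.5 * n1 * log_det(covariance(x))
            - 0.5 * n2 * log_det(covariance(y))
            - lam * penalty)


if __name__ == "__main__":
    import random
    rng = random.Random(0)
    a = [[rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)] for _ in range(200)]
    b = [[rng.gauss(5.0, 1.0), rng.gauss(0.0, 1.0)] for _ in range(200)]
    print(f"delta_bic(a, b) = {delta_bic(a, b):.1f}")
```

Under this sign convention, a change-point detector would hypothesize a speaker change where ΔBIC is positive, and a bottom-up clustering step would merge the pair with the most negative value, stopping once no pair scores below zero.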
A speaker diarization system was developed based on the popular hidden Markov model (HMM) and Bayesian information criterion (BIC) framework to detect "who spoke when". Gaussian mixture model (GMM) speaker models are often inaccurate because the segments used to train them are too short for reliable estimation. A universal background model (UBM) was therefore trained on all of the meeting data, and the maximum a posteriori (MAP) criterion was then used to estimate each speaker's model from the UBM. The system outperforms a state-of-the-art system on the National Institute of Standards and Technology (NIST) Rich Transcription (RT) 2004 Spring speaker diarization evaluation.
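The abstract says each speaker's model is estimated from the UBM with the MAP criterion but does not give the update rule. A minimal sketch, assuming the widely used mean-only relevance-MAP adaptation of a diagonal-covariance GMM, is shown below in plain Python. The relevance factor of 16, the mean-only update, the `DiagGMM` container and the function names are illustrative assumptions; UBM training itself (EM on all meeting data) is not shown, and the UBM parameters in the demo are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class DiagGMM:
    """Gaussian mixture with diagonal covariances (hypothetical container)."""
    weights: List[float]
    means: List[List[float]]
    variances: List[List[float]]


def log_gauss_diag(x: Sequence[float], mean: Sequence[float], var: Sequence[float]) -> float:
    """log N(x; mean, diag(var))."""
    return -0.5 * sum(math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v
                      for xi, m, v in zip(x, mean, var))


def posteriors(gmm: DiagGMM, x: Sequence[float]) -> List[float]:
    """Component occupation probabilities gamma_k(x), computed stably in the log domain."""
    logs = [math.log(w) + log_gauss_diag(x, m, v)
            for w, m, v in zip(gmm.weights, gmm.means, gmm.variances)]
    top = max(logs)
    exps = [math.exp(lp - top) for lp in logs]
    total = sum(exps)
    return [e / total for e in exps]


def map_adapt_means(ubm: DiagGMM, frames: Sequence[Sequence[float]],
                    relevance: float = 16.0) -> DiagGMM:
    """Mean-only relevance-MAP adaptation of a UBM to one segment (assumed variant).

    mu_k_new = alpha_k * E_k[x] + (1 - alpha_k) * mu_k, alpha_k = n_k / (n_k + relevance).
    Weights and variances are copied from the UBM unchanged.
    """
    n_comp, dim = len(ubm.weights), len(ubm.means[0])
    counts = [0.0] * n_comp                        # n_k = sum_t gamma_k(t)
    firsts = [[0.0] * dim for _ in range(n_comp)]  # sum_t gamma_k(t) * x_t
    for x in frames:
        for k, g in enumerate(posteriors(ubm, x)):
            counts[k] += g
            for i in range(dim):
                firsts[k][i] += g * x[i]
    new_means = []
    for k in range(n_comp):
        alpha = counts[k] / (counts[k] + relevance)
        # With no soft counts (e.g. an empty segment) the component keeps its UBM mean.
        expected = [f / counts[k] for f in firsts[k]] if counts[k] > 0 else ubm.means[k]
        new_means.append([alpha * e + (1.0 - alpha) * m
                          for e, m in zip(expected, ubm.means[k])])
    return DiagGMM(list(ubm.weights), new_means, [list(v) for v in ubm.variances])


if __name__ == "__main__":
    ubm = DiagGMM(weights=[0.5, 0.5], means=[[-2.0], [2.0]], variances=[[1.0], [1.0]])
    segment = [[2.5]] * 50  # a short, hypothetical one-dimensional segment
    print(map_adapt_means(ubm, segment).means)
```

Because alpha_k grows with the soft count n_k, components that a short segment barely touches stay close to the UBM, while well-observed components move toward the segment's data. This is the behavior that makes MAP adaptation attractive when, as the abstract notes, the training segments are too short to fit a GMM from scratch.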
Source
Journal of Tsinghua University (Science and Technology) [《清华大学学报(自然科学版)》]
EI
CAS
CSCD
Peking University Core Journals (北大核心)
2011, No. 9, pp. 1267-1270, 1275 (5 pages in total)
Funding
National Key Technology R&D Program of China (2008BAI50B03)
National Natural Science Foundation of China, General Program (10874203, 60875014, 61072124, 11074275)