Speaker diarization system based on HMM-BIC (cited by: 4)
Abstract: This paper presents an improved speaker diarization system built on the hidden Markov model (HMM) and Bayesian information criterion (BIC) framework. The system detects "who spoke when" in meeting recordings. Speaker models based on Gaussian mixture models (GMMs) are often inaccurate because the segments available for training are usually short. To address this, a universal background model (UBM) is first trained on all of the meeting data, and the model for each segment is then derived from the UBM using the maximum a posteriori (MAP) criterion. Experiments on the RT-04S data set from the National Institute of Standards and Technology (NIST) Rich Transcription (RT) 2004 spring speaker diarization evaluation show that the system outperforms a state-of-the-art system.
Source: Journal of Tsinghua University (Science and Technology), 2011, No. 9, pp. 1267-1270, 1275 (5 pages). Indexed in EI, CAS, CSCD, and the Peking University Core Journals list.
Funding: National Key Technology R&D Program of China (2008BAI50B03); National Natural Science Foundation of China, General Program (10874203, 60875014, 61072124, 11074275).
Keywords: speaker diarization; maximum a posteriori (MAP); hidden Markov model (HMM); Bayesian information criterion (BIC)
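
The UBM-then-MAP step described in the abstract can be sketched as follows. This is a minimal illustration and not the authors' implementation: it assumes diagonal-covariance GMMs, adapts only the component means, and uses a relevance factor of 16, none of which is stated in the abstract. The function names `log_gauss_diag` and `map_adapt_means` and all parameter names are hypothetical.

```python
import numpy as np

def log_gauss_diag(X, means, variances):
    """Log density of each frame under each diagonal Gaussian; shape (N, K)."""
    diff = X[:, None, :] - means[None, :, :]                 # (N, K, D)
    quad = np.sum(diff ** 2 / variances, axis=2)             # (N, K)
    log_norm = np.sum(np.log(2.0 * np.pi * variances), axis=1)  # (K,)
    return -0.5 * (quad + log_norm)

def map_adapt_means(X, weights, means, variances, relevance=16.0):
    """MAP-adapt UBM means to the frames X of one short segment.

    X: (N, D) feature frames; weights: (K,); means, variances: (K, D).
    relevance is an assumed prior strength, not a value from the paper.
    """
    # Posterior probability of each UBM component for each frame.
    log_p = np.log(weights) + log_gauss_diag(X, means, variances)
    log_p -= log_p.max(axis=1, keepdims=True)                # numerical stability
    post = np.exp(log_p)
    post /= post.sum(axis=1, keepdims=True)

    n_k = post.sum(axis=0)                                   # soft frame counts, (K,)
    first_order = post.T @ X                                 # sum of weighted frames, (K, D)

    # Equivalent to alpha * E_k[x] + (1 - alpha) * ubm_mean
    # with alpha = n_k / (n_k + relevance); safe when n_k is 0.
    return (first_order + relevance * means) / (n_k + relevance)[:, None]
```

Components that see few frames stay close to the UBM means, while components with many frames move toward the segment data, which is why this kind of adaptation is better suited to short segments than training a GMM from scratch.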