Multi-speaker recognition based on MFCC and motion intensity clustering initialization

Cited by: 10
Abstract: Common clustering initialization methods for multi-speaker recognition rely on audio features and have limited accuracy. To address this, the paper proposes a new method based on the video signal. The method uses the motion intensity feature of the video signal in each time frame to select the initial speaker clusters during the clustering initialization stage, which effectively improves the purity of the initial speaker clusters. The method is then applied to a Gaussian mixture model (GMM) multi-speaker recognition system. Experimental results show that, across the entire meeting set, the method improves markedly on the other methods: the recognition error rate is reduced by 19.436% on average relative to the linear initialization system, and by 16.618% on average relative to the improved linear initialization system.
Authors: Cao Jie (曹洁), Yu Lizhen (余丽珍)
Source: Application Research of Computers (《计算机应用研究》), CSCD, Peking University Core Journal, 2012, No. 9, pp. 3295-3298 (4 pages)
Funding: Natural Science Foundation of Gansu Province (1014ZSB064); Gansu Provincial Department of Finance project (0914ZTB148)
Keywords: multi-speaker recognition; clustering initialization; motion intensity feature; motion intensity initialization
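
The initialization idea described in the abstract can be illustrated with a minimal sketch. The abstract does not give the paper's exact selection rule, so everything below is an assumption made for illustration: the input layout (one motion intensity value per frame and per participant), the dominance threshold `min_ratio`, and the function names are all hypothetical. The sketch contrasts a linear initialization, which cuts the frame sequence into equal contiguous chunks, with an initialization that assigns a frame to a participant only when that participant clearly moves more than everyone else, and it measures the purity of the resulting initial clusters against known speaker labels.

```python
import numpy as np


def linear_init(num_frames, num_clusters):
    """Baseline: split the frame sequence into equal contiguous chunks."""
    return np.arange(num_frames) * num_clusters // num_frames


def motion_intensity_init(motion, min_ratio=1.5):
    """Assign each frame to the participant with the highest motion intensity.

    motion: array of shape (num_frames, num_participants), non-negative,
            with at least two participants (hypothetical input layout).
    min_ratio: assumed threshold; the top participant must exceed the
               runner-up by this factor, otherwise the frame is left
               unassigned (label -1).
    """
    motion = np.asarray(motion, dtype=float)
    ordered = np.sort(motion, axis=1)
    top, second = ordered[:, -1], ordered[:, -2]
    labels = np.argmax(motion, axis=1)
    confident = top >= min_ratio * np.maximum(second, 1e-12)
    return np.where(confident, labels, -1)


def cluster_purity(labels, truth):
    """Fraction of assigned frames that match their cluster's majority speaker."""
    labels, truth = np.asarray(labels), np.asarray(truth)
    assigned = labels >= 0
    correct = 0
    for c in np.unique(labels[assigned]):
        members = truth[labels == c]
        correct += np.bincount(members).max()
    return correct / max(int(assigned.sum()), 1)


# Toy data: 4 frames, 2 participants, speakers alternate.
motion = [[5, 1], [1, 4], [2, 2], [0, 3]]
truth = [0, 1, 0, 1]
mi_labels = motion_intensity_init(motion)
lin_labels = linear_init(len(truth), 2)
```

In a full system, the frames selected this way would presumably seed one GMM per speaker, trained on the MFCC features of those frames; that step is not shown, and the toy data above says nothing about the paper's reported results.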