Multi-speaker recognition based on MFCC and motion intensity clustering initialization

Cited by: 10

Abstract: Commonly used initialization methods for multi-speaker clustering rely on audio features and suffer from low accuracy. To address this, the paper proposes a new initialization method based on the video signal. The method uses the motion intensity feature of the video signal at each time frame to select the initial speaker clusters during the clustering initialization stage, which effectively improves the purity of the initial speaker clusters. The method is then applied to a Gaussian mixture model (GMM) multi-speaker recognition system. Experimental results show that, across the entire meeting set, the proposed method improves on the other methods tested: the system's recognition error rate is reduced by 19.436% on average compared with linear initialization, and by 16.618% on average compared with improved linear initialization.
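The abstract does not give the paper's formulas, so the following is only a minimal sketch of how motion-intensity-based seeding could work, under stated assumptions. It assumes motion intensity is the mean absolute difference between consecutive grayscale frames of each participant's video region, and that a frame is assigned to the participant whose motion dominates it. The function names `motion_intensity` and `select_initial_clusters` and the `top_fraction` parameter are hypothetical and do not come from the paper.

```python
import numpy as np


def motion_intensity(frames):
    """Per-frame motion intensity for one participant's video region.

    Assumption: intensity is the mean absolute pixel difference between
    consecutive grayscale frames. `frames` has shape (T, H, W).
    Returns an array of shape (T,); the first frame has intensity 0.
    """
    frames = np.asarray(frames, dtype=np.float64)
    diffs = np.abs(np.diff(frames, axis=0)).mean(axis=(1, 2))
    return np.concatenate([[0.0], diffs])


def select_initial_clusters(intensity, top_fraction=0.2):
    """Pick seed frames for each speaker's initial cluster.

    `intensity` has shape (num_speakers, T). A frame is a candidate for
    speaker s when s has the largest motion intensity in that frame.
    Only the strongest `top_fraction` of each speaker's candidates is
    kept, so that the seed set favours purity over size.
    Returns a dict mapping speaker index to sorted frame indices.
    """
    intensity = np.asarray(intensity, dtype=np.float64)
    dominant = intensity.argmax(axis=0)
    seeds = {}
    for s in range(intensity.shape[0]):
        idx = np.flatnonzero(dominant == s)
        if idx.size == 0:
            seeds[s] = idx  # this speaker never dominates; no seed frames
            continue
        k = max(1, int(np.ceil(top_fraction * idx.size)))
        strongest = np.argsort(intensity[s, idx])[::-1][:k]
        seeds[s] = np.sort(idx[strongest])
    return seeds
```

In a full system, the MFCC vectors at the selected frame indices would presumably be used to train one initial GMM per speaker before clustering proceeds; that step is not shown here.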

Authors: CAO Jie (曹洁), YU Lizhen (余丽珍)

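Affiliations: School of Computer and Communication, Lanzhou University of Technology; College of Electrical and Information Engineering, Lanzhou University of Technology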

Source: Application Research of Computers (《计算机应用研究》), indexed in CSCD and the Peking University Core Journals list, 2012, No. 9, pp. 3295-3298 (4 pages)

Funding: Natural Science Foundation of Gansu Province (1014ZSB064); Gansu Provincial Department of Finance project (0914ZTB148)

Keywords: multi-speaker recognition; clustering initialization; motion intensity feature; motion intensity initialization

Classification code: TP391.4 [Automation and Computer Technology: Computer Application Technology]
