Improved speaker clustering initialization and GMM multi-speaker recognition

Times cited: 5

Abstract: To address the poor accuracy of linear initialization in multi-speaker clustering, this paper proposes an improved clustering initialization method. The method applies the Bayesian information criterion (BIC) to detect change points in the initial clusters produced by linear initialization and to split them there, which effectively raises the purity of the initial speaker clusters. The method is then applied to a Gaussian mixture model (GMM) multi-speaker recognition system. Experimental results show that the proposed method increases the average cluster purity (ACP) by 48.51% and lowers the system's recognition error rate by 12.09% on average.
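The abstract does not give the exact BIC formulation or ACP definition used. The sketch below therefore rests on several assumptions.

- BIC test: the common ΔBIC change-detection test, with one full-covariance Gaussian on each side of a candidate cut and penalty weight λ = 1.
- Linear initialization: the feature stream is cut into k equal-length contiguous segments.
- ACP: the usual definition ACP = (1/N) Σ_i p_i n_i, where p_i = Σ_j n_ij² / n_i², n_ij is the number of frames of speaker j in cluster i, and n_i is the size of cluster i.

The function names (`linear_init`, `delta_bic`, `bic_split`, `average_cluster_purity`), the parameters `min_len`, `step` and `lam`, and the synthetic two-speaker data standing in for MFCC frames are all hypothetical. This is not the authors' implementation.

```python
import numpy as np


def linear_init(n_frames, k):
    """Linear initialization (assumed form): cut the frame sequence into k
    contiguous, (nearly) equal-length segments, one initial cluster each.
    Returns (start, end) frame index pairs with end exclusive."""
    bounds = np.linspace(0, n_frames, k + 1).astype(int)
    return [(int(bounds[i]), int(bounds[i + 1])) for i in range(k)]


def _logdet_cov(x):
    """Log-determinant of the maximum-likelihood covariance of frames x (n x d)."""
    cov = np.atleast_2d(np.cov(x, rowvar=False, bias=True))
    cov = cov + 1e-6 * np.eye(x.shape[1])  # small ridge for numerical stability
    _, logdet = np.linalg.slogdet(cov)
    return logdet


def delta_bic(x, t, lam=1.0):
    """Delta-BIC for cutting x at frame t, each side modelled by one
    full-covariance Gaussian; a positive value favours a speaker change."""
    n, d = x.shape
    penalty = 0.5 * (d + 0.5 * d * (d + 1)) * np.log(n)
    return (0.5 * n * _logdet_cov(x)
            - 0.5 * t * _logdet_cov(x[:t])
            - 0.5 * (n - t) * _logdet_cov(x[t:])
            - lam * penalty)


def bic_split(x, start, end, min_len=50, step=5, lam=1.0):
    """Recursively split the initial cluster x[start:end] at the candidate
    frame with the largest Delta-BIC, as long as that value is positive."""
    n = end - start
    if n < 2 * min_len:
        return [(start, end)]
    seg = x[start:end]
    candidates = list(range(min_len, n - min_len + 1, step))
    scores = [delta_bic(seg, t, lam) for t in candidates]
    best = int(np.argmax(scores))
    if scores[best] <= 0:
        return [(start, end)]
    cut = start + candidates[best]
    return (bic_split(x, start, cut, min_len, step, lam)
            + bic_split(x, cut, end, min_len, step, lam))


def average_cluster_purity(segments, labels):
    """ACP = (1/N) * sum_i p_i * n_i, with p_i = sum_j n_ij**2 / n_i**2 and
    labels giving the reference speaker of every frame."""
    total = 0.0
    for s, e in segments:
        _, counts = np.unique(labels[s:e], return_counts=True)
        total += (counts ** 2).sum() / (e - s)  # equals p_i * n_i
    return total / len(labels)


# Synthetic stand-in for MFCC frames: speaker A, then speaker B.
rng = np.random.default_rng(0)
x = np.vstack([rng.normal(0.0, 1.0, size=(300, 4)),
               rng.normal(3.0, 1.0, size=(300, 4))])
labels = np.array([0] * 300 + [1] * 300)

init = linear_init(len(x), 1)  # one linear segment spanning both speakers
refined = [seg for s, e in init for seg in bic_split(x, s, e)]
print(average_cluster_purity(init, labels))     # 0.5
print(average_cluster_purity(refined, labels))  # expected to be near 1.0 on this data
```

Splitting a cluster can never lower ACP, because for each speaker a_j²/a + b_j²/b ≥ (a_j + b_j)²/(a + b). A purity gain is therefore only meaningful together with the number of clusters produced. The later merging step and the GMM training that the abstract mentions are outside this sketch.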

Authors: Cao Jie, Yu Lizhen

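Affiliations: School of Computer and Communication, Lanzhou University of Technology; College of Electrical Engineering and Information Engineering, Lanzhou University of Technology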

Source: Application Research of Computers, 2012, No. 2, pp. 590-593 (4 pages). Indexed in: CSCD; Peking University Core Journals

Funding: Project funded by the Finance Department of Gansu Province (0914ZTB148); Natural Science Foundation of Gansu Province (1014ZSB064)

Keywords: multi-speaker recognition; improved clustering initialization; Gaussian mixture model; average cluster purity

CLC number: TP391.4 (Automation and Computer Technology: Computer Application Technology)

References (11)

[1] DENG Jing. Research on multi-speaker recognition over telephone channels[D]. Beijing: Tsinghua University, 2007.
[2] WOOTERS C, HUIJBREGTS M. The ICSI RT07s speaker diarization system[J]. Multimodal Technologies for Perception of Humans, 2008, 4625: 509-519.
[3] GARAU G, BOURLARD H. Using audio and visual cues for speaker diarisation initialization[C]//Proc of International Conference on Acoustics, Speech and Signal Processing. [S.l.]: IEEE Signal Processing Society, 2010: 4942-4945.
[4] HUNG H, HUANG Yan, FRIEDLAND G, et al. Estimating the dominant person in multi-party conversations using speaker diarization strategies[C]//Proc of International Conference on Acoustics, Speech and Signal Processing. [S.l.]: IEEE Press, 2008: 2197-2200.
[5] ZHAO Hui, GU Yaqiang, TANG Chaojing. Bimodal speech recognition method based on product HMM[J]. Computer Engineering, 2010, 36(8): 7-9.
[6] FRIEDLAND G, HUNG H, YEO C. Multi-modal speaker diarization of real-world meetings using compressed-domain video features[C]//Proc of International Conference on Acoustics, Speech and Signal Processing. [S.l.]: IEEE Press, 2009: 4069-4072.
[7] HUNG H, FRIEDLAND G. Towards audio-visual on-line diarization of participants in group meetings[C]//Proc of Workshop on Multi-camera and Multi-modal Sensor Fusion Algorithms and Applications. Marseille: European Conference on Computer Vision, 2008: 1-12.
[8] HUNG H, HUANG Yan, FRIEDLAND G, et al. Estimating dominance in multi-party meetings using speaker diarization[J]. IEEE Trans on Audio, Speech and Language Processing, 2011, 19(4): 847-860.
[9] NOULAS A, ENGLEBIENNE G, KROSE B. Multi-modal speaker diarisation[J]. IEEE Trans on Pattern Analysis and Machine Intelligence, 2012, 34(1): 79-93.
[10] GARAU G, DIELMANN A, BOURLARD H. Audio-visual synchronisation for speaker diarisation[C]//Proc of International Conference on Speech and Language Processing. Makuhari, Chiba: [s.n.], 2010: 2654-2657.
