Feature Mean Distance Based Speaker Clustering for Short Speech Segments

Cited by: 9

Abstract: This paper proposes a speaker clustering algorithm for short speech segments based on Feature Mean Distance (FMD). First, the FMD is defined to characterize the similarity between two clusters at the feature level rather than at the model level. Then, the two clusters with the smallest FMD are merged iteratively until the minimum FMD between any two clusters exceeds an adaptive threshold. In experiments on speech segments shorter than 3 s drawn from two speech databases, the proposed algorithm improves the F-measure by 5% on average over the AHC+BIC based algorithm and runs about 4.68 times as fast.
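The merge loop described in the abstract can be sketched roughly as follows. The abstract gives neither the exact FMD formula nor the rule for the adaptive threshold, so both are assumptions here: FMD is taken to be the Euclidean distance between the mean feature vectors of two clusters, and the threshold is a hypothetical scale factor `alpha` times the mean of the initial pairwise distances. The function names are illustrative and do not come from the paper.

```python
import math
from itertools import combinations


def mean_vector(frames):
    """Component-wise mean of a list of equal-length feature vectors."""
    dim = len(frames[0])
    return [sum(f[d] for f in frames) / len(frames) for d in range(dim)]


def fmd(frames_a, frames_b):
    """ASSUMED distance: Euclidean distance between the two feature means."""
    return math.dist(mean_vector(frames_a), mean_vector(frames_b))


def cluster_segments(segments, alpha=0.5):
    """Merge the closest clusters until the smallest distance exceeds a threshold.

    segments: list of segments, each a list of feature vectors (frames).
    alpha: HYPOTHETICAL threshold scale factor, not taken from the paper.
    Returns a sorted list of clusters, each a sorted list of segment indices.
    """
    clusters = [{"members": [i], "frames": list(seg)}
                for i, seg in enumerate(segments)]
    if len(clusters) < 2:
        return [c["members"] for c in clusters]

    # ASSUMED adaptive threshold: alpha times the mean initial pairwise distance.
    initial = [fmd(a["frames"], b["frames"])
               for a, b in combinations(clusters, 2)]
    threshold = alpha * sum(initial) / len(initial)

    while len(clusters) > 1:
        # Find the pair of clusters with the smallest distance (i < j).
        dist, i, j = min(
            (fmd(clusters[i]["frames"], clusters[j]["frames"]), i, j)
            for i, j in combinations(range(len(clusters)), 2)
        )
        if dist > threshold:
            break  # stopping rule: minimum distance exceeds the threshold
        # Merge cluster j into cluster i by pooling their frames.
        clusters[i]["members"] += clusters[j]["members"]
        clusters[i]["frames"] += clusters[j]["frames"]
        del clusters[j]
    return sorted(sorted(c["members"]) for c in clusters)
```

Because each comparison only needs feature means, no per-cluster statistical model is fitted at any merge step, which is consistent with the feature-level (rather than model-level) comparison the abstract describes.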

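Authors: Li Yanxiong (李艳雄), Wu Yong (吴永), He Qianhua (贺前华)

Affiliation: School of Electronic and Information Engineering, South China University of Technology

Source: Journal of Electronics & Information Technology (《电子与信息学报》; indexed in EI, CSCD, and the Peking University Core Journals list), 2012, No. 6, pp. 1404-1407 (4 pages)

Funding: Supported by the National Natural Science Foundation of China (61101160, 60972132), the Fundamental Research Funds for the Central Universities (2011ZM0029), and the Guangdong Natural Science Foundation Doctoral Start-up Project (10451064101004651)

Keywords: Speech signal processing; Speaker clustering; Feature Mean Distance (FMD); Short speech segments

Classification: TN912.3 [Electronics and Telecommunications: Communication and Information Systems]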
