A Clustering Method for Multiple Speaker Roles

Cited by: 2

Abstract: To determine the number of speaker roles in meeting speech and to identify the speech belonging to each role, a clustering method for multiple speaker roles is proposed. First, features for speaker role clustering are defined. Second, geodesic distance is used to measure the similarity between features. Third, a clustering method is built in which the intra-class distance controls inter-class merging. Finally, the method is tested on four different types of meeting speech. The results show that, for meeting speech segmented either manually or automatically, geodesic distance outperforms the traditional distance when the clustering method is held fixed, and the proposed method outperforms traditional hierarchical clustering when the distance measure is held fixed.
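The abstract names two ingredients, a geodesic distance between features and an intra-class distance that gates inter-class merging, but gives no formulas. The Python sketch below is only one possible reading, not the authors' algorithm. It assumes geodesic distance is approximated by shortest paths over a k-nearest-neighbor graph (as in Isomap), that clusters are compared by average linkage, and that a merge is accepted only while the between-cluster distance stays within `ratio` times the larger within-cluster mean distance. The names `k`, `ratio`, and `floor`, and the toy 2-D points standing in for role features, are all hypothetical.

```python
import math
from itertools import combinations


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def geodesic_distances(points, k):
    """Approximate geodesic distances as shortest paths on a symmetric kNN graph."""
    n = len(points)
    d = [[0.0 if i == j else math.inf for j in range(n)] for i in range(n)]
    for i in range(n):
        others = sorted((j for j in range(n) if j != i),
                        key=lambda j: euclidean(points[i], points[j]))
        for j in others[:k]:
            d[i][j] = d[j][i] = euclidean(points[i], points[j])
    # Floyd-Warshall: relax every pair through every intermediate node.
    for m in range(n):
        for i in range(n):
            for j in range(n):
                via = d[i][m] + d[m][j]
                if via < d[i][j]:
                    d[i][j] = via
    return d


def mean_between(d, a, b):
    return sum(d[i][j] for i in a for j in b) / (len(a) * len(b))


def mean_within(d, c):
    pairs = list(combinations(c, 2))
    return sum(d[i][j] for i, j in pairs) / len(pairs) if pairs else 0.0


def cluster_roles(d, ratio=2.0):
    """Agglomerative clustering; the merge test is an assumed rule, not the paper's."""
    n = len(d)
    # Singletons have zero within-cluster distance, so use the mean
    # nearest-neighbor distance as a floor for the merge limit (assumption).
    floor = sum(min(d[i][j] for j in range(n) if j != i) for i in range(n)) / n
    clusters = [[i] for i in range(n)]
    while len(clusters) > 1:
        a, b = min(combinations(range(len(clusters)), 2),
                   key=lambda p: mean_between(d, clusters[p[0]], clusters[p[1]]))
        limit = ratio * max(mean_within(d, clusters[a]),
                            mean_within(d, clusters[b]), floor)
        if mean_between(d, clusters[a], clusters[b]) > limit:
            break  # the closest pair is already too far apart: stop merging
        merged = clusters[a] + clusters[b]
        clusters = [c for idx, c in enumerate(clusters) if idx not in (a, b)] + [merged]
    return clusters


# Toy data: two well-separated groups of four points each.
points = [(0, 0), (1, 0), (0, 1), (1, 1), (10, 0), (11, 0), (10, 1), (11, 1)]
clusters = cluster_roles(geodesic_distances(points, k=4))
print(len(clusters), sorted(sorted(c) for c in clusters))
```

In this sketch the number of clusters is not supplied in advance; it falls out of the stopping test, which matches the stated goal of finding the number of roles. The threshold rule itself would need to be replaced by the paper's actual criterion.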

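Authors: Li Wei (李威), He Qianhua (贺前华), Li Yanxiong (李艳雄)

Affiliation: School of Electronic and Information Engineering, South China University of Technology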

Source: Journal of South China University of Technology (Natural Science Edition), indexed in EI, CAS, CSCD, and the Peking University Core Journals list; 2015, No. 1, pp. 21-27, 33 (8 pages in total)

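Funding: National Natural Science Foundation of China (61101160); Guangzhou Pearl River Science and Technology New Star Program (2013J2200070); Key Project of the Fundamental Research Funds for the Central Universities, South China University of Technology (2013ZZ0053)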

Keywords: speaker role; feature; distance measure; role clustering; geodesic distance; unsupervised clustering

Classification code: TN912.3 [Electronics and Telecommunications: Communication and Information Systems]


