A Clustering Method for Multiple Speaker Roles

Cited by: 2
Abstract: To determine the number of speaker roles in meeting speech and to identify the speech belonging to each role, a clustering method for multiple speaker roles is proposed. First, features for speaker role clustering are defined. Second, geodesic distance is used to measure the similarity between features. Third, a clustering method is proposed in which intra-class distance controls the merging of classes. Finally, the method is tested on four different types of meeting speech. The results show that, for meeting speech segmented either manually or automatically, geodesic distance outperforms traditional distance when the same clustering method is used, and the proposed method outperforms traditional hierarchical clustering when the same distance measure is used.
Source: Journal of South China University of Technology (Natural Science Edition), 2015, No. 1, pp. 21-27, 33 (8 pages in total). Indexed in: EI, CAS, CSCD, Peking University Core Journals.
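Funding: National Natural Science Foundation of China (61101160); Pearl River Science and Technology Nova Program of Guangzhou (2013J2200070); Fundamental Research Funds for the Central Universities, South China University of Technology, Key Project (2013ZZ0053).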
Keywords: speaker role; characteristic distance measure; role clustering; geodesic distance; unsupervised clustering
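
The abstract names two ingredients, geodesic distance between features and agglomerative merging controlled by intra-class distance, but gives no algorithmic detail. The following is only a minimal sketch of that general idea, not the authors' implementation. Everything specific in it is an assumption made for illustration: geodesic distance is approximated by shortest paths over a k-nearest-neighbour graph, candidate pairs are chosen by average linkage, and merging stops once the mean pairwise distance inside the merged class would exceed a hypothetical threshold `max_intra`. The function names and the toy feature vectors are likewise hypothetical.

```python
import math
from itertools import combinations


def geodesic_distances(points, k=2):
    """Approximate geodesic distances as shortest paths over a k-NN graph (assumed scheme)."""
    n = len(points)
    euclid = [[math.dist(p, q) for q in points] for p in points]
    inf = float("inf")
    graph = [[0.0 if i == j else inf for j in range(n)] for i in range(n)]
    for i in range(n):
        # Link each point to its k nearest neighbours with symmetric edges.
        others = sorted((j for j in range(n) if j != i), key=lambda j: euclid[i][j])
        for j in others[:k]:
            graph[i][j] = graph[j][i] = euclid[i][j]
    # Floyd-Warshall all-pairs shortest paths; unreachable pairs stay at infinity.
    for m in range(n):
        for i in range(n):
            for j in range(n):
                if graph[i][m] + graph[m][j] < graph[i][j]:
                    graph[i][j] = graph[i][m] + graph[m][j]
    return graph


def mean_intra_distance(cluster, dist):
    """Mean pairwise distance inside one cluster (0 for a singleton)."""
    pairs = list(combinations(cluster, 2))
    if not pairs:
        return 0.0
    return sum(dist[i][j] for i, j in pairs) / len(pairs)


def average_link(a, b, dist):
    """Mean distance between members of two different clusters."""
    return sum(dist[i][j] for i in a for j in b) / (len(a) * len(b))


def cluster_roles(dist, max_intra):
    """Agglomerative merging that stops when the merged class would be too spread out."""
    clusters = [[i] for i in range(len(dist))]
    while len(clusters) > 1:
        # Candidate merge: the pair of clusters with the smallest average linkage.
        a, b = min(
            combinations(range(len(clusters)), 2),
            key=lambda ab: average_link(clusters[ab[0]], clusters[ab[1]], dist),
        )
        merged = clusters[a] + clusters[b]
        # Intra-class distance gates the merge (assumed stopping rule).
        if mean_intra_distance(merged, dist) > max_intra:
            break
        clusters = [c for idx, c in enumerate(clusters) if idx not in (a, b)]
        clusters.append(merged)
    return clusters


# Toy 2-D "features" for six speech segments, forming two well-separated groups.
features = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
roles = cluster_roles(geodesic_distances(features, k=2), max_intra=2.0)
print(len(roles), roles)
```

In this sketch the number of roles is not fixed in advance; it falls out of where the intra-class threshold halts the merging, which mirrors the abstract's stated goal of finding the number of speaker roles. The threshold and `k` would need tuning on real data.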