
Speaker Clustering Based on Far and Near Distance (基于远近距离的说话人聚类算法)
Abstract: A speaker clustering algorithm based on far and near distance is proposed. First, a voice activity (endpoint) detection algorithm segments the audio into speech segments. Next, the T2 formula is used to cluster near-distance speech segments that belong to the same speaker, yielding speech chunks. Finally, spectral clustering is used to estimate the number of speakers and to cluster the far-distance speakers (speech chunks). Experimental results show that, in near-distance clustering, the T2 formula achieves speech chunk accuracy 2.62% and 13.84% higher than BIC and KL, respectively. In far-distance clustering, spectral clustering can largely recover the number of speakers in the audio; with 15 speakers, cluster purity and speaker purity reach 78%, indicating that the algorithm clusters speakers effectively.
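The two distance-based steps in the abstract can be sketched as follows. This is a minimal illustration and not the authors' implementation: it assumes that "T2" refers to the two-sample Hotelling T² statistic computed on per-frame feature vectors, and that the number of speakers is read off the spectral clustering eigenvalues with an eigengap heuristic. Neither detail is given in the abstract. The function names `t2_distance` and `estimate_num_speakers` and the `ridge` parameter are hypothetical.

```python
import numpy as np


def t2_distance(x, y, ridge=1e-6):
    """Two-sample Hotelling T^2 between segments x (n1, d) and y (n2, d).

    Assumption: each row is one frame-level feature vector, and both
    segments share one covariance matrix (pooled estimate).
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n1, n2 = len(x), len(y)
    diff = x.mean(axis=0) - y.mean(axis=0)
    pooled = ((n1 - 1) * np.cov(x, rowvar=False)
              + (n2 - 1) * np.cov(y, rowvar=False)) / (n1 + n2 - 2)
    # A small ridge term (hypothetical) keeps the solve stable for short segments.
    pooled = np.atleast_2d(pooled) + ridge * np.eye(diff.size)
    return float(n1 * n2 / (n1 + n2) * diff @ np.linalg.solve(pooled, diff))


def estimate_num_speakers(affinity, max_speakers):
    """Eigengap heuristic on the normalized affinity D^-1/2 A D^-1/2.

    Assumption: affinity is a symmetric, non-negative chunk-by-chunk
    similarity matrix with positive row sums, and max_speakers is
    smaller than the number of chunks.
    """
    a = np.asarray(affinity, dtype=float)
    degree = a.sum(axis=1)
    normalized = a / np.sqrt(np.outer(degree, degree))
    eigvals = np.sort(np.linalg.eigvalsh(normalized))[::-1]
    # The largest drop between consecutive eigenvalues marks the cluster count.
    gaps = eigvals[:max_speakers] - eigvals[1:max_speakers + 1]
    return int(np.argmax(gaps)) + 1
```

In such a sketch, a small `t2_distance` between two segments would suggest merging them into one speech chunk, with the merge threshold left to tuning, while `estimate_num_speakers` would be applied to an affinity matrix built from chunk-to-chunk distances before the final clustering.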
Source: Science Technology and Engineering (《科学技术与工程》), PKU Core Journal, 2013, No. 12, pp. 3297-3300 (4 pages)
Funding: Supported by the National Natural Science Foundation of China (61101160)
Keywords: speaker clustering; near distance clustering; far distance clustering

