Speaker Clustering Based on Far and Near Distance (基于远近距离的说话人聚类算法)

Abstract: A speaker clustering algorithm based on far and near distance is proposed. First, a voice activity detection (endpoint detection) algorithm splits the audio into speech segments. Next, the T² statistic is used to cluster near-distance speech segments that belong to the same speaker, which yields speech chunks. Finally, spectral clustering is used to estimate the number of speakers and to cluster the far-distance speakers (the speech chunks). Experimental results show that, in near-distance clustering, the T² statistic gives a speech-chunk accuracy 2.62% and 13.84% higher than BIC and KL, respectively. In far-distance clustering, spectral clustering largely recovers the number of speakers in the audio; with 15 speakers, cluster purity and speaker purity reach 78%, which indicates that the algorithm clusters speakers effectively.
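The abstract gives only the outline of the pipeline, so the following Python sketch is an illustration under stated assumptions, not the authors' implementation. It shows a two-sample Hotelling T² statistic between feature segments, a greedy merge of adjacent segments into chunks, and an eigengap heuristic on a normalized affinity matrix to estimate the speaker count. The function names, the merge threshold, the Gaussian kernel on chunk means, the kernel width `sigma`, and the synthetic features are all hypothetical choices made for this example.

```python
import numpy as np


def t2_distance(x, y):
    """Two-sample Hotelling T² statistic between feature segments.

    x has shape (n, d) and y has shape (m, d); a pooled covariance is used.
    """
    n, m = len(x), len(y)
    diff = x.mean(axis=0) - y.mean(axis=0)
    pooled = ((n - 1) * np.cov(x, rowvar=False)
              + (m - 1) * np.cov(y, rowvar=False)) / (n + m - 2)
    # Small ridge term (an assumption) keeps the solve numerically stable.
    pooled = np.atleast_2d(pooled) + 1e-6 * np.eye(diff.size)
    return (n * m / (n + m)) * diff @ np.linalg.solve(pooled, diff)


def merge_adjacent(segments, threshold):
    """Near-distance step: merge a segment into the previous chunk when the
    T² statistic is below a (hypothetical) threshold, else start a new chunk."""
    chunks = [segments[0]]
    for seg in segments[1:]:
        if t2_distance(chunks[-1], seg) < threshold:
            chunks[-1] = np.vstack([chunks[-1], seg])
        else:
            chunks.append(seg)
    return chunks


def estimate_num_speakers(chunks, sigma=2.0):
    """Far-distance step: estimate the speaker count from the largest gap in
    the eigenvalues of a degree-normalized affinity matrix (a heuristic)."""
    if len(chunks) < 2:
        return len(chunks)
    means = np.array([c.mean(axis=0) for c in chunks])
    sq_dist = ((means[:, None, :] - means[None, :, :]) ** 2).sum(axis=-1)
    # Gaussian kernel on chunk means; self-affinity is kept on the diagonal
    # so that a single-chunk speaker still contributes an eigenvalue near 1.
    affinity = np.exp(-sq_dist / (2.0 * sigma ** 2))
    degree = affinity.sum(axis=1)
    normalized = affinity / np.sqrt(np.outer(degree, degree))
    eigvals = np.sort(np.linalg.eigvalsh(normalized))[::-1]
    gaps = eigvals[:-1] - eigvals[1:]
    return int(np.argmax(gaps)) + 1


# Synthetic stand-in for acoustic features: two "speakers" with different
# means, 200 frames of 4-dimensional features per segment.
rng = np.random.default_rng(0)
speaker_means = [0.0, 0.0, 5.0, 5.0, 0.0, 5.0]
segments = [rng.normal(mu, 1.0, size=(200, 4)) for mu in speaker_means]

chunks = merge_adjacent(segments, threshold=50.0)
num_speakers = estimate_num_speakers(chunks)
print(len(chunks), num_speakers)
```

In a real system the segments would come from a voice activity detector and the features would be acoustic vectors such as MFCCs; the chunk assignment itself would then follow from clustering the leading eigenvectors, which this sketch leaves out.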

Authors: Chen Xuefang (陈雪芳), Yang Jichen (杨继臣)

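Affiliations: School of Computer Science, Dongguan University of Technology; School of Computer Science and Engineering, Zhongkai University of Agriculture and Engineering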

Source: Science Technology and Engineering (《科学技术与工程》), Peking University Core Journal, 2013, No. 12, pp. 3297-3300 (4 pages)

Funding: Supported by the National Natural Science Foundation of China (61101160)

Keywords: speaker clustering; near distance clustering; far distance clustering

Classification code: TN912.3 [Electronics and Telecommunications: Communication and Information Systems]

References (13)

1. Ben-Harush O, Ben-Harush O, Lapidot I, et al. Initialization of iterative-based speaker diarization systems for telephone conversations. IEEE Transactions on Audio, Speech, and Language Processing, 2012; 20(2): 414-425.
2. Jin H, Kubala F, Schwartz R. Automatic speaker clustering. In Proc. DARPA Speech Recognition Workshop, Chantilly, VA, Feb. 1997: 108-111.
3. Tranter S E, Reynolds D A. An overview of automatic speaker diarization systems. IEEE Transactions on Audio, Speech, and Language Processing, 2006; 14(5): 1557-1565.
4. Shen Han-Ping, Yeh Jui-Feng, Wu Chung-Hsien. Speaker clustering using decision tree-based phone cluster models with multi-space probability distributions. IEEE Transactions on Audio, Speech, and Language Processing, 2011; 19(5): 1289-1300.
5. Huijbregts M, van Leeuwen D A. Large-scale speaker diarization for long recordings and small collections. IEEE Transactions on Audio, Speech, and Language Processing, 2012; 20(2): 404-413.
6. Pardo J M, Barra-Chicote R, San-Segundo R, et al. Speaker diarization features: The UPM contribution to the RT09 evaluation. IEEE Transactions on Audio, Speech, and Language Processing, 2012; 20(2): 426-435.
7. Duda R, Hart P, Stork D. Pattern Classification (Second Edition). John Wiley & Sons, Inc., 2001.
8. Evans N, Bozonnet S, Wang D. A comparative study of bottom-up and top-down approaches to speaker diarization. IEEE Transactions on Audio, Speech, and Language Processing, 2012; 20(2): 382-392.
9. Zhou B, Hansen J H L. Efficient audio stream segmentation via the combined T² statistic and Bayesian information criterion. IEEE Transactions on Speech and Audio Processing, 2005; 13(4): 467-474.
10. Ng A Y, Jordan M I, Weiss Y. On spectral clustering: Analysis and an algorithm. In Advances in Neural Information Processing Systems 14 (Proc. of NIPS 2001), 2001: 849-856.
