Speaker Diarization Based on Length Normalization MAP (采用长度规整MAP的说话人分割聚类)

Cited by: 1

Abstract: This paper proposes, for the first time, a length-normalized maximum a posteriori (MAP) estimation method and applies it to two distance metrics used in speaker diarization: the cross likelihood ratio (CLR) and the T-Test distance. Classical MAP adaptation computes statistics against the universal background model (UBM) and then shifts the model parameters away from the UBM, so the size of the shift is positively correlated with the length of the speech segment. When two segments of different lengths are compared, the classical method therefore yields speaker models that are not characterized equally well, which distorts the distance metric. The proposed method normalizes the relevance factor according to the segment length during MAP adaptation and only then adjusts the model parameters, so that the adapted parameters are independent of segment length and better reflect speaker identity. On a diarization evaluation task using Chinese multi-speaker TV talk show data, the length-normalized MAP method clearly improves on classical MAP: the diarization error rate drops by 3.5% relative under the CLR metric and by 10.7% relative under the T-Test metric.
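The length dependence described in the abstract can be illustrated with a minimal Python sketch. The classical adaptation coefficient `n_k / (n_k + r)` follows the standard GMM-UBM mean adaptation formula; the length-normalized variant below is only one plausible reading, not the paper's formula, since the abstract does not give the exact normalization. The function names, the relevance factor of 16, and the reference length `ref_frames` are all assumptions made for illustration.

```python
def classical_alpha(n_k, relevance=16.0):
    """Classical MAP adaptation coefficient for one Gaussian component.

    n_k is the soft count of frames assigned to the component.
    """
    return n_k / (n_k + relevance)


def length_normalized_alpha(n_k, num_frames, relevance=16.0, ref_frames=1000.0):
    """Hypothetical length-normalized coefficient (an assumption, not the
    paper's exact formula): the relevance factor is scaled in proportion
    to the segment length, so only the occupancy fraction n_k / num_frames
    determines the result.
    """
    scaled_relevance = relevance * num_frames / ref_frames
    return n_k / (n_k + scaled_relevance)


def adapt_mean(ubm_mean, data_mean, alpha):
    """Interpolate between the UBM mean and the segment's data mean."""
    return [alpha * x + (1.0 - alpha) * m for x, m in zip(data_mean, ubm_mean)]


# Two segments with the same occupancy fraction (10% of frames fall on
# this component) but different lengths.
short_frames, long_frames = 200, 2000
short_n, long_n = 20.0, 200.0

a_short = classical_alpha(short_n)
a_long = classical_alpha(long_n)
an_short = length_normalized_alpha(short_n, short_frames)
an_long = length_normalized_alpha(long_n, long_frames)

print(f"classical:  short={a_short:.3f}  long={a_long:.3f}")
print(f"normalized: short={an_short:.3f}  long={an_long:.3f}")
```

With the classical coefficient, the longer segment is pulled much further from the UBM than the shorter one even though both have the same occupancy fraction. With the scaled relevance factor, the two coefficients coincide, which is the length-independence property the abstract describes.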

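Authors: Zhu Weixin (朱唯鑫), Guo Wu (郭武)

Affiliation: National Engineering Laboratory for Speech and Language Information Processing, University of Science and Technology of China

Source: Journal of Signal Processing (《信号处理》), CSCD, Peking University Core Journal, 2016, No. 7, pp. 859-865 (7 pages)

Funding: Supported by the Natural Science Foundation of Anhui Province (1408085MKL78)

Keywords: speaker diarization; maximum a posteriori (MAP) estimation; length normalization; cross likelihood ratio; T-Test distance

Classification: TN912.34 [Electronics and Telecommunications: Communication and Information Systems]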



