Speaker Change Detection Based on Computational Auditory Scene Analysis
Cited by: 1
Abstract: In Speaker Change Detection (SCD) under rapid speaker turn-taking with short utterances, the continuous speech available for training each speaker model is short, so the models are not robust and SCD performance is poor. To address this, a new SCD method based on Computational Auditory Scene Analysis (CASA) is proposed. Following the auditory processing mechanism of the human ear, the speech signal is decomposed into multiple narrow sub-band signals, which yields accurate boundaries between voiced and unvoiced speech and allows scattered voiced and unvoiced sub-segments to be spliced into speech sub-segments. Candidate speaker change points between speech sub-segments are decided with the Bayesian Information Criterion (BIC), and pitch features extracted from the voiced portions are used to verify the candidate regions. Experimental results show that, under very short speech conditions with an average speech sub-segment duration of 1.34 s, the method reduces the Equal Error Rate (EER) of SCD to 23.2%, with an F1 value of 70%.
Source: Computer Engineering (《计算机工程》), indexed in CAS, CSCD, and the Peking University Core Journals list; 2018, No. 2, pp. 316-321 (6 pages)
Funding: National Natural Science Foundation of China, "Speaker Recognition under Noise and Short-Utterance Conditions" (61370034)
Keywords: Speaker Change Detection (SCD); Computational Auditory Scene Analysis (CASA); Gammatone Energy Cepstral Coefficients (GECC); pitch; Bayesian Information Criterion (BIC)
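
The BIC decision mentioned in the abstract can be illustrated with a small sketch. It assumes the commonly used ΔBIC formulation, in which each segment is modelled by a single full-covariance Gaussian and a positive ΔBIC suggests a speaker change between two adjacent segments. The function name `delta_bic`, the penalty weight `lam`, and the synthetic feature matrices are illustrative assumptions; this record does not give the paper's actual features, thresholds, or implementation.

```python
import numpy as np


def delta_bic(x, y, lam=1.0):
    """Delta-BIC between two feature segments of shape (frames, dims).

    Assumption: one full-covariance Gaussian per segment. A positive
    value favours "two different sources" over "one shared source".
    """
    z = np.vstack([x, y])            # merged segment
    n1, n2, n = len(x), len(y), len(z)
    d = z.shape[1]                   # feature dimension

    def logdet_cov(seg):
        # Maximum-likelihood covariance (bias=True) and its log-determinant.
        cov = np.atleast_2d(np.cov(seg, rowvar=False, bias=True))
        _, logdet = np.linalg.slogdet(cov)
        return logdet

    # Model-complexity penalty: d mean terms + d(d+1)/2 covariance terms.
    penalty = 0.5 * lam * (d + 0.5 * d * (d + 1)) * np.log(n)
    gain = 0.5 * (n * logdet_cov(z) - n1 * logdet_cov(x) - n2 * logdet_cov(y))
    return gain - penalty
```

In use, `delta_bic` would be evaluated at each boundary between adjacent speech sub-segments, and boundaries with a positive score kept as candidate change points for later verification. With very short segments the covariance estimates become unreliable, which is the robustness problem the abstract describes.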