Abstract
In speaker change detection (SCD) for speech with rapid speaker turns and short utterances, the continuous speech available for training each speaker model is short, so the models are not robust and SCD performance suffers. To address this, a new SCD method based on Computational Auditory Scene Analysis (CASA) is proposed. Following the auditory processing mechanism of the human ear, the speech signal is decomposed into multiple narrow sub-band signals, which yields accurate voiced/unvoiced boundaries and allows scattered voiced and unvoiced sub-segments to be spliced into speech sub-segments. Suspected speaker change points between speech sub-segments are decided by the Bayesian Information Criterion (BIC), and pitch features extracted from the voiced portions are used to verify each region. Experimental results show that, under very short speech conditions with an average speech sub-segment duration of 1.34 s, the method reduces the Equal Error Rate (EER) of SCD to 23.2%, with an F1 score of 70%.
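The BIC decision mentioned in the abstract can be illustrated with the commonly used delta-BIC test, which compares modelling two adjacent segments with one full-covariance Gaussian against modelling them with two. The following is a minimal sketch under stated assumptions, not the authors' implementation: the function name `delta_bic`, the penalty weight `lam`, and the synthetic 4-dimensional features are all hypothetical stand-ins (the paper uses GECC features, whose extraction is not described in this record).

```python
import numpy as np


def delta_bic(x, y, lam=1.0):
    """Delta-BIC between two feature segments (rows = frames, cols = dims).

    Positive values favour two separate Gaussians, i.e. a speaker change.
    `lam` is a hypothetical tuning weight for the model-complexity penalty.
    """
    z = np.vstack([x, y])
    n1, n2, n = len(x), len(y), len(z)
    d = z.shape[1]

    def logdet_cov(a):
        # Maximum-likelihood covariance; slogdet avoids overflow in det().
        cov = np.atleast_2d(np.cov(a, rowvar=False, bias=True))
        return np.linalg.slogdet(cov)[1]

    # Likelihood gain from splitting the pooled segment into two models.
    gain = 0.5 * (n * logdet_cov(z) - n1 * logdet_cov(x) - n2 * logdet_cov(y))
    # Penalty for the extra mean vector and covariance matrix.
    penalty = 0.5 * lam * (d + 0.5 * d * (d + 1)) * np.log(n)
    return gain - penalty


# Synthetic data (an assumption for illustration only).
rng = np.random.default_rng(0)
same_a = rng.normal(0.0, 1.0, size=(200, 4))
same_b = rng.normal(0.0, 1.0, size=(200, 4))
other = rng.normal(3.0, 1.0, size=(200, 4))

print("same source:", delta_bic(same_a, same_b))
print("different source:", delta_bic(same_a, other))
```

On data like this, segments drawn from the same distribution should give a negative value and segments with clearly different means a positive one. With very short segments the covariance estimates become unreliable, which is consistent with the motivation the abstract gives for splicing sub-segments before applying BIC.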
Source
《计算机工程》
CAS
CSCD
PKU Core (北大核心)
2018, Issue 2, pp. 316-321 (6 pages)
Computer Engineering
Funding
National Natural Science Foundation of China, "Speaker Recognition under Noisy and Short-Speech Conditions" (61370034)
Keywords
Speaker Change Detection (SCD)
Computational Auditory Scene Analysis (CASA)
Gammatone Energy Cepstral Coefficients (GECC)
pitch
Bayesian Information Criterion (BIC)